Author: Olaf Kopp

Generating a related set of documents for an initial set of documents

This patent describes methods, systems, and computer programs for identifying documents related to an initial set of documents. The core idea is to compute strength of relationship scores between documents from user selection data, which indicates whether a user viewed a candidate document after being presented with an initial document in search results. These scores determine which candidate documents are selected as related.

Patent ID: US8447760B1
Countries Published: United States
Date of Patent: May 21, 2013
Inventors: Simon Tong (Mountain View, CA), Benjamin N. Lee (Sunnyvale, CA), Eric E. Altendorf (San Francisco, CA)
Assignee: Google

Background

The background highlights the challenge of providing relevant information in response to user queries on internet search engines. It notes that users are often presented with a vast array of search results and documents over time, navigating through these to find the information they seek. The document outlines the need for a method to generate a related group of search result documents from an initial set of documents, leveraging user interaction data to improve the relevance and utility of search results presented to the user. This background sets the stage for the patent’s solution, which aims to enhance the search experience by identifying documents that are contextually related based on user behaviors, such as the documents they view after being presented with initial search results.

Claims

The claims of US 8,447,760 B1 focus on a detailed methodology for identifying documents related to an initial set of documents using user selection data. Here’s a summary of the key aspects covered in the claims:

  • Identifying Related Documents: Methods, systems, and computer programs are described for determining related documents by calculating strength of relationship scores between candidate documents and an initial set of documents. These scores are based on user interactions, specifically whether a user viewed a candidate document after being presented with an initial document in search results. The patent also gives examples in which the system suggests related documents or queries based on a user’s search history or viewed documents. For instance, if a user frequently views documents about family-friendly activities in San Francisco, the system might suggest guides to Alcatraz or affordable hotels in the area.

  • Use of User Selection Data: The process involves aggregating user selection data, which includes whether the user viewed the candidate document within a specified time window after the initial document was presented in response to a search query.
  • Calculation of Aggregate Scores: An aggregate strength of relationship score is calculated for each candidate document from its individual scores. The related documents are then selected based on these aggregate scores.
    • Individual Scores: For each candidate document, an individual strength of relationship score is determined with respect to each document in the initial set of documents. These scores are derived from user selection data, reflecting whether users viewed the candidate document within a certain time frame after an initial document was presented in response to a search query.
    • Aggregation of Scores: The patent describes aggregating these individual scores to calculate an aggregate strength of relationship score for each candidate document. This aggregation considers the individual relationship scores of a candidate document with all documents in the initial set.
    • User Selection Data: The core of the scoring system relies on user selection data, which indicates user interactions with documents following an initial search result. This data may include whether the user viewed the candidate document during a predefined window of time after the initial document was presented.
    • Scoring Factors: The system may scale user selection data by scoring factors under certain conditions, for example, if a user views the candidate document during the specified time window after selecting (clicking on) the initial document from the search results page.
    • Normalization: Scores can be normalized based on the popularity of candidate documents within the user selection data, ensuring that the scores reflect genuine user interest and relevance rather than just frequency of appearance in search results.
    • Selection of Related Documents: The aggregate scores are used to select related documents from the pool of candidate documents. Documents with higher aggregate scores are deemed more relevant to the initial set of documents, based on user behavior patterns.
    • Applications: These scores can be employed in several key applications, including:
      • Enhancing search results by including related documents.
      • Improving the ranking of search results within a session by considering documents’ relatedness.
      • Suggesting related documents or queries to users based on their interaction history.
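
The scoring steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent’s implementation: the event tuple layout, the `CLICK_FACTOR` value, dividing by candidate popularity as the normalization, and summing as the aggregation are all hypothetical choices that the summary above does not fix.

```python
from collections import defaultdict

# Hypothetical log rows: (initial_doc, candidate_doc, clicked_initial).
# Each row means a user viewed candidate_doc inside the time window after
# initial_doc was presented in search results.
CLICK_FACTOR = 2.0  # assumed scoring factor when the initial doc was also clicked


def individual_scores(events):
    """Return {(initial, candidate): score}, normalized by candidate popularity."""
    raw = defaultdict(float)
    popularity = defaultdict(float)
    for initial, candidate, clicked_initial in events:
        weight = CLICK_FACTOR if clicked_initial else 1.0
        raw[(initial, candidate)] += weight
        popularity[candidate] += weight
    # Assumed normalization: divide by the candidate's total weighted views,
    # so globally popular documents do not dominate.
    return {pair: score / popularity[pair[1]] for pair, score in raw.items()}


def related_documents(events, initial_set, top_k=3):
    """Aggregate individual scores over the initial set and pick the top candidates."""
    aggregate = defaultdict(float)
    for (initial, candidate), score in individual_scores(events).items():
        if initial in initial_set and candidate not in initial_set:
            aggregate[candidate] += score  # assumed aggregation: plain sum
    ranked = sorted(aggregate.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Under these assumptions, a candidate that is viewed mostly after documents from the initial set ends up with a higher aggregate score than one that is viewed just as often but mainly after unrelated documents.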

  • Application of the Methodology: The methodology can be applied in various ways, including augmenting search results with related documents, session-based ranking improvements, and providing suggestions for documents or queries based on user history or interactions.
    • Augmenting Search Results: Engagement data can help identify documents that, based on user behaviors, may be relevant to include alongside initial search results, potentially improving the user’s search experience by preempting their needs. The patent describes a scenario where a user searches for “San Francisco Vacation,” receives and interacts with various search results, and based on these interactions, the system identifies and presents documents related to those initially viewed. This scenario exemplifies how the system can augment search results with related documents, enriching the user’s discovery process and providing them with more comprehensive information relevant to their search query.
    • Session-Based Ranking: When a user conducts multiple searches or views multiple documents in a session, the system can track the documents the user interacts with and adjust the ranking of subsequent search results to prioritize documents related to those previously viewed. Search results are thereby tailored dynamically to the user’s behavior within the session.
    • Suggesting Content: Beyond improving search results, engagement data can be used to suggest related documents or queries to users, guiding them towards content they might find interesting based on their previous interactions.
  • Adjustment for Language and Locale: The claims also mention the adaptability of the system to cater to specific languages, geographic locations, or user preferences by adjusting the scores or using separate models for different demographics.
  • Consideration of User Engagement Data: Additional aspects of the claims consider using detailed user engagement data, such as the duration for which a document was viewed and the sequence of views, to refine the selection of related documents. For example, if a user spends a significant amount of time viewing a document about budget dining options in San Francisco, the system could use this information to suggest other budget dining guides or articles, on the assumption that a longer view duration indicates a higher level of interest.
    • User Selection Data: Engagement data is essentially user selection data, which records whether a user viewed a candidate document during a specific window of time after an initial document was presented in search results in response to a query. This includes tracking clicks, views, and the duration for which each document was viewed.
    • Duration of View: How long a document was viewed can be used to improve the accuracy of related document selection. Not just the act of viewing but the length of engagement with a document can influence its perceived relevance and the strength of its relationship to the initial document.
    • Sequence of Views: The order in which documents are viewed following an initial search result can also provide valuable insights into user interest patterns, further refining the selection of related documents.
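
One way to picture how view duration and view order could feed into the selection data is the following Python sketch. Everything specific here is an assumption made for illustration: the `View` record, the `MAX_SECONDS` cap, the linear duration weight, and the 1/position discount are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class View:
    doc: str
    seconds: float  # how long the document was viewed
    position: int   # 1 = first document viewed after the initial result


MAX_SECONDS = 120.0  # assumed cap so very long views do not dominate


def engagement_weight(view):
    """Weight a single view by its capped duration and its place in the sequence."""
    duration_part = min(view.seconds, MAX_SECONDS) / MAX_SECONDS
    order_part = 1.0 / view.position  # earlier views count more (assumption)
    return duration_part * order_part


def weighted_views(views):
    """Sum engagement weights per document."""
    totals = {}
    for view in views:
        totals[view.doc] = totals.get(view.doc, 0.0) + engagement_weight(view)
    return totals
```

The resulting per-document weights could replace the simple viewed or not-viewed signal when computing strength of relationship scores, so that a long, early view counts for more than a brief, late one.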
