Author: Olaf Kopp
Reading time: 6 Minutes

Ranking search results based on similar queries

The patent describes a method and system for improving the ranking of search results based on the analysis of similar queries made previously. It addresses the challenge of refining search results to make them more relevant, especially when there’s insufficient user behavior data associated with the current query. By leveraging data from similar past queries, the system can offer more accurately ranked search results.

  • Patent ID: US9009146B1
  • Countries Published: United States
  • Date of Patent: April 14, 2015
  • Expiration Date: Not explicitly stated, but typically 20 years from the filing date, which would be around May 21, 2032, considering the application was filed on May 21, 2012.
  • Inventors: Andrei Lopatenko, Hyung-Jin Kim, Sandor Dornbush, Leonard Wei, Timothy P. Kilbourn, Mikhail Lopyrev

Background

The background addresses the challenge of effectively ranking search results in internet search engines. Search engines typically index web pages by crawling the World Wide Web and analyzing page contents, such as titles, headings, and meta tags, to determine how these pages should be indexed for future queries. When a user submits a query, the search engine sifts through its index to provide a list of the most relevant web pages.

The relevance of these pages is determined by various criteria, including the popularity or authority of the pages, and search engines employ different techniques to rank the “best” or most relevant results first. The challenge lies in improving the accuracy and relevance of these search results so that the most useful pages are easily accessible to the user. This background sets the stage for the patent’s solution: enhancing search result rankings by analyzing similar queries and the user behavior data associated with them.

Claims

Key aspects covered in the claims include:

  1. Scoring Similar Queries: The method involves scoring other queries based on their similarity to a user-submitted query. Each query different from the user-submitted query is scored for similarity, considering factors like term similarity, including variants, synonyms, and diacritical variants.
  2. Selecting Queries and Deriving Statistics: It selects one or more of these scored queries based on their similarity scores. For documents identified in response to a user’s query, it derives quality of result statistics from the user behavior data associated with the selected similar queries.
  3. Quality of Result Statistics: These statistics are based on user interactions with the documents, such as click data, and are used to adjust the rankings of the search results for the user-submitted query, aiming to enhance the relevance and usefulness of the results presented to the user.
  4. Adjusting Search Results: The method includes adjusting the derived statistics for a document based on comparisons with other documents’ statistics and considering the similarity scores of the queries they are associated with. This adjustment can further refine the relevance of the search results.
  5. User Behavior Data and Term Types: The claims detail the use of different types of user behavior data, like click data, and categorize terms within queries into types such as regular terms, variant terms, optional terms, and stopword terms, each playing a specific role in scoring query similarity and adjusting result rankings.
    1. Regular Terms:
      • These are terms that, given their context within a query, are deemed essential or necessary for the query’s meaning.
      • Regular terms are critical for the query’s specificity and relevance.
      • They receive a high weight in the similarity scoring process, reflecting their importance in matching the user-submitted query with similar historical queries.
    2. Variant Terms:
      • Variant terms are also considered important for the query’s context but have the flexibility of being matched with variations.
      • Variations may include stems (e.g., “running” matched with “run”), synonyms (e.g., “automobile” with “car”), or diacritical variants (e.g., “résumé” with “resume”).
      • The weight assigned to variant terms in the similarity scoring may vary depending on the degree of similarity between the variant term and its match in the historical query. This flexibility allows for a broader match while still maintaining the query’s core intent.
    3. Optional Terms:
      • These terms, while present in the query, are considered less critical to the query’s overall meaning and specificity.
      • Optional terms may not necessarily alter the fundamental search intent if omitted but can provide additional context or clarification.
      • They are weighted less than regular and variant terms in the similarity scoring process, reflecting their secondary role in defining the query’s meaning.
    4. Stopword Terms:
      • Stopword terms include common words that are usually filtered out before or after processing a query due to their ubiquity and lack of specificity (e.g., articles, prepositions, conjunctions).
      • These terms typically include words like “the,” “and,” “of,” which are often necessary for grammatical completeness but contribute minimally to the search intent.
      • Stopword terms receive minimal to no weight in the similarity scoring process, acknowledging their limited value in distinguishing between different queries’ meanings.

By categorizing terms in these ways, the patent describes a nuanced approach to analyzing and scoring the similarity between queries. This categorization allows for a more refined adjustment of search result rankings based on the relevance inferred from similar historical queries and associated user behavior data.
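
To make the term categories more concrete, here is a minimal Python sketch of a type-weighted similarity score. It is not the patent’s actual formula: the weights in TYPE_WEIGHTS, the 0.8 discount for variant matches, the Term class, and the way the example query is split into term types are all assumptions chosen for illustration. The sketch also ignores word order and any extra terms in the historical query.

```python
from dataclasses import dataclass

# Hypothetical weights per term type; the patent does not publish numbers.
TYPE_WEIGHTS = {"regular": 1.0, "variant": 1.0, "optional": 0.3, "stopword": 0.05}
VARIANT_DISCOUNT = 0.8  # assumed credit when only a variant form matches


@dataclass(frozen=True)
class Term:
    text: str
    kind: str  # "regular", "variant", "optional" or "stopword"
    variants: frozenset = frozenset()  # accepted stems, synonyms, diacritical forms


def match_strength(term, candidate_words):
    """Return 1.0 for an exact match, a discounted value for a variant match, else 0.0."""
    if term.text in candidate_words:
        return 1.0
    if term.kind == "variant" and term.variants & candidate_words:
        return VARIANT_DISCOUNT
    return 0.0


def similarity(query_terms, candidate_query):
    """Share of the query's total term weight that the candidate query covers."""
    words = set(candidate_query.lower().split())
    total = sum(TYPE_WEIGHTS[t.kind] for t in query_terms)
    if total == 0:
        return 0.0
    matched = sum(TYPE_WEIGHTS[t.kind] * match_strength(t, words) for t in query_terms)
    return matched / total


# Assumed classification of the terms in "official jack in the box".
QUERY = [
    Term("official", "optional"),
    Term("jack", "regular"),
    Term("in", "stopword"),
    Term("the", "stopword"),
    Term("box", "regular"),
]

for historical in ("jack in the box", "official box of jacks"):
    print(historical, round(similarity(QUERY, historical), 3))
```

Under these assumed weights, “jack in the box” scores higher than “official box of jacks”, because it covers both regular terms while missing only the optional term “official”.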

The patent provides an example of a user submitting the query “official jack in the box.” The search engine produces initial results based on document content. The methodology then involves identifying similar historical queries (e.g., “jack in the box,” “official box of jacks”) and their respective user behavior data to refine the search results. This example demonstrates how the system refines the rankings of search results by leveraging user behavior data associated with similar queries when there’s insufficient data for the newly submitted query.

The patent also explains the process of borrowing user behavior data from historical queries similar to the user-submitted query. For instance, if “official jack in the box” lacks sufficient user behavior data, but “jack in the box” has ample data, the system uses the latter’s data to adjust the ranking of search results for the former. This practical application illustrates how leveraging data from similar queries can enhance the relevance of search results, even for queries with limited direct user behavior data.
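
The borrowing step can be sketched in a few lines of Python. This is a simplified illustration, not the method claimed in the patent: the function names, the similarity-weighted average, the multiplicative boost, and all numbers (document IDs, similarity scores, click statistics) are hypothetical.

```python
def borrowed_quality(doc_id, similar_queries, click_data):
    """Similarity-weighted click statistic for one document.

    similar_queries maps a historical query to its similarity score.
    click_data maps a historical query to {doc_id: click statistic in [0, 1]}.
    Queries without data for the document contribute zero.
    """
    total_similarity = sum(similar_queries.values())
    if total_similarity == 0:
        return 0.0
    weighted = sum(
        sim * click_data.get(query, {}).get(doc_id, 0.0)
        for query, sim in similar_queries.items()
    )
    return weighted / total_similarity


def rerank(initial_scores, similar_queries, click_data, boost=0.5):
    """Scale each content-based score by the borrowed statistic (assumed formula)."""
    adjusted = {
        doc: score * (1.0 + boost * borrowed_quality(doc, similar_queries, click_data))
        for doc, score in initial_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Hypothetical data for the sparse query "official jack in the box".
initial_scores = {"doc_a": 1.0, "doc_b": 0.9, "doc_c": 0.8}
similar_queries = {"jack in the box": 0.875, "official box of jacks": 0.54}
click_data = {
    "jack in the box": {"doc_a": 0.1, "doc_b": 0.7},
    "official box of jacks": {"doc_c": 0.4},
}

print(rerank(initial_scores, similar_queries, click_data))
```

With this made-up data, doc_b moves ahead of doc_a because it performed well for the closely related query “jack in the box”, while doc_c gains little because its data comes from a less similar query.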

Implications for SEO

  • Understanding User Intent: The emphasis on scoring queries based on similarity and leveraging user behavior data underscores the importance of understanding user intent behind search queries. SEO strategies should focus on creating content that aligns with the intent of targeted queries, considering variations and synonyms that users might employ.
  • Keyword Optimization Beyond Exact Matches: The categorization of terms (regular, variant, optional, and stopword) highlights the need for a nuanced approach to keyword optimization. SEO efforts should extend beyond exact-match keywords to include synonyms, stemming variations, and relevant phrase matches to cover a broader range of similar queries that could influence rankings.
  • Content Relevance and User Engagement: Given the role of user behavior data (like click-through rates and dwell time) in refining search results, it’s crucial for SEO strategies to not only attract clicks but also engage users effectively to encourage longer dwell times. This involves creating high-quality, relevant content that fulfills users’ search intents and encourages positive user behaviors.
  • Strategic Use of Optional and Stopword Terms: While the patent indicates that optional and stopword terms have less impact on the similarity scoring, their strategic use in content can still contribute to natural, user-friendly language that improves readability and user experience, potentially influencing engagement metrics.
  • Localization and Language Considerations: The mention of diacritical variants and the potential for language-specific processing suggests that SEO strategies should consider localization and language nuances, especially for multilingual websites. This involves optimizing content with appropriate language variations and considering local search behaviors.
  • Data-Driven SEO: The methodology highlights the importance of leveraging data analytics to understand how similar queries and user behaviors influence search rankings. SEO professionals should use search analytics tools to analyze search trends, user interactions, and content performance, adjusting their strategies based on insights derived from data.
  • Adapting to Algorithm Changes: The patent reflects search engines’ evolving algorithms focused on user intent and behavior. SEO professionals need to stay informed about algorithm updates and adjust their strategies to align with how search engines are using user behavior and query analysis to rank content.
