Author: Olaf Kopp, 12 February 2024
Reading time: 55 Minutes

Most interesting Google Patents for semantic search

Since 2013, Google has been building an entity-based semantic search system that runs in parallel to classical term-based search. Entities play a major role in search query processing, content relevance determination, the structuring and organization of knowledge, SERP presentation and E-E-A-T. Roughly every second search-related Google patent deals with entities. In this article I would like to contribute to archiving well-founded knowledge from Google patents related to semantic and entity-based search.

More about Google patents in my following articles:

Since 2013 I have been working intensively on semantic SEO and entity-based searches. You can find more well-researched and well-founded information about semantic and entity-based searches from me in other articles here on the blog or at Search Engine Land.

Enjoy!

Disclaimer: Are the systems and methods in the patents used by Google?

A patent application does not mean that the methods described in it will find their way into practice in Google Search. One indication of whether a methodology or technology is interesting enough for Google to put into practice is whether the patent is pending only in the US or also in other countries. A priority claim for other countries must be made within 12 months of the first filing.

Regardless of whether a patent finds its way into practice, it makes sense to deal with Google patents, as you get an indication of the topics and challenges that product developers at Google are dealing with.

Media consumption history

  • Published for: United States, China, WIPO
  • Last Publication Date: February 2, 2024
  • Status: Pending
  • Inventors: Matthew Sharifi

The patent introduces a sophisticated system designed to enhance user experiences by personalizing content recommendations and search responses based on users’ historical content consumption. This system leverages a detailed analysis of media consumption histories, utilizing various components to process user queries and deliver highly relevant information.

System Components and Workflow

  • Content Consumption Engine: Stores detailed records of content consumed by users, including associated entities and contextual details, in a media consumption history database.
  • Classifier Engine: Analyzes consumed content to classify content types (e.g., movies, songs) and gathers additional relevant information, such as associated entities and production details.
  • Query Engine: Acts as the interface for user queries, capable of processing textual and voice inputs, and collaborates with the query analysis engine to tailor responses based on the user’s content consumption history.
  • Query Analysis Engine: Identifies relevant content items and entities based on user queries by analyzing query terms and accessing the media consumption history database. It assigns relevance scores to these items and entities to personalize the query responses.

Ranking and Scoring

Content items and entities are ranked based on confidence and relevance scores, allowing the system to prioritize information in the query response.

Relevance Scores
  • Definition: Relevance scores measure how much a user is likely interested in a specific content item or entity. These scores reflect the user’s perceived preference for various content items or entities, based on their consumption history.
  • Determination: Factors influencing relevance scores include the method of content identification (e.g., social network endorsements vs. cable TV history) and the frequency or context of content consumption (e.g., re-watching content, location of consumption).
Confidence Scores and Ranks
  • Confidence Scores: These scores indicate the likelihood that the content has been correctly identified as consumed by the user. Different identification methods can result in varying confidence levels.
  • Ranking: Content items and entities are ranked based on their confidence and relevance scores. This ranking helps in prioritizing content that is more relevant or of interest to the user.
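The interplay of the two scores can be illustrated with a minimal sketch. The patent does not state how confidence and relevance are combined, so the product used here, the `min_confidence` threshold, and all names and values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConsumedItem:
    title: str
    entity: str
    relevance: float   # assumed 0..1: how interested the user is likely to be
    confidence: float  # assumed 0..1: how sure the system is the item was consumed


def rank_items(items, min_confidence=0.5):
    """Drop weakly identified items, then sort by relevance * confidence.

    Both the threshold and the product are illustrative assumptions,
    not something the patent specifies.
    """
    kept = [item for item in items if item.confidence >= min_confidence]
    return sorted(kept, key=lambda item: item.relevance * item.confidence, reverse=True)


# Hypothetical media consumption history
history = [
    ConsumedItem("Paint It Black", "The Rolling Stones", relevance=0.9, confidence=0.8),
    ConsumedItem("Angie", "The Rolling Stones", relevance=0.6, confidence=0.95),
    ConsumedItem("Yesterday", "The Beatles", relevance=0.7, confidence=0.3),
]

ranked = rank_items(history)
for item in ranked:
    print(item.title, round(item.relevance * item.confidence, 2))
```

In this toy data the third item is filtered out because the system is not confident that it was actually consumed, even though its relevance score is fairly high.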

Innovative Aspects:

  • The system offers personalized responses to search queries, including knowledge cards with relevant information, based on the user’s media consumption history.
  • It considers which media content the user has consumed and links this to the search query.

Additional Features:

  • Responses to a search query may include data indicating that an entity is recorded in the user’s media consumption database as consumed content.
  • The system can also specify the time and location of consumption, incorporating these details into the response.
  • It can position knowledge cards based on consumption behavior and determine the number of knowledge elements to include.
Knowledge Cards:
  • A knowledge card is a user interface element displaying information or known facts about a specific entity mentioned in the search query.
  • The system determines the content of the knowledge card by identifying content consumed by the user that is associated with the queried entity.

For instance, a query for “The Rolling Stones” prompts the system to identify content related to the band that the user has previously consumed (e.g., albums or songs) and uses this information to populate the knowledge card with relevant content.
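A rough sketch of how such a knowledge card could be populated is shown below. The history rows, the naive substring match for entity detection and the `build_knowledge_card` helper are hypothetical; the patent describes the behavior, not an implementation.

```python
from datetime import date

# Hypothetical consumption log: (entity, item, consumed_on, location)
HISTORY = [
    ("The Rolling Stones", "Paint It Black", date(2023, 5, 1), "home"),
    ("The Rolling Stones", "Angie", date(2023, 7, 12), "car"),
    ("The Beatles", "Yesterday", date(2022, 1, 3), "home"),
]


def build_knowledge_card(query, history, max_items=3):
    """Return a card for the first known entity named in the query, or None."""
    q = query.lower()
    # Assumption: an entity is "mentioned" if its name appears in the query text.
    entity = next((e for e, *_ in history if e.lower() in q), None)
    if entity is None:
        return None
    # Most recently consumed items first
    rows = sorted((r for r in history if r[0] == entity), key=lambda r: r[2], reverse=True)
    return {
        "entity": entity,
        "consumed": [
            {"item": item, "when": when.isoformat(), "where": where}
            for _, item, when, where in rows[:max_items]
        ],
    }


card = build_knowledge_card("the rolling stones albums", HISTORY)
print(card)
```

The `when` and `where` fields mirror the time and location details that, according to the patent, can be incorporated into the response.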

Implications for SEO

1. Enhanced User Personalization

The system’s ability to tailor search results and content recommendations to individual users’ past consumption behaviors underscores the growing importance of personalization in SEO. Search engines that leverage such technologies can offer more relevant results, improving user satisfaction and engagement. For SEO, this means optimizing content not just for keywords but also for user intent and preferences.

2. Increased Importance of User Engagement Metrics

With the system prioritizing content based on historical consumption, metrics like click-through rates (CTR), time on page, and bounce rates become even more critical. These metrics can influence how content is recommended to users, highlighting the need for SEO strategies that focus on engaging users and encouraging them to interact more deeply with content.

3. Content Diversity and Relevance

The system’s emphasis on matching content to users’ interests and past interactions suggests a need for a diverse content strategy that covers a wide range of topics, formats, and perspectives. SEO strategies should focus on creating comprehensive content that addresses various aspects of a topic, ensuring that there’s something relevant for different user preferences.

4. Semantic Search Optimization

Given the system’s use of entities and relevance scoring, there’s a clear move towards semantic search optimization. SEO strategies should incorporate structured data markup to help search engines understand the context and relationships between entities within content. This includes optimizing for topics and entities that are closely related to the user’s interests and past consumption patterns.

5. Dynamic Content Optimization

The patent highlights the dynamic nature of content relevance, based on changing user preferences and consumption histories. SEO strategies must adapt by continuously updating and optimizing content to remain relevant to the target audience’s evolving interests. This could involve updating older content, adding new insights, or repurposing content across different formats.

6. Cross-Platform Content Strategy

With content consumption data potentially coming from various sources (e.g., TV, online platforms, social media), SEO strategies should consider a cross-platform approach. Optimizing content for visibility and engagement across different platforms can increase the chances of it being recommended to users, regardless of where they consume content.

Natural language processing based search

Patent ID: WO2014127500A1

Countries Published: WIPO, Europe, China

Last Publishing Date: August 28, 2014

Inventors: Guanghua Li, David Francois Huynh, Yanlai Huang, Yuan Gao, Ying Chai, Manish Rai Jain, and Yong Zhang.

The patent addresses the limitations of conventional search query processing techniques, which primarily rely on keyword searching and word matching. These traditional methods often fall short in understanding the intent and context of user queries, leading to less relevant search results.

The patent highlights the need for a more sophisticated approach that can interpret natural language queries in a way that aligns with the underlying data structures, such as knowledge graphs, to improve the accuracy and relevance of search outcomes. This sets the stage for the development of advanced search systems that leverage natural language processing (NLP) to parse and structure search queries more effectively, thereby enhancing the overall search experience for users.

“The present disclosure relates to processing search queries. Conventional techniques for processing search queries include keyword searching and word matching.”

Claims

Parsing Search Queries: The invention includes methods for parsing a search query into one or more search units, where a search unit comprises one or more words. This parsing is the first step in understanding and structuring the query in a way that can be processed more effectively.

“In some implementations, a first search query is parsed into one or more search units, wherein a search unit comprises one or more words.”

Utilizing Knowledge Graphs: After parsing, the system identifies elements within a knowledge graph corresponding to the search units. Knowledge graphs are data structures that contain information about various entities and the relationships between them, enabling a more nuanced search process.

“Elements of a knowledge graph are identified corresponding to each of the one or more search units.”

Generating Phrase and Query Trees: The system generates a phrase tree by assigning search units to nodes or edges of the tree. A query tree is then generated with a topology identical to the phrase tree, but its nodes and edges are defined based on the phrase tree and the knowledge graph. This structured approach allows for a more sophisticated search mechanism that can interpret the intent behind queries.

“A phrase tree is generated by assigning each of the one or more search units to a node or edge of the phrase tree. A query tree is generated with an identical topology to the phrase tree.”
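The parsing and tree-building steps can be sketched as follows. This is a simplified illustration, not the patented method: the tiny `KG_ELEMENTS` vocabulary, the greedy longest-match parser and the restriction to a simple chain of nodes and edges are all assumptions.

```python
# Hypothetical knowledge graph vocabulary: phrase -> (kind, element id)
KG_ELEMENTS = {
    "films": ("node", "type/film"),
    "directed by": ("edge", "property/directed_by"),
    "james cameron": ("node", "entity/james_cameron"),
}


def parse_search_units(query, max_len=3):
    """Greedy longest-match split of the query into known search units."""
    words = query.lower().split()
    units, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in KG_ELEMENTS:
                units.append(phrase)
                i += n
                break
        else:
            i += 1  # skip words without a matching knowledge graph element
    return units


def build_phrase_tree(units):
    """Assign units to nodes and edges. Assumes a chain: node -edge-> node ..."""
    nodes = [u for u in units if KG_ELEMENTS[u][0] == "node"]
    edges = [u for u in units if KG_ELEMENTS[u][0] == "edge"]
    tree = {"label": nodes[-1], "children": []}
    for node, edge in zip(reversed(nodes[:-1]), reversed(edges)):
        tree = {"label": node, "children": [(edge, tree)]}
    return tree


def to_query_tree(phrase_tree):
    """Same topology as the phrase tree, labels replaced by graph element ids."""
    return {
        "label": KG_ELEMENTS[phrase_tree["label"]][1],
        "children": [
            (KG_ELEMENTS[edge][1], to_query_tree(child))
            for edge, child in phrase_tree["children"]
        ],
    }


units = parse_search_units("films directed by James Cameron")
phrase_tree = build_phrase_tree(units)
query_tree = to_query_tree(phrase_tree)
print(units)
print(query_tree)
```

The point of the two-tree design is visible here: the phrase tree keeps the user's wording, while the query tree has the same shape but speaks the vocabulary of the knowledge graph.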

Retrieving and Refining Search Results: Search results are retrieved based on the query tree, and the system can receive filter queries for refining these results. The invention outlines methods for generating a second query tree based on the original search query and the filter query, allowing for dynamic refinement of search results.

“A search result is retrieved from a knowledge graph based at least in part on the first query tree. A filter query is received, wherein the filter query relates to a refinement of the first search result.”
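Retrieval and refinement might look roughly like this. The toy graph, the flattened tree representation (a root type plus a list of edge constraints) and the `refine` helper are assumptions for illustration; the patent only states that a second query tree is generated from the original query and the filter query.

```python
# Hypothetical toy knowledge graph: entity -> {property: value}
KG = {
    "Titanic": {"type": "film", "directed_by": "James Cameron", "year": 1997},
    "Avatar": {"type": "film", "directed_by": "James Cameron", "year": 2009},
    "Jaws": {"type": "film", "directed_by": "Steven Spielberg", "year": 1975},
}


def retrieve(query_tree):
    """Return entities of the root type that satisfy every edge constraint."""
    return sorted(
        name
        for name, props in KG.items()
        if props.get("type") == query_tree["type"]
        and all(test(props.get(prop)) for prop, test in query_tree["edges"])
    )


def refine(query_tree, filter_edge):
    """Build a second query tree: the first tree plus the filter's constraint."""
    return {"type": query_tree["type"], "edges": query_tree["edges"] + [filter_edge]}


# First query: "films directed by James Cameron"
first_tree = {"type": "film", "edges": [("directed_by", lambda v: v == "James Cameron")]}
first_results = retrieve(first_tree)

# Filter query: "after 2000"
second_tree = refine(first_tree, ("year", lambda v: v is not None and v > 2000))
second_results = retrieve(second_tree)

print(first_results)
print(second_results)
```

Note that `refine` returns a new tree and leaves the first one untouched, so the original result set can still be reproduced after filtering.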

System Implementation: The claims also cover systems comprising one or more processors that are capable of executing the described methods. These systems can parse search queries, generate phrase and query trees, and retrieve search results from a knowledge graph, among other functionalities.

“In some implementations, a system comprising one or more processors is provided.”

Computer-Readable Media: Additionally, the patent includes claims for non-transitory computer-readable media that have program instructions recorded on them for performing the outlined methods. This ensures that the invention can be implemented in software that can be distributed and run on various hardware platforms.

“In some implementations, a non-transitory computer-readable media having computer program instructions recorded thereon is provided.”

Implications for SEO

  1. Focus on Natural Language and User Intent: With search engines leveraging NLP to understand and process search queries more effectively, SEO strategies should prioritize content that aligns with natural language use and the intent behind search queries. This means moving beyond keyword stuffing and focusing on creating content that answers questions, solves problems, and matches the conversational tone users might employ in their searches.
  2. Structured Data and Schema Markup: The patent’s emphasis on using structured data, such as knowledge graphs, highlights the importance of structured data and schema markup on websites. By clearly defining your site’s content in terms of entities and their relationships, you can make it easier for search engines to understand and index your content, potentially leading to better visibility in search results.
  3. Content Depth and Quality: The ability of search systems to parse queries into search units and match them with knowledge graphs underscores the need for in-depth, quality content that covers topics comprehensively. Websites should aim to be authoritative sources that can fulfill multiple related queries, thereby increasing their chances of being matched with a wider array of search intents.
  4. Semantic Search Optimization: As search engines become better at understanding the semantics behind queries, SEO strategies should also evolve to focus on semantic search optimization. This involves optimizing content for topics, not just keywords, and ensuring that content contextually aligns with user queries.
  5. User Experience and Engagement: The patent suggests that search engines are increasingly capable of understanding and refining search results based on user interactions and feedback. This means that SEO isn’t just about getting users to your site; it’s also about providing an excellent user experience that meets their needs, encourages engagement, and satisfies their search intent.
  6. Voice Search and Conversational Queries: With the rise of voice search and digital assistants, the importance of optimizing for conversational queries becomes more pronounced. Content should be optimized to answer questions directly and conversationally, as more users turn to voice search for quick, spoken queries.
  7. Adaptation to Search Engine Updates: As search engines implement technologies described in patents like this, SEO professionals must stay informed about updates and changes to search algorithms. Adapting strategies in response to these changes is crucial for maintaining or improving search rankings.

Resource scoring adjustment based on entity selections

Patent ID: US10303684B1
Countries Published For: United States
Last Publishing Date: May 28, 2019
Expiration Date: May 6, 2037
Inventors: Kenichi Kurihara

The patent addresses the challenges and mechanisms involved in digital information retrieval, particularly in the context of search engines.

The ranking process involves scoring resources using factors such as information retrieval scores, which measure the relevance of a query to the content of a resource, and authority scores, which assess the importance of a resource relative to others.

Moreover, the background highlights the use of additional factors, including user feedback, to adjust resource scores. Resources that frequently satisfy users’ informational needs for specific queries are selected more often, indicating their relevance and utility. This user selection data allows search engines to adjust search scores, giving a “boost” to resources that perform well in satisfying users’ needs. However, the document also notes the challenge of scoring resources with insufficient search and selection data, such as newly published resources, which may not have a history of user interactions to inform their relevance and ranking.

Claims

Accessing Resource Data: The system accesses data specifying a plurality of resources. For each resource, this data includes a unique identifier and information on one or more entities referenced within the resource.

Accessing resource data involves the search engine’s system collecting and using information about various online resources, such as websites, articles, or videos. Each resource is identified by a unique code or identifier, making it distinguishable from others. Additionally, the system identifies and records the specific topics, concepts, or entities (like places, people, or things) that each resource mentions or discusses.

In simpler terms, imagine the search engine going through a library of digital content and taking notes on what each piece of content is about and what specific subjects it covers. This process helps the search engine understand the content and context of each resource, preparing it for more detailed analysis, such as figuring out how relevant a resource is to certain search queries based on the entities it references.

Accessing Search Term Data: It also accesses data specifying a set of search terms. For each search term, there is a selection value for each resource, which is determined based on user selections of search results that referenced the resource.

Accessing search term data refers to the process where the search engine collects and analyzes information about the words or phrases (search terms) that people use when they look for information online. For each search term, the system also gathers data on how users interact with the search results—specifically, which results they choose to click on or select. This interaction is measured through selection values assigned to each resource, indicating its relevance or attractiveness to users based on the search term used.

To put it simply, imagine the search engine keeping track of what people are searching for and noticing which websites or pages they end up visiting from the list of results it provides. This helps the search engine understand which resources are more useful or relevant to users for specific search terms, guiding it in improving how it ranks and presents search results in the future.

Determining Search Term-Entity Selection Values: From the resource and search term data, the system calculates a search term-entity selection value for each search term and each entity. This value is based on the selection values of resources that reference the entity and were included in search results for queries containing the search term.

Determining search term-entity selection values involves a sophisticated process where the search engine calculates specific values that represent how often users select resources based on the entities those resources reference in relation to specific search terms. This process is a key part of the patent and involves several steps:

  1. Combining Resource and Search Term Data: The system first looks at the information it has gathered about resources (such as web pages or articles) and the entities (topics, concepts, or things) they mention. It also considers the search terms people use and how they interact with the search results related to these terms.
  2. Analyzing User Selections: For each combination of a search term and an entity, the system analyzes how frequently resources mentioning that entity are selected when they appear in search results for that search term. This involves looking at the selection values for resources, which are based on user clicks or interactions.
  3. Calculating Selection Values: Based on this analysis, the system calculates a “search term-entity selection value.” This value reflects the likelihood of resources referencing a particular entity being selected in response to a specific search term. It’s a measure of the relevance and appeal of resources related to certain entities for specific search queries.

In simpler terms, this process is like figuring out how popular certain topics are among people searching for specific things. For example, if many people search for “healthy recipes” and often choose articles that mention “quinoa,” the search term-entity selection value for “quinoa” in relation to “healthy recipes” would be high. This indicates that quinoa is a relevant and appealing topic for people interested in healthy recipes.

This calculated value helps the search engine understand which topics or entities are most relevant to users’ interests based on their search behavior. It can then use this information to adjust how it ranks and presents search results, aiming to show users the most relevant and useful information first.
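As a minimal sketch of this calculation: the patent says the value is based on the selection values of resources that reference the entity, but not how they are aggregated, so the simple average below is an assumption. The resource ids, entities and numbers are invented for the quinoa example.

```python
from collections import defaultdict

# Hypothetical resource data: resource id -> entities referenced in the resource
RESOURCE_ENTITIES = {
    "r1": {"quinoa", "kale"},
    "r2": {"quinoa"},
    "r3": {"bacon"},
}

# Hypothetical search term data: (search term, resource id) -> selection value,
# for example a click-through rate observed for that term
SELECTION_VALUES = {
    ("healthy recipes", "r1"): 0.30,
    ("healthy recipes", "r2"): 0.50,
    ("healthy recipes", "r3"): 0.05,
}


def term_entity_selection_values(resource_entities, selection_values):
    """Average the selection values of all resources that reference an entity.

    Averaging is an illustrative assumption; the patent leaves the
    aggregation function open.
    """
    buckets = defaultdict(list)
    for (term, resource_id), value in selection_values.items():
        for entity in resource_entities.get(resource_id, ()):
            buckets[(term, entity)].append(value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


values = term_entity_selection_values(RESOURCE_ENTITIES, SELECTION_VALUES)
for (term, entity), value in sorted(values.items()):
    print(term, entity, round(value, 2))
```

With this data, "quinoa" ends up with the highest value for "healthy recipes" because both resources that mention it are selected comparatively often.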

Storing Search Term-Entity Selection Values: Finally, the system stores these calculated search term-entity selection values in a data storage.

  1. Data Storage: The search engine uses a data storage system, which could be databases or other forms of digital storage, to keep the calculated search term-entity selection values. This storage allows the system to quickly access these values when needed, without having to recalculate them each time.
  2. Organized and Efficient Access: The values are stored in an organized manner, ensuring that the search engine can efficiently retrieve them when processing search queries. This organization might involve indexing the values based on search terms, entities, or other relevant criteria to speed up access.
  3. Use in Ranking Process: When a new search query is entered, the search engine can pull the relevant search term-entity selection values from storage to help determine the ranking of resources in the search results. This means that if a particular entity is highly relevant to a search term based on past user selections, resources referencing that entity can be ranked higher.
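How stored values could feed into ranking, in particular for a new resource without its own click history, can be sketched like this. The multiplicative boost, the `weight` parameter and the `adjusted_score` function are hypothetical and only meant to make the idea concrete.

```python
# Hypothetical stored values: (search term, entity) -> selection value
STORED = {
    ("healthy recipes", "quinoa"): 0.40,
    ("healthy recipes", "bacon"): 0.05,
}


def adjusted_score(base_score, query_terms, entities, stored, weight=0.5):
    """Boost a resource's base score using the entities it references.

    The boost is the mean stored value over all matching (term, entity)
    pairs, scaled by `weight`. Both choices are assumptions.
    """
    values = [
        stored[(term, entity)]
        for term in query_terms
        for entity in entities
        if (term, entity) in stored
    ]
    if not values:
        return base_score  # nothing known about these entities: no adjustment
    boost = sum(values) / len(values)
    return base_score * (1 + weight * boost)


# A newly published page that mentions quinoa, scored for "healthy recipes"
new_page_score = adjusted_score(1.0, ["healthy recipes"], {"quinoa"}, STORED)
unknown_score = adjusted_score(1.0, ["healthy recipes"], {"tofu"}, STORED)
print(round(new_page_score, 2), unknown_score)
```

The lookup is a plain dictionary access, which reflects the reason for storing the values in the first place: nothing has to be recalculated at query time.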

Implications for SEO

1. Entity-Based Optimization:

SEO strategies need to evolve beyond traditional keyword optimization to include entity-based optimization. This means creating content that not only targets specific keywords but also thoroughly covers related entities (people, places, things, concepts) that users might associate with these keywords. Understanding and incorporating relevant entities into content can increase its visibility and ranking in search results.

2. User Intent and Behavior:

The patent highlights the importance of aligning content with user intent and behavior. SEO practitioners must analyze how users interact with content related to specific search terms and entities. This involves understanding which types of content users prefer and how they select resources in search results. Optimizing content to meet user expectations and satisfy their informational needs can lead to better engagement and higher selection values, potentially boosting search rankings.

3. Quality and Relevance of Content:

The mechanism described in the patent suggests that Google could use selection values as a signal of content quality and relevance. Therefore, producing high-quality, relevant content that addresses the needs and interests of the target audience is crucial. Content that effectively engages users and matches their search intent is more likely to be selected, positively influencing its search term-entity selection values and, by extension, its search rankings.

4. Data-Driven SEO Strategies:

SEO strategies should become more data-driven, with an emphasis on analyzing user behavior data to inform content creation and optimization. This includes studying which entities are frequently associated with high selection values for specific search terms and understanding the context in which these entities are discussed. Leveraging analytics tools to gather insights into user preferences and content performance will be key.

5. Long-Tail and Semantic Search Optimization:

Given the focus on entities and their relation to search terms, optimizing for long-tail keywords and semantic search becomes increasingly important. Long-tail keywords, which are often more specific and query-like, can capture the user’s intent more accurately and may include relevant entities directly. Semantic search optimization involves structuring content to answer questions and cover topics comprehensively, reflecting the natural way people search for information.

Ranking Search Results Based on Entity Metrics

The patent ID is US10235423B2. The patent was issued on March 19, 2019. The inventors listed are Hongda Shen, David Francois Huynh, Grace Chung, Chen Zhou, Yanlai Huang, and Guanghua Li, all associated with Google LLC, Mountain View, CA, USA. It is published for the US and WIPO, which suggests that it is more likely to be used in practice.

In my view, this is the foundational patent for an algorithmic implementation of E-E-A-T ratings.

Background:

The patent addresses the challenge of effectively ranking search results in a way that is both relevant and valuable to the user. It recognizes the limitations of existing methods in adequately distinguishing between the nuances of different types of entities and their corresponding metrics in search results.

Core Insights:

  • Purpose: The patent outlines methods, systems, and computer-readable media for ranking search results through determining various metrics based on the search results. It particularly emphasizes the weighting of these metrics based on the type of entity included in the search, suggesting a nuanced approach to search result ranking.
  • Process: A score is determined by combining metrics and weights, where weights are partly based on the entity type in the search query. This score is then used to rank the search results, indicating a dynamic and adaptable ranking mechanism that takes into account both quantitative metrics and qualitative assessments of entity types.
  • Factors: The patent details several key metrics such as related entity metric, notable type metric, contribution metric, prize metric, and domain-specific weights. These metrics collectively contribute to the final scoring and ranking of search results.

Summary:

The document elaborates on a sophisticated method for ranking search results by:

  • Determining several metrics based on the search results.
  • Assigning weights to these metrics, where the weights are influenced by the type of entity featured in the search.
  • Combining these metrics and weights to derive a score.
  • Ranking the search results based on this score.

Claims:

The claims of the patent are focused on the specific processes for determining the various metrics (related entity metric, notable type metric, contribution metric, prize metric), the method of calculating domain-specific weights, and the overall scoring mechanism that underpins the ranking of search results.

The entity metrics

  1. Related Entity Metric: This metric is determined based on the co-occurrence, on web pages, of an entity reference contained in a search query with the entity type of that entity reference. For example, if the search query contains the entity reference “Empire State Building,” which is determined to be of the entity type “Skyscraper,” the co-occurrence of the text “Empire State Building” and “Skyscraper” on web pages may determine the related entity metric.
  2. Notable Type Metric: This metric is a global popularity metric divided by a notable entity type rank. The notable entity type rank indicates the position of an entity type in a notable entity type list, showing the importance or prominence of the entity type in a given context.
  3. Contribution Metric: Based on critical reviews, fame rankings, and other information, the contribution metric is weighted such that the highest values contribute most heavily to the metric. This metric assesses the contribution or significance of the entity or content in its respective domain.
  4. Prize Metric: Reflects recognition or awards associated with the entity, where specific domains like movies may include metrics associated with particular movie awards. The metric values could be determined based on system settings, aggregated user selections of entity references, and data associated with entity references.

These metrics are combined with domain-specific weights to determine a comprehensive score, which is then used to rank the search results. The system’s approach to defining and applying these metrics emphasizes the importance of both quantitative and qualitative analysis of entities and their relationships within the search context.
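The combination of metrics and domain-specific weights can be sketched as a weighted sum. The notable type metric follows the definition above (global popularity divided by the notable entity type rank); the weighted sum itself, the weight values and the sample metrics are assumptions, since the patent only says that metrics and weights are combined into a score.

```python
# Hypothetical domain-specific weights per metric
DOMAIN_WEIGHTS = {
    "movie": {"related_entity": 0.2, "notable_type": 0.2, "contribution": 0.3, "prize": 0.3},
    "default": {"related_entity": 0.25, "notable_type": 0.25, "contribution": 0.25, "prize": 0.25},
}


def notable_type_metric(global_popularity, notable_type_rank):
    """Global popularity divided by the rank of the type in the notable type list."""
    return global_popularity / notable_type_rank


def entity_score(metrics, domain):
    """Weighted sum of the metrics, using the weights of the given domain."""
    weights = DOMAIN_WEIGHTS.get(domain, DOMAIN_WEIGHTS["default"])
    return sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())


# Invented metric values for two search results in the movie domain
results = {
    "Film A": {
        "related_entity": 0.8,
        "notable_type": notable_type_metric(0.9, 1),
        "contribution": 0.7,
        "prize": 0.9,
    },
    "Film B": {
        "related_entity": 0.9,
        "notable_type": notable_type_metric(0.9, 3),
        "contribution": 0.5,
        "prize": 0.1,
    },
}

scores = {name: entity_score(metrics, "movie") for name, metrics in results.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, {name: round(score, 2) for name, score in scores.items()})
```

Because the movie domain weights prizes and contributions more heavily in this sketch, Film A outranks Film B despite its lower related entity metric.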

Implications for SEO

The detailed entity metrics and their application in ranking search results call for a holistic and nuanced approach to SEO. This approach should prioritize entity recognition, content quality and relevance, structured data, and external validations, all tailored to the specific demands of the domain in question.

  1. Entity-Based Search Optimization: SEO strategies must evolve to focus more on entity-based content optimization. This means understanding how search engines recognize and categorize entities within content and optimizing for these entities in addition to traditional keywords.
  2. Content Relevance and Quality: The use of metrics like the related entity metric and notable type metric indicates that search engines are looking at the relevance and authority of content in a much more granular way. For SEO, this means prioritizing high-quality, authoritative content that accurately reflects the entities discussed.
  3. Structured Data and Schema Markup: Implementing structured data and schema markup becomes even more crucial as these tools help search engines understand the entities within a page and how they relate to each other. This can enhance content’s visibility in search results that are increasingly entity-focused.
  4. Diverse and Comprehensive Content: With metrics assessing contributions and prizes (or recognitions), content that covers a wide range of related topics and includes comprehensive discussions of entities (including their achievements and recognitions) may rank higher. This implies that SEO strategies should include creating in-depth content that covers entities from multiple angles.
  5. Social Signals and External Validation: The inclusion of metrics related to prizes and contributions suggests that external validation (such as awards, mentions, reviews, and social media signals) plays a role in content ranking. SEO efforts should thus consider how to garner positive external recognition and citations from reputable sources.
  6. Domain-Specific Optimization: The patent hints at domain-specific weights for metrics, suggesting that what’s important for ranking can vary significantly across different content types or industries. SEO professionals need to understand the specific ranking factors that matter most in their domain and optimize accordingly.
  7. Adapting to Search Engine Evolution: The patent reflects the ongoing evolution of search engines towards understanding and serving user intent through a deeper understanding of content and context. SEO strategies must be flexible and adaptable, focusing on future-proofing content by making it as relevant, authoritative, and user-focused as possible.

Search Result Ranking and Presentation

The patent US11868357B2 describes methods and systems for ranking search results and generating their presentation. It leverages knowledge graphs and an analysis of the entities and modifying concepts in a search query to provide more relevant and more effectively presented search results. In doing so, it describes the basic features of a semantic or entity-based search.

General Information

  • Countries Published: United States
  • Last Publishing Date: January 9, 2024
  • Expiration Date: August 8, 2032
  • Inventors: Chen Zhou, Chen Ding, David Francois Huynh, JinYu Lou, Yanlai Huang, Hongda Shen, Guanghua Li, Yiming Li, Yangyang Chai

Core

  1. Enhanced Search Result Ranking: The patent describes methods for improving how search results are ranked. This involves a more sophisticated analysis of search queries.
  2. Use of Knowledge Graphs: A significant aspect is the use of knowledge graphs to determine ranking properties. This approach moves beyond simple keyword matching, incorporating a deeper understanding of the query’s context and the relationships between different entities.
  3. Modifying Concepts in Queries: The patent emphasizes identifying and interpreting modifying concepts (like superlatives) in search queries. These concepts play a crucial role in how search results are ranked and presented.
  4. Presentation Techniques Based on Queries: It proposes methods to generate presentation techniques for search results that are tailored to the specific type of query and the entities it references. This means the way results are displayed could vary significantly based on what the user is searching for.
  5. Comprehensive Search Systems: The summary outlines the development of comprehensive search systems capable of executing these methods. These systems are designed to process queries, identify key elements within them, and present results in a more contextually relevant and user-friendly manner.

Claims

The claims of the patent US11868357B2 focus on specific methods and systems for enhancing the ranking and presentation of search results. Here’s a summary of the key claims:

  • Entity Reference Identification: The patent claims a method for identifying an entity reference from a search query using processors. This involves understanding the specific entities (like people, places, or things) that the query is referring to.
  • Knowledge Graph-Based Ranking: It claims a system for ranking properties associated with the type of entity reference identified, based on data stored in a knowledge graph. This approach uses the interconnected information in the knowledge graph to determine how search results should be ranked.
  • Presentation Technique Determination: The patent includes a claim for determining a presentation technique for search results. This technique is based on the ranked properties and is tailored to the type of entity reference and the nature of the search query.
  • Modifying Concept Analysis: Another claim involves identifying modifying concepts (such as superlatives or qualitative aspects) in the search query and using these to influence the ranking of search results.
  • Query Tree Generation and Use: The patent claims a method for generating a query tree based on the search query and using this tree to retrieve and rank search results from the knowledge graph.
  • Ranking Rule Determination: It claims a method for determining a rule for ranking search results based on the modifying concept and the information obtained from the knowledge graph.
  • Comprehensive Search System: The patent includes claims for a search system comprising one or more computers configured to perform these operations, highlighting the technical and computational aspects of the invention.

For search queries such as “the tallest building”, it is useful to output a listing of the entities with the largest values for the “height” property.

In an example, where search query block 102 includes the search query “Tallest Building,” the search system may retrieve a collection of buildings from data structure block 104 and/or webpages block 110, determine that the sorting property is “Height,” and may output a ranked list of buildings by height to ranked search results block 108.
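
The "Tallest Building" example can be sketched in a few lines. This is only a minimal illustration under my own assumptions: the list `ENTITIES` is a toy stand-in for the data structure block, the building records are sample data, and the function name `rank_by_property` is hypothetical and not taken from the patent.

```python
from typing import Any

# Toy stand-in for the "data structure block": entities with typed properties.
# The records below are illustrative sample data, not from the patent.
ENTITIES: list[dict[str, Any]] = [
    {"name": "Burj Khalifa", "type": "Building", "height_m": 828},
    {"name": "Shanghai Tower", "type": "Building", "height_m": 632},
    {"name": "Eiffel Tower", "type": "Tower", "height_m": 330},
    {"name": "One World Trade Center", "type": "Building", "height_m": 541},
]


def rank_by_property(entity_type: str, sort_property: str, descending: bool = True) -> list[dict[str, Any]]:
    """Collect entities of one type and sort them by the given property."""
    candidates = [
        e for e in ENTITIES
        if e["type"] == entity_type and sort_property in e
    ]
    return sorted(candidates, key=lambda e: e[sort_property], reverse=descending)


# "Tallest Building": entity type "Building", sorting property "height_m".
ranked = rank_by_property("Building", "height_m")
print([e["name"] for e in ranked])
```

The point of the sketch is that the ranking does not come from matching the word "tallest" in documents, but from sorting entities by a property value.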

The patent also points out that creating the search results may require access to data from a structured database, for example a knowledge graph.

„Data structure block 104 includes a data structure including piece of information defined in part by the relationships between them. In some implementations, data structure block 104 includes any suitable data structure, data graph, database, index, list, linked list, table, any other suitable information, or any combination thereof. In an example, data structure block 104 includes a collection of data stored as nodes and edges in a graph structure. In some implementations, data structure block 104 includes a knowledge graph.“

The graphic from the patent has similarities to a graphic I created showing the interaction between the classic search index and the Knowledge Graph. In the patent's graphic the interface between the two databases is called "Processing Block". In my graphic I call it Entity Processing, which could be built on the foundation of Hummingbird.

The Processing Block derives entity references from the search query. Natural language processing is one of the techniques used for this.

„The search system determines an entity reference from the search query by parsing, by partitioning, by using natural language processing, by identifying parts of speech, by heuristic techniques, by identifying root words, by any other suitable technique, or any combination thereof. In some implementations, the entity reference includes text or other suitable content referencing any suitable topic, subject, person, place, thing, or any combination thereof.“

The modifiers shown in the graphic can be, for example, superlatives such as "best", "oldest" or "highest".
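
How a query could be split into an entity reference and a modifying concept can be illustrated with a small sketch. Everything in it is an assumption for illustration: the `MODIFIERS` table, the stop word list and the function `parse_query` are hypothetical, and a real system would rely on proper natural language processing rather than token lookups.

```python
# Hypothetical mapping from superlative modifiers to (property, sort direction).
MODIFIERS = {
    "tallest": ("height", "desc"),
    "highest": ("height", "desc"),
    "oldest": ("age", "desc"),
    "best": ("rating", "desc"),
}

# Tiny illustrative stop word list.
STOP_WORDS = {"the", "a", "an", "in", "of"}


def parse_query(query: str) -> dict:
    """Split a query into a modifier and the remaining entity reference."""
    tokens = query.lower().split()
    # The first token that is a known modifier, if any.
    modifier = next((t for t in tokens if t in MODIFIERS), None)
    entity_tokens = [t for t in tokens if t != modifier and t not in STOP_WORDS]
    sort_property, direction = MODIFIERS.get(modifier, (None, None))
    return {
        "entity_reference": " ".join(entity_tokens),
        "modifier": modifier,
        "sort_property": sort_property,
        "direction": direction,
    }


print(parse_query("The tallest building"))
```

For "The tallest building" the sketch yields the entity reference "building", the modifier "tallest" and "height" as the sorting property, which is the information a ranking step would need.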

More about this in my article HOW DOES GOOGLE UNDERSTAND SEARCH TERMS BY SEARCH QUERY PROCESSING?

An entity reference is the concept that refers to a real-world thing. The Processing Block creates a list of ranked properties of the entity.
Additionally, the entity's properties can be enriched with other formats such as links, images, and videos.

„In some implementations, ranked search results, presentation techniques, or both, are output to ranked search results block 108. In some implementations, search results include, for example, entities from data structure 104, other data from data structure 104, a link to a web page, a brief description of the target of the link, contextual information related to the search result, an image related to the search result, video related to the search result, any other suitable information, or any combination thereof.“

If a search query can refer to multiple entity references, a Popularity Score per entity is taken into account. The most popular entity is prioritized in the delivery of search results.

„In some implementations, the search system selects one of the more than one identified entity references based on a global popularity score of that entity reference, a relevance and/or closeness to some or all elements of a search query, user input, user history, user preferences, relationships between the entity references as described in a data structure, any other suitable information, or any combination thereof.“
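
A minimal sketch of such a selection could combine a global popularity score with the closeness to the query terms. The class `EntityCandidate`, the weighting and all score values are my own assumptions for illustration; the patent does not specify a formula.

```python
from dataclasses import dataclass


@dataclass
class EntityCandidate:
    name: str
    popularity: float              # hypothetical global popularity score in [0, 1]
    context_terms: frozenset[str]  # terms typically associated with the entity


def select_entity(query: str, candidates: list[EntityCandidate],
                  popularity_weight: float = 0.5) -> EntityCandidate:
    """Pick the candidate with the best mix of popularity and query closeness."""
    query_terms = set(query.lower().split())

    def score(candidate: EntityCandidate) -> float:
        # Share of query terms that also appear in the entity's context terms.
        overlap = len(query_terms & candidate.context_terms) / max(len(query_terms), 1)
        return popularity_weight * candidate.popularity + (1 - popularity_weight) * overlap

    return max(candidates, key=score)


candidates = [
    EntityCandidate("Jaguar (animal)", 0.6, frozenset({"animal", "cat", "habitat"})),
    EntityCandidate("Jaguar (car brand)", 0.8, frozenset({"car", "price", "model"})),
]

print(select_entity("jaguar", candidates).name)
print(select_entity("jaguar habitat", candidates).name)
```

Without further context the more popular entity wins. As soon as the query contains a term that is close to the other entity, the selection can flip.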

Read more in my articles How Google creates knowledge panels (SEL) and KNOWLEDGE PANELS & SERPS FOR AMBIGUOUS SEARCH QUERIES

It is exciting that, according to the patent, not only entities but also complete lists can be stored in the Knowledge Graph and then delivered directly in response to a search query.

„In some implementations, the ranked list of properties is stored in a data structure such as a knowledge graph, in a database, in any other suitable data storage arrangement, or any combination thereof. In some implementations, a schema table is preprocessed. In some implementations, the ranked list is predetermined, is based on the received search, or any combination thereof.“

The ranking of the lists can be based on the following:

  • Popularity
  • Search history
  • User habits
  • Input from developers
  • Trends in general search behavior
  • Recent search patterns
  • Content
  • Domain-related ranking

This ranking takes place in the Processing Block, or Entity Processing, as I call it.

The relationships between entities can be established using a “phrase tree”. The phrase tree is a theoretical construct that represents the relationships between entities.

In essence, the claims of this patent cover a range of methods and systems for improving how search results are ranked and presented, with a strong emphasis on the use of knowledge graphs, analysis of search queries, and the development of tailored presentation techniques.

Implications for SEO

  1. Emphasis on Entity-Based SEO: With the use of knowledge graphs, SEO strategies should focus more on entity-based optimization. This involves understanding and leveraging the relationships between different entities (people, places, things) and how they are interconnected in a knowledge graph.
  2. Importance of Contextual and Semantic Analysis: SEO needs to go beyond keyword optimization to include contextual and semantic analysis of content. Understanding the underlying meaning and context of search queries will become increasingly important.
  3. Optimization for Modifying Concepts: The patent highlights the importance of modifying concepts (like superlatives) in search queries. SEO strategies should consider how these modifiers can affect search intent and relevance, and optimize content accordingly.
  4. Tailored Content Strategies: Since the presentation of search results can vary based on query types and entities, creating content that is tailored to specific user intents and query contexts will be crucial. This means developing a variety of content types that cater to different search scenarios.
  5. Enhanced User Experience Focus: SEO strategies should prioritize user experience, ensuring that content is not only relevant but also presented in a way that aligns with user expectations and search contexts.
  6. Adapting to Dynamic Ranking Factors: As search engines evolve to use more dynamic and sophisticated ranking methods, SEO professionals need to stay informed and adapt their strategies to these changes. This includes understanding how new technologies like AI and machine learning are applied in search algorithms.
  7. Structured Data and Schema Markup: Utilizing structured data and schema markup becomes more important, as they help search engines understand the context and relationships of the content, aligning with how knowledge graphs are utilized.
  8. Long-Tail Keyword Optimization: Given the focus on specific query types and entities, optimizing for long-tail keywords that are more conversational and specific can be beneficial.
  9. Monitoring and Analytics: Continuous monitoring of search trends, algorithm updates, and analytics will be key to understanding how changes in search engine technologies and methodologies are impacting SEO performance.

In summary, this patent suggests a shift towards more sophisticated and context-aware SEO practices, emphasizing the importance of understanding user intent, leveraging entity relationships, and adapting to advanced search technologies.

Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

This work provides valuable insights into improving document retrieval systems and underscores the potential of combining semantic and lexical models to enhance efficiency and effectiveness in the retrieval stage.

  • Saar Kuzi (University of Illinois at Urbana-Champaign)
  • Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork (Google Research)

Marc Najork is perhaps the most interesting Google engineer when it comes to ranking-relevant developments. Here is an overview of his most interesting research papers and patents >>> Most interesting Google patents and research papers for ranking by Marc Najork

Background

  • Search engines often use a two-phase paradigm: In the first phase (retrieval stage), an initial list of documents is retrieved, and in the second phase (re-ranking stage), the documents are re-ranked to produce the final result list.
  • There is little literature on using deep neural networks to improve the retrieval stage, although their benefits for the re-ranking stage have been demonstrated.
  • Semantic matching goes beyond simple keyword matching by understanding the complex relationships between words, thus capturing the meaning or context of the query and the documents. This is crucial for retrieving relevant documents that may not contain the exact query terms but are related in context.
  • The approach is based on the premise that effective semantic models, especially in recent years, have been largely developed using deep neural networks. These models can understand the nuances of language, including synonyms, related terms, and context, which are often missed by lexical models.
  • The semantic retrieval model is built upon deep neural networks, specifically leveraging architectures like BERT (Bidirectional Encoder Representations from Transformers). BERT and similar models are pre-trained on vast amounts of text data, allowing them to understand complex language patterns and semantics.
  • For the retrieval process, the model generates embedding vectors for both queries and documents. These embeddings represent the semantic content of the text in a high-dimensional space, where the semantic similarity between a query and a document can be measured, typically using cosine similarity.
  • Improved Recall: By capturing the meaning behind the words, the semantic approach can retrieve a broader range of relevant documents, including those that do not share exact keywords with the query. This is particularly useful for addressing the vocabulary mismatch problem, where the query and relevant documents use different terms to describe the same concept.

  • Complementarity: The semantic model complements the lexical model by covering the gaps left by keyword-based retrieval. It can identify relevant documents that are semantically related to the query but would be missed by a purely lexical search.
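
The embedding-based matching described above can be illustrated with a small sketch. It is heavily simplified and rests on my own assumptions: the three-dimensional vectors in `DOC_EMBEDDINGS` are hand-written placeholders instead of real BERT embeddings, and the search is a brute-force comparison rather than an approximate nearest neighbor index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if one is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-written toy embeddings; a real system would compute them with a neural model.
DOC_EMBEDDINGS = {
    "doc_car_repair": [0.9, 0.1, 0.0],
    "doc_auto_maintenance": [0.8, 0.2, 0.1],
    "doc_cooking_pasta": [0.0, 0.1, 0.9],
}


def semantic_retrieve(query_embedding: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the k documents whose embeddings are most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in DOC_EMBEDDINGS.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embedding for a query about fixing a vehicle.
top = semantic_retrieve([1.0, 0.0, 0.0])
print([doc_id for doc_id, _ in top])
```

Both vehicle-related documents are retrieved although only similarity in the vector space is compared and no keyword is matched. That is the mechanism that addresses the vocabulary mismatch problem.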

Challenges

  • The retrieval stage aims to maximize the recall of retrieved relevant documents. Since this stage is performed against all documents in the collection, efficiency is a major requirement.
  • A retrieval based solely on a lexical model is likely not optimal, as it may fail to retrieve relevant documents that contain none of the query terms.
  • Recall vs. Precision: Semantic models tend to have lower precision at the top ranks compared to lexical models, as they retrieve a broader set of documents. The hybrid approach proposed in the paper aims to combine the strengths of both semantic and lexical models to improve overall recall without significantly compromising precision.
  • Efficiency: Running complex neural models for every query in real-time can be computationally expensive. The paper addresses this by using efficient techniques like approximate nearest neighbor (ANN) search to quickly find the most relevant document embeddings for a given query embedding.

Solutions

  • Hybrid Approach: Combining deep neural network models and lexical models for the retrieval stage. This approach leverages both semantic (based on deep neural networks) and lexical (keyword matching-based) retrieval models.
  • Parallel Execution: Semantic and lexical retrievals are performed in parallel, and the two result lists are merged to create the initial list for re-ranking.
  • Weak Supervision Learning: Designing weakly supervised learning tasks to learn domain-specific knowledge for a new search scenario without needing access to large query logs.
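
To illustrate how two result lists can be merged without training, here is a minimal sketch. Please note that I use reciprocal rank fusion as a well-known unsupervised merging technique; this is my own choice for illustration and not necessarily the merging method used in the paper. The document IDs and both result lists are hypothetical, and in a real system the two lists would come from retrievals running in parallel.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the two retrieval models.
lexical_results = ["d1", "d2", "d3"]
semantic_results = ["d3", "d1", "d4"]

merged = reciprocal_rank_fusion([lexical_results, semantic_results])
print(merged)
```

The merged list contains "d4", a document the lexical model did not retrieve at all. This is the recall gain the hybrid approach is after, and the re-ranking stage can then sort out precision.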

Learnings

  • The empirical analysis demonstrates that the semantic approach can retrieve a large number of relevant documents not covered by the lexical approach.
  • By using a simple unsupervised approach for merging the result lists, significant improvements in recall can be achieved.
  • An exploration of the different characteristics of the semantically and lexically retrieved documents highlights the complementary nature of the two approaches.

Conclusions

  • The proposed hybrid document retrieval approach, leveraging lexical and semantic models, is efficient enough to be deployed in any commercial system.
  • The study emphasizes the importance of combining semantic and lexical approaches to improve recall in the retrieval stage and offers an effective end-to-end approach for weak supervision training in this phase.

Implications for SEO

The move towards semantic search represents a shift from a keyword-centric approach to a more nuanced understanding of content and user intent. For SEO professionals, this means adapting strategies to focus on semantic relevance, content quality, and the broader context in which search queries are made. By aligning SEO practices with the principles of semantic search, it’s possible to create more effective, user-centered content strategies that perform well in modern search engines.

  • Content Creation: The shift towards semantic search emphasizes the importance of creating content that is not just keyword-focused but also semantically rich and contextually relevant. This means focusing on topics, entities, and their relationships, rather than merely incorporating specific keywords.
  • Keyword Research: While traditional keyword research remains important, there’s a growing need to understand the broader topics and user intent behind search queries. Tools that provide insights into related topics, questions, and semantic relationships between terms will become increasingly valuable.
  • Understanding User Intent: SEO strategies must prioritize understanding the intent behind search queries. This involves categorizing content to match different stages of the user journey (informational, navigational, transactional) and ensuring that content addresses the underlying questions or needs.
  • Content Depth and Quality: High-quality, in-depth content that covers a topic comprehensively is likely to perform better in a semantic search landscape. This aligns with the E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) principles, as content that demonstrates expertise and covers related semantic fields is more likely to be deemed relevant.
  • Enhancing Semantic Understanding: Implementing structured data using schema markup helps search engines understand the context and semantics of the content on a webpage. This can improve content visibility for relevant queries, especially for voice search and conversational queries where semantic matching is crucial.
  • Rich Snippets and SERP Features: Proper use of schema markup can also lead to enhanced presentations in search results (e.g., rich snippets, knowledge panels), which can improve click-through rates and visibility.
  • Semantic Link Building: The importance of link building remains, but there’s a shift towards creating semantically related links. This means focusing on linking from and to content that is contextually relevant, enhancing the semantic network around topics.
  • Internal Linking: Effective internal linking strategies that help search engines and users discover related content can reinforce the semantic relevance of content, improving site structure and SEO performance.

Providing knowledge panels with search results

This patent, with the identifier US11836177B2, relates to search engines and semantic search. It was published in December 2023 and only for the US. The inventor is Jeromy William Henry.

The patent focuses on methods, systems, and apparatus for integrating knowledge panels into search results. These knowledge panels are designed to present information about factual entities referenced in a search query, enhancing the user’s search experience by providing quick access to relevant information.

Process

  1. Obtaining Search Results: The system first obtains search results responsive to a user’s query.
  2. Identifying Factual Entity: It identifies a factual entity (like a person, place, or event) referenced by the query.
  3. Content Identification for Knowledge Panel: The system identifies content for display in the knowledge panel, sourcing from multiple resources.
    1. Diverse Source Integration:
      • The system is designed to aggregate content from multiple resources. This means that the knowledge panel doesn’t rely on a single source for its information.
      • For instance, a knowledge panel about a historical landmark might include an image from one website and factual details from another.
    2. Quality and Relevance of Content:
      • The selection of content is likely based on the quality and relevance of the information it provides. This implies a preference for authoritative and credible sources.
      • The system may use algorithms to evaluate the trustworthiness and accuracy of the content from different sources.
    3. User Search Behavior and Interaction:
      • The patent suggests that the choice of content could be influenced by user search behavior. This means that popular or frequently accessed information about an entity might be prioritized.
      • User interactions with the knowledge panel could further refine the content selection, tailoring it to what users find most useful or engaging.
    4. Entity-Specific Content Selection:
      • The system tailors the content based on the type of entity. For example, for a famous person, the panel might include photos, a brief biography, and notable facts.
      • This entity-specific approach ensures that the knowledge panel is relevant and provides a comprehensive overview of the subject.
    5. Dynamic Content Adaptation:
      • The knowledge panels are not static; they can adapt and change based on new information or changing user interests.
      • This dynamic nature means that the choice of resources can evolve over time, maintaining the relevance and accuracy of the information presented.
  4. Presentation: The identified search results and the knowledge panel are presented on a search results page, with the knowledge panel alongside the search results.

Factors

  • Content Variety: The content in a knowledge panel includes items like images, titles, facts, etc., obtained from diverse resources.
  • User Interaction: The knowledge panel may include interactive elements, allowing for expanded content based on user interactions.
  • Entity Types: The system can handle multiple entity types, like persons or places, and tailor the knowledge panel accordingly.
  • Template-Based Display: Knowledge panels are generated using templates based on the type of entity.
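
The combination of entity-type templates and content from several resources can be sketched as follows. This is only an illustration under my own assumptions: the `TEMPLATES` table, the field names, the source names ending in ".example" and the rule "first source in the list wins" are hypothetical and not taken from the patent.

```python
# Hypothetical templates: which fields a panel shows for each entity type.
TEMPLATES = {
    "Person": ["title", "image", "born", "occupation"],
    "Place": ["title", "image", "location", "opened"],
}


def build_panel(entity_type: str, sources: list[tuple[str, dict]]) -> dict:
    """Fill the template for an entity type from several sources.

    `sources` is ordered by (assumed) trust: the first source that offers
    a field provides its value.
    """
    panel = {}
    for field in TEMPLATES[entity_type]:
        for source_name, facts in sources:
            if field in facts:
                panel[field] = {"value": facts[field], "source": source_name}
                break
    return panel


sources = [
    ("encyclopedia.example", {"title": "Eiffel Tower", "location": "Paris, France", "opened": "1889"}),
    ("photos.example", {"title": "Tour Eiffel", "image": "eiffel.jpg"}),
]

panel = build_panel("Place", sources)
for field, content in panel.items():
    print(field, "->", content["value"], f"({content['source']})")
```

The resulting panel takes the facts from one resource and the image from another, which mirrors the historical landmark example described above.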

Implications for SEO

  1. Emphasis on Entity-Based Search: SEO strategies should focus on optimizing content for specific entities (people, places, events) to be featured in knowledge panels.
  2. Rich Content Diversity: Diverse and rich content types (images, facts, interactive elements) become crucial for visibility in knowledge panels.
  3. Quality and Authority of Sources: High-quality, authoritative sources are likely favored for content in knowledge panels, emphasizing the need for credible and well-referenced content.
  4. Building Authority: Websites should aim to become authoritative sources in their niche, increasing the likelihood of their content being featured in knowledge panels.
  5. User Engagement: Interactive elements in knowledge panels suggest a shift towards more engaging content that can prompt user interaction.
  6. Content Diversity and Richness: SEO strategies should focus on creating diverse and rich content that could be sourced for knowledge panels.
  7. Monitoring User Behavior: Understanding what users frequently search for and engage with can help in tailoring content to be more relevant for knowledge panels.

Providing search results based on a compositional query

This patent, with the identifier US11762933B2, relates to search engines and especially search query processing. This patent was first published in September 2022 by Google and published in August 2023. It was published for the US, Europe and China. Inventors are Jinyu Lou, Ying Chai, Chen Ding, Lijie Chen, Liang Hu, Kelja Liu, Weibin Pan, Yanlai Huang, David Francois Huynh.

The patent introduces a method to provide search results based on a compositional query. This involves determining entity types and relationships from the query, identifying nodes in a knowledge graph, and comparing attribute values to determine resultant entity references. The system can effectively handle queries that involve relative relationships between different entity types, providing more relevant and contextual search results.

The patent discusses a technique for providing search results. This technique involves:

  1. Determining a first entity type, a second entity type, and a relationship type based on a compositional query.
  2. Identifying nodes of a knowledge graph corresponding to entity references of the first and second entity types.
  3. Determining an attribute value from the knowledge graph corresponding to the relationship type for each entity reference of the first and second entity types.
  4. Comparing the attribute value of each entity reference of the first entity type with the attribute value of each entity reference of the second entity type.
  5. Determining resultant entity references from the entity references of the first entity type based on the comparison.

Compositional queries, as mentioned in the patent and generally in the context of search and information retrieval, refer to queries that involve multiple entity types and their relationships. Instead of focusing on a single keyword or entity, compositional queries aim to understand and provide results based on the relationships between different entities mentioned in the query.

Here’s a breakdown:

  1. Multiple Entity Types: These queries involve at least two types of entity references. An entity can be anything that is singular, unique, well-defined, and distinguishable, such as a person, place, item, idea, etc.
  2. Relative Relationships: The entities in the query are related by some form of relative relationship. This relationship can be spatial, temporal, or any other kind of relation that connects the entities in a specific way.

Examples:

  • “American Banks close to Japanese restaurants”: This query involves two types of places (banks and restaurants) and indicates a relative spatial relationship (close to) without specifying a particular bank or restaurant.
  • “Companies that went bankrupt during an economic crisis”: This query involves companies and economic crises, with a temporal relationship (during) connecting them.

In the context of the patent, the system aims to handle these compositional queries by determining the entity types and relationships from the query, identifying relevant nodes in a knowledge graph, and then comparing attribute values to provide the most relevant search results.
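
The steps for the query "American Banks close to Japanese restaurants" can be sketched like this. It is a toy example under my own assumptions: the `NODES` list stands in for the knowledge graph, the names and coordinates are invented, and the distance threshold in `close_to` is an arbitrary value on a flat grid.

```python
import math

# Toy stand-in for knowledge graph nodes; names and coordinates are invented.
NODES = [
    {"name": "Bank A", "type": "american_bank", "location": (0.0, 0.0)},
    {"name": "Bank B", "type": "american_bank", "location": (5.0, 5.0)},
    {"name": "Sushi X", "type": "japanese_restaurant", "location": (0.2, 0.1)},
    {"name": "Ramen Y", "type": "japanese_restaurant", "location": (9.0, 9.0)},
]


def close_to(p: tuple[float, float], q: tuple[float, float], max_distance: float = 1.0) -> bool:
    """Relationship type "close to": Euclidean distance below an assumed threshold."""
    return math.dist(p, q) <= max_distance


def compositional_search(first_type: str, second_type: str, attribute: str, relation) -> list[str]:
    """Return entities of the first type related to at least one entity of the second type."""
    first = [n for n in NODES if n["type"] == first_type]
    second = [n for n in NODES if n["type"] == second_type]
    # Compare the attribute value of each first-type entity with each second-type entity.
    return [
        a["name"] for a in first
        if any(relation(a[attribute], b[attribute]) for b in second)
    ]


results = compositional_search("american_bank", "japanese_restaurant", "location", close_to)
print(results)
```

Only the bank that has a Japanese restaurant within the threshold is returned. A temporal relationship such as "during" could be handled the same way by comparing date attributes instead of locations.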

For SEO and content creators, understanding compositional queries means recognizing the importance of context and relationships in content, as search engines move towards handling more complex queries that involve multiple entities and their interrelations.

Implications for SEO:

  1. Complex Query Handling: SEO professionals need to be aware that search engines might be moving towards handling more complex queries that involve relationships between different entities.
  2. Knowledge Graph Optimization: As the patent emphasizes the use of a knowledge graph, optimizing content to be recognized and categorized correctly within such graphs becomes crucial.
  3. Entity Recognition: Content should be structured in a way that search engines can easily recognize and categorize different entities and their relationships.
  4. Contextual Relevance: SEO strategies should focus on ensuring content is contextually relevant, considering the search engine’s ability to understand and compare attributes of different entities.

Mapping images to search queries

This patent, with the identifier US11734287B2, relates to search engines and especially image search ranking. This patent was first filed in November 2020 by Google and published in September 2023. It was published only for the US. The expiration date is 2036. Inventors are Matthew Sharifi, David Petrou and Abhanshu Sharma.

The patent revolves around a method and system that allows users to input a query in the form of an image. The system then identifies entities associated with the image, maps these entities to pre-associated search queries, scores these queries based on relevance, and finally outputs a representative search query in response to the image input.
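
The pipeline from image to representative search query can be sketched in a simplified form. All lookup tables, scores and names below are my own assumptions for illustration: `LABEL_TO_ENTITIES` stands in for visual recognition plus knowledge graph, `ENTITY_TO_QUERIES` for the pre-associated search queries, and the relevance score is just a hypothetical popularity value plus a bonus for matching context terms.

```python
# Hypothetical lookup tables standing in for visual recognition results,
# the knowledge graph and the pre-associated candidate search queries.
LABEL_TO_ENTITIES = {"la_lakers_logo": ["LA Lakers"]}
ENTITY_TO_QUERIES = {"LA Lakers": ["LA Lakers schedule", "buy LA Lakers jersey"]}
QUERY_POPULARITY = {"LA Lakers schedule": 0.7, "buy LA Lakers jersey": 0.4}


def representative_query(image_labels, context_terms=frozenset()):
    """Map image labels to entities, entities to queries, and pick the best-scored query."""
    candidates = []
    for label in image_labels:
        for entity in LABEL_TO_ENTITIES.get(label, []):
            candidates.extend(ENTITY_TO_QUERIES.get(entity, []))
    if not candidates:
        return None

    def relevance(query: str) -> float:
        popularity = QUERY_POPULARITY.get(query, 0.0)
        # Bonus if the candidate shares a word with the accompanying context.
        words = set(query.lower().split())
        context_bonus = 0.5 if words & set(context_terms) else 0.0
        return popularity + context_bonus

    return max(candidates, key=relevance)


print(representative_query(["la_lakers_logo"]))
print(representative_query(["la_lakers_logo"], context_terms={"buy", "clothing"}))
```

Without context the more popular query is selected. If the image comes with a text such as "buy clothing", the shopping-related query gets the higher relevance score.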

Key Insights:

  1. Purpose and Application:
    • The patent is designed to process a user’s query image and provide relevant information in response to the image.

      “In general, a user can request information by inputting a query to a search engine. The search engine can process the query and can provide information for output to the user in response to the query.”

  2. Query Image Processing:
    • The system receives entities associated with the query image. These entities can be obtained from image labels, which might be fine-grained (specific landmarks, book covers) or coarse-grained (general objects like buildings).

      “The system receives one or more entities that are associated with the query image by first obtaining one or more query image labels, e.g., visual recognition results, for the query image.”

  3. Knowledge Graph Integration:
    • For the obtained image labels, the system identifies entities using a knowledge graph. This helps in associating specific entities with the image labels.

      “For one or more of the obtained query image labels, the system may then identify one or more entities that are pre-associated with the one or more query image labels, e.g., using a knowledge graph.”

  4. Scoring and Output:
    • The system scores each candidate search query based on relevance. A representative search query is then selected based on these scores and provided as an output.

      “Methods, systems, and apparatus for receiving a query image, receiving one or more entities that are associated with the query image, identifying, for one or more of the entities, one or more candidate search queries that are pre-associated with the one or more entities, generating a respective relevance score for each of the candidate search queries, selecting, as a representative search query for the query image, a particular candidate search query based at least on the generated respective relevance scores and providing the representative search query for output in response to receiving the query image.”

      Some more details about the scoring process:

      1. Assignment of Relevance Scores:
        • Relevance scores can be assigned to candidate search queries by another system or even by a person, such as a moderator or user of the system.

          “In some instances relevance scores may be assigned to the one or more candidate search queries by another system or assigned to the candidate search queries by a person, e.g., a moderator or user of the system.”

      2. Contextual Matching:
        • The knowledge engine determines whether the context of the received user-input query image matches a candidate search query. Based on this match, a relevance score is generated.

          “The knowledge engine 260 may determine whether a context of the received user-input query image matches a candidate search query, and based on the determined match, generate a respective relevance score for the candidate search query.”

      3. Location-based Scoring:
        • The system may consider the location associated with the query image. For instance, if a photograph of “The Gherkin” building was taken near it, the system might generate higher relevance scores for queries related to “The City of London”.

          “The knowledge engine 260 may determine that the received photograph 100 of The Gherkin was taken near in the vicinity of The Gherkin. In such an example, the knowledge engine 260 may generate higher respective relevance scores for candidate search queries that are related to The City of London.”

      4. Natural Language Query Integration:
        • If a natural language query is provided along with the query image, the system may generate relevance scores based on this query. For instance, if the image is of the “LA Lakers” logo and the text query is “buy clothing”, the system might prioritize queries like “buy LA Lakers jersey”.

          “The system may then generate respective relevance scores for the candidate search queries “LA Lakers jersey” or “buy LA Lakers jersey” that are higher than relevance scores for candidate search queries that are not related to the text “buy clothing.””

      5. Search Results Page Analysis:
        • The system can generate a search results page using the candidate search query and then analyze this page to determine its interest and usefulness. Based on this analysis, a relevance score is assigned.

          “In other examples, the knowledge engine 260 may generate respective relevance scores for each of the one or more candidate search queries by generating a search results page using the candidate search query and analyzing the generated search results page to determine a measure indicative of how interesting and useful the search results page is.”

      6. Popularity of the Search Query:
        • The system may consider the popularity of a candidate search query. A query that has been issued more times might receive a higher relevance score.

          “In other examples, the knowledge engine 260 may generate respective relevance scores for each of the one or more candidate search queries by determining a popularity of the candidate search query.”

      7. User Activity Association:
        • The system can determine a user’s current activity associated with the received image and adjust relevance scores accordingly. For instance, if the user’s current activity is determined to be hiking and they submit an image of hiking boots, the system might prioritize queries related to nearby hiking trails, whereas for a user who is currently shopping it might prioritize shopping-related queries.

          “In further implementations, generating a respective relevance score for each of the candidate search queries may include determining a user activity associated with the received image.”

      In essence, the relevance scoring mechanism is multifaceted, taking into account various factors like the context of the image, associated location, natural language queries, search results page analysis, query popularity, and user activity. This ensures that the system provides the most pertinent and relevant search queries in response to an image input.

  5. Example of Operation:
      • An example provided in the patent mentions a query image of a building named “The Gherkin”. The system identifies the building and provides a representative search query like “What style of architecture is The Gherkin?”.

        “The knowledge panel 118 provides general information relating to the entity “The Gherkin,” such as the size, age and address of the building. The list of search results 116 provides search results responsive to the representative search query “What style of architecture is The Gherkin?””

This patent showcases an innovative approach to search queries, allowing users to use images as queries and receive relevant textual search queries in return. It integrates visual recognition with search engine capabilities to enhance user experience.
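The scoring factors listed above could be combined roughly as in the following sketch. It is only an illustration under my own assumptions, not the patent's implementation: the `Candidate` fields, the weights and the function names are all hypothetical, and the patent does not say how the individual signals are weighted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate search query pre-associated with an entity (hypothetical structure)."""
    query: str
    popularity: float = 0.0      # e.g. normalized query volume in [0, 1]
    location_match: float = 0.0  # 1.0 if the query fits where the image was taken
    text_match: float = 0.0      # overlap with an accompanying text query
    serp_quality: float = 0.0    # how useful the resulting results page looks


# Illustrative weights only; the patent does not specify any.
WEIGHTS = {
    "popularity": 0.2,
    "location_match": 0.3,
    "text_match": 0.3,
    "serp_quality": 0.2,
}


def relevance_score(candidate: Candidate) -> float:
    """Weighted sum of the individual signals."""
    return sum(getattr(candidate, name) * weight for name, weight in WEIGHTS.items())


def representative_query(candidates: list[Candidate]) -> str:
    """Pick the candidate query with the highest relevance score."""
    return max(candidates, key=relevance_score).query


candidates = [
    Candidate("The Gherkin opening hours", popularity=0.6, serp_quality=0.5),
    Candidate("What style of architecture is The Gherkin?", popularity=0.5,
              location_match=1.0, serp_quality=0.8),
]
print(representative_query(candidates))
```

A weighted sum is just the simplest way to merge several signals into one score; a real system could use any other combination, including a learned model.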

Combining content with a search result

This patent with the identifier US11727046B2 is related to search engines and especially SERP features. It was first filed by Google in October 2022 and published in August 2023. It was published only for the US. The expiration date is 2033.

Background:

The patent pertains to the domain of information presentation, particularly in the context of the internet. The internet provides access to a plethora of resources, such as video/audio files, webpages on specific subjects, news articles, etc. Given this vast access, there are opportunities to provide other content (like advertisements) alongside these resources. For instance, a webpage might have designated slots where additional content can be displayed. These slots can either be predefined in the webpage or can be defined for presentation alongside the webpage, especially with search results.

Quote: “The Internet provides access to a wide variety of resources. For example, video and/or audio files, as well as webpages for particular subjects or particular news articles, are accessible over the Internet. Access to these resources presents opportunities for other content (e.g., advertisements) to be provided with the resources.”

Core Concept:

The patent discusses methods, systems, and apparatus that include computer programs encoded on a computer-readable storage medium. The primary function is to provide content. When a user queries a search engine, search results are identified. Among these results, there might be a top set of results associated with a specific entity. Alongside these search results, an eligible content item (like an advertisement or additional information) associated with the same entity is identified. The system then combines the search result and the eligible content item to present it as a single search result in response to the query. This combined content item can further be augmented by identifying related entities and content items associated with them.

Quote: “Methods, systems, and apparatus include computer programs encoded on a computer-readable storage medium, including a method for providing content. Search results responsive to a query are identified… A combined content item is identified that is a combination of the first search result and first eligible content item and is to be presented as a search result responsive to the query.”

Key Insights:

  1. Content Augmentation: The patent emphasizes the augmentation of combined content items. This means that once a primary content item (like a search result) is combined with an eligible content item (like an advertisement), this combined content can be further enhanced by pulling in related content.
  2. Entity Association: The system identifies content based on its association with specific entities. For instance, if a search result is related to a particular brand or topic, the system will look for eligible content items (like advertisements) that are also associated with the same brand or topic.
  3. Enhanced User Experience: By combining relevant search results with associated content items, the system aims to provide a richer and more informative user experience. Instead of viewing search results and advertisements separately, users get a consolidated view where the primary content and the additional content are seamlessly integrated.

In simpler terms, imagine searching for a product on a search engine. Instead of just getting a link to the product’s official website, you might also see a special offer or advertisement related to that product directly combined with the search result.

This patent by Google aims to enhance the way users receive and view information on the internet, making the experience more integrated and contextually relevant.
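The core idea, combining the top search result with an eligible content item that belongs to the same entity, can be sketched as follows. This is a minimal illustration under my own assumptions: the classes, the exact-match comparison of entity names and the example data are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    title: str
    entity: str


@dataclass
class ContentItem:
    text: str
    entity: str


@dataclass
class CombinedItem:
    result: SearchResult
    content: ContentItem


def combine_top_result(results: list[SearchResult],
                       eligible: list[ContentItem]) -> Optional[CombinedItem]:
    """Combine the top result with the first eligible item that shares its entity."""
    if not results:
        return None
    top = results[0]
    for item in eligible:
        if item.entity == top.entity:
            return CombinedItem(result=top, content=item)
    return None  # no matching content item: the plain result would be shown


results = [SearchResult("Acme Shoes official site", entity="Acme Shoes")]
eligible = [
    ContentItem("20% off garden tools", entity="Garden Co"),
    ContentItem("Free shipping on Acme running shoes", entity="Acme Shoes"),
]
combined = combine_top_result(results, eligible)
print(combined.content.text if combined else "no combination")
```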

Contextualizing knowledge panels

This patent with the identifier US11720577B2 is related to search engines and especially semantic search. It was first filed by Google in January 2022 and published in August 2023. The patent is a republication and was first published in 2016. It was published for the US, South Korea and WIPO. This means that it is more likely to be used in practice. The expiration date is 2038.

The patent describes a system designed by Google that provides users with a “knowledge panel” when they search for entities, like singers, actors, writers, etc. This knowledge panel provides relevant and detailed information about the searched entity based on the context of the search query.

Key Insights:

Entity and Context:

  • The system can receive search requests that include the name (or identifier) of an entity and additional context terms from the user’s search query.

“A system can receive requests including identifiers of entities… as well as context terms that are referenced by a search query submitted by a user.”

Knowledge Elements:

  • The system identifies multiple “knowledge elements” related to the searched entity. Knowledge elements can be facts or pieces of content related to the entity.

“Identifying a plurality of knowledge elements that are related to the entity;”

Ranking and Selection:

  • The system will rank these knowledge elements based on their relevance to the context terms from the search query.

“Each predicted set may have an associated confidence score or probability score indicating a level of certainty that the features provided predict the particular fixed set of entities. In some implementations, the confidence score may be based on a similarity measure which may differ depending on the type of set. For example, the confidence score for a location-based sets may be based on physical distance from a specified location, e.g., the current location of a computing device. The confidence score for a topic-based set may be based on an embedding distance with a query, such as an embedding generated based on signals from a client device. Such signals can include text recently seen on the screen, the state or proximity of external devices, content of recent searches, stated user interests, an application installed or executing on the client device, a time stamp, etc.”

  • It then selects the most relevant knowledge elements to display.

“Assigning, by one or more computers, rank scores to the plurality of knowledge elements… selecting one or more of the knowledge elements from among the knowledge elements based at least on the rank scores assigned to the knowledge elements;”

Presentation in the Knowledge Panel:

  • The system can decide how to present the selected knowledge elements in the knowledge panel. This includes determining the position of the knowledge panel on the search results page, the number of knowledge elements to show, their position within the panel, and even specific features like highlighting text or deciding on a title or subtitle.

“Providing information associated with the entity and the one or more selected knowledge elements comprises providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in a knowledge panel, the knowledge panel being presented with a search results page associated with the search query.” “In some implementations providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in the knowledge panel comprises determining, based on identifying the one or more context items that are associated with the entity that is referenced by the search query, a number of knowledge elements to select for presentation in the knowledge panel;”

Customization:

  • The system can customize the knowledge panel based on the context of the search. This could include choosing which information to highlight, or determining a suitable title or subtitle for the displayed information.

“In certain aspects, providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in the knowledge panel comprises determining, based on identifying the one or more context items that are associated with the entity that is referenced by the search query, a title or subtitle relating to one or more of the selected knowledge elements presented in the knowledge panel…”

In essence, this patent reveals how Google aims to enhance its search results by presenting users with a tailored “knowledge panel” that provides in-depth information about searched entities, adjusted and ranked based on the context of the user’s search query.
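To make the ranking step more tangible, here is a small sketch that assigns rank scores to knowledge elements based on their overlap with the context terms of the query. It is a simplification under my own assumptions: the word-overlap score and all names are hypothetical, and the patent does not disclose how the rank scores are actually computed.

```python
def rank_knowledge_elements(elements: dict[str, str],
                            context_terms: list[str],
                            top_n: int = 2) -> list[str]:
    """Return the names of the top_n elements whose text best overlaps the context terms."""
    context = {term.lower() for term in context_terms}

    def score(text: str) -> int:
        # Naive rank score: number of context terms that occur in the element text.
        return len(context & set(text.lower().split()))

    ranked = sorted(elements, key=lambda name: score(elements[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical knowledge elements for a singer entity.
elements = {
    "albums": "studio albums and songs released by the singer",
    "movies": "movies and film roles of the singer",
    "spouse": "spouse and family",
}
print(rank_knowledge_elements(elements, ["songs", "albums"], top_n=1))
print(rank_knowledge_elements(elements, ["film", "roles"], top_n=1))
```

The same entity thus gets a different panel content depending on whether the query is about music or about acting.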

Content selection and presentation of electronic content

This patent with the identifier US11663277B2 is related to search engines and especially news. It was first filed by Google in May 2021 and published in August 2023. It was published for the US, Canada and WIPO. This means that it is more likely to be used in practice. The expiration date is 2038.

The patent deals with a method and system for populating an interest feed with electronic news article resources. It appears to be closely related to Google Discover.

Method for Populating an Interest Feed:

  • The system looks for electronic news articles that reference two entities (first entity and second entity) with a significant relevance.
  • If enough articles reference both entities, it’s inferred that there is an ongoing event involving both entities.
  • A representation (like a news summary or highlight) of this event is generated using content from the articles and other resources.
  • If a user’s interest list includes the first entity but doesn’t already have the event, this representation is provided to the user’s device.

Scoring and Selection:

  • Each article is assigned scores based on how much they relate to the first and second entities.
  • The system can create a “superset” of articles that mention both entities. From this, a subset is selected where each article references both entities with significant relevance.

Event Determination:

  • Multiple possible events involving the two entities are identified.
  • A filtering algorithm is used to shortlist the most likely events based on the articles.
  • The type of the likely event can be determined using the articles.

Content Generation:

  • The event representation includes content that describes the ongoing event.
  • Additional content is generated based on other resources which relate to some attribute of the activity in the event.

Attributes and Additional Resources:

  • Attributes might include the type of activity, the industry of the activity, a connection to a previous event, or correlation with another user’s interest list.
  • Based on the correlation between the first and second entities, the system selects additional resources to provide more context or information.

User Search Queries:

  • The system can also identify the first entity based on search queries from multiple users.

Key Insights with Quotes:

  1. Interest Feed Generation:
    • The patent emphasizes creating a personalized feed for users based on trending events related to their interests.

      “A method for populating an interest feed with electronic news article resources…”

  2. Relevance Threshold:
    • The system relies on a threshold of relevance to determine which articles are significant.

      “…that a threshold quantity of electronic news article resources each reference both a first entity and a second entity with at least a threshold magnitude of relevance…”

  3. Event Representation:
    • The system generates a summarized or highlighted representation of the detected event for the user.

      “generating a representation that corresponds to the event…”

  4. Additional Content:
    • Content isn’t just derived from news articles. The system will seek other resources that relate to the event’s activities.

      “…generating second content that is based on an additional resource…”

  5. User’s Interest List:
    • Users receive content based on their specific interests, enhancing the personalization of their feed.

      “identifying a user account that includes an interest list that includes the first entity but that does not include the determined event…”

In essence, the patent outlines a sophisticated method for curating a personalized news feed for users, driven by electronic news articles, relevance scoring, event detection, and additional resources. To me, it sounds like a description of a system like Google Discover.
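The two thresholds (a minimum relevance per article and a minimum number of articles) and the interest-list check can be sketched like this. The data layout, the threshold values and the function names are my own assumptions for illustration; the patent only speaks of "a threshold quantity" and "a threshold magnitude of relevance".

```python
def detect_event(articles: list[dict[str, float]],
                 first: str, second: str,
                 min_relevance: float = 0.5,
                 min_articles: int = 3) -> bool:
    """Infer an ongoing event when enough articles reference both entities strongly.

    Each article is modeled as a mapping from entity name to a relevance score.
    """
    # "Superset": articles that mention both entities at all.
    superset = [a for a in articles if first in a and second in a]
    # Subset: both entities are referenced with at least the threshold relevance.
    subset = [a for a in superset
              if a[first] >= min_relevance and a[second] >= min_relevance]
    return len(subset) >= min_articles


def should_notify(interest_list: set[str], first: str, event_id: str) -> bool:
    """Show the event only to users who follow the first entity and lack the event."""
    return first in interest_list and event_id not in interest_list


articles = [
    {"Company A": 0.9, "Company B": 0.8},
    {"Company A": 0.7, "Company B": 0.6},
    {"Company A": 0.8, "Company B": 0.9},
    {"Company A": 0.9, "Company B": 0.1},  # second entity only mentioned in passing
    {"Company A": 0.4},                    # second entity not mentioned
]
if detect_event(articles, "Company A", "Company B"):
    print(should_notify({"Company A", "Football"}, "Company A", "event-a-b"))
```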

Structured entity information page

This patent with the identifier US11706318B2 is related to search engines and specifically to semantic and entity-based search. It is a continuation of former patents, was first assigned in 2015 by Google and was newly published in July 2023. It was published for the US, Germany and WIPO. This means that it is more likely to be used in practice. The expiration date is 2036.

The patent describes a method performed by a server system to generate and display a structured information page associated with an entity. When the server system receives a request from a client device for this information page, it identifies historical user activity related to the entity. The server system then generates the information page by formatting it according to predefined and dynamically selected information types. The relative importance of these candidate information types is determined by the server system. The information page is populated with the identified information and transmitted to the client device for display. Additionally, the structured information page may include primary and secondary colors associated with the entity.

Important Insights:

  • Structured Information Page Generation: The patent emphasizes a method where the server system generates a structured information page related to a specific entity.

Quote: “The patent describes a method performed by a server system to generate and display a structured information page associated with an entity.”

  • Historical User Activity: The system identifies historical user activity related to the entity to tailor the information page.

Quote: “The server system receives a request from a client device for the information page and identifies historical user activity related to the entity.”

  • Dynamic Formatting: The information page is formatted based on both predefined and dynamically selected information types.

Quote: “The server system then generates the information page by formatting it according to predefined and dynamically selected information types.”

  • Relative Importance of Information: The server system determines the relative importance of candidate information types to populate the information page.

Quote: “The relative importance of the candidate information types is determined by the server system.”

  • Color Association: The structured information page may have colors associated with the entity, potentially for branding or recognition purposes.

Quote: “The structured information page may also include primary and secondary colors associated with the entity.”

It is obvious this patent describes the methodology of serving Knowledge Panels at Google.

Both the Knowledge Panel and the patent emphasize presenting structured information about entities. While the Knowledge Panel provides summaries and key details about a topic, the patent’s method also focuses on creating a structured page with relevant information about an entity. The patent mentions the use of historical user activity to tailor the information page. Similarly, Google’s Knowledge Panel might prioritize certain information based on user behavior and search trends (more on this topic in my article on SEL, “How Google creates knowledge panels”).

Both systems are centered around entities. Whether it’s a person, place, organization, or thing, the goal is to provide users with a comprehensive and organized view of the topic.

The patent mentions the inclusion of primary and secondary colors associated with the entity. This is reminiscent of how Google’s Knowledge Panel sometimes includes branding or recognizable images/colors associated with the entity being searched.

In both cases, a server system processes the request and sends the structured information to the client device (usually a user’s browser) for display.

The patent US10110701B2 can be seen as a technical embodiment of some of the concepts behind Google’s Knowledge Panel. While the patent provides a method for generating structured information pages based on user activity and predefined information types, Google’s Knowledge Panel serves as a real-world application that offers users structured, relevant, and concise information about their search queries. The patent might be one of the many technological backbones that support features like the Knowledge Panel, ensuring that users receive the most relevant and structured information for their queries.
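How predefined and dynamically selected information types could come together on such a page is sketched below. This is purely my own simplification: I assume that the relative importance of a candidate information type is derived from how often users interacted with it in the past, and all names and data are hypothetical.

```python
from collections import Counter


def build_page_layout(predefined: list[str],
                      candidates: list[str],
                      activity_log: list[str],
                      max_dynamic: int = 2) -> list[str]:
    """Predefined information types first, then the most-used candidate types."""
    # Relative importance here is simply the interaction count per candidate type.
    usage = Counter(t for t in activity_log if t in candidates)
    dynamic = [info_type for info_type, _ in usage.most_common(max_dynamic)]
    return predefined + dynamic


layout = build_page_layout(
    predefined=["name", "description"],
    candidates=["tour dates", "albums", "awards"],
    activity_log=["albums", "tour dates", "albums", "awards", "tour dates", "albums"],
)
print(layout)
```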

Surfacing unique facts for entities

This patent was first filed by Google in 2016 and renewed in January 2023. Since the patent has been filed in the USA, Europe, China and worldwide, it is likely that Google will use it in practice.

The patent describes systems and methods for identifying and providing interesting facts about an entity. The inventors are Akash Nanavati, Aniket Ray and Torsten Rohlfing, and the applicant is Google Inc. The patent was published on January 31, 2023.

The patent is about the extraction of facts from unstructured data (more about this in the articles “How Google can identify and interpret entities from unstructured content” and “Natural Language Processing to build a semantic database”) and about serving facts about entities in the SERPs (more about this in the article “How does Google understands search terms by search query processing?”).

An example method includes selecting documents associated with at least one unique fact trigger from a document repository.

The method also includes generating entity-sentence pairs from the documents and, for a main entity of the entities represented by the entity-sentence pairs, clustering the entity-sentence pairs for the main entity using salient terms that occur in the sentence.

This means that the entity-sentence pairs for a given entity are clustered based on the salient terms that occur in the sentences. The goal of clustering is to group similar entity-sentence pairs together to identify the most relevant and interesting facts about the entity.

“In some implementations, the unique fact finder 115 may filter out sentences that are likely already represented in the knowledge base 190 as structured facts. For example, sentences that match certain patterns, such as “X is friends with,” “X is married to,” or “X was born on” where X represents the entity mention, may be removed from the entity-sentence pairs because these sentences do not likely represent unique facts. Rather, such sentences represent structured facts. The patterns for identifying sentences that are likely structured facts may be hand curated and stored as part of the system 100.”

Which entity-sentence pairs are selected can depend on, among other things, the topicality of the source document, the PageRank of the document, the length of the sentence, the number of characters, or a promotion factor of the source document that is based on links.

“Another factor may be a promotion factor that measures the fun-quotient of the source document. For example, the more inbound links for the source document that include whitelisted trigger phrases or synonyms of the whitelisted trigger phrases, the higher this promotion factor is.”

A low IDF score or rating by a rater can also be a factor.

“The IDF score represents how rare a term is across a corpus of documents, thus terms that occur less frequently across the corpus have a higher IDF score than very common terms. The IDF score for a sentence may be a demotion factor where low IDF scores represent a demotion. In some implementations, one or more of the entity-sentence pairs may be rated by an external rater (a human) for an interestingness factor.”
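For readers who are not familiar with IDF, the following sketch computes the classic inverse document frequency, log(N / df), on a tiny made-up corpus. How the patent aggregates term scores into a sentence score is not stated; averaging the term scores in `sentence_idf` is my own assumption, as are the function names and the example documents.

```python
import math


def idf_scores(corpus: list[list[str]]) -> dict[str, float]:
    """Classic inverse document frequency: log(N / df) per term."""
    n_docs = len(corpus)
    doc_freq: dict[str, int] = {}
    for doc in corpus:
        for term in set(doc):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def sentence_idf(sentence: list[str], idf: dict[str, float]) -> float:
    """Average IDF of the known terms in a sentence; low values suggest common wording."""
    known = [idf[t] for t in sentence if t in idf]
    return sum(known) / len(known) if known else 0.0


corpus = [
    ["cats", "sleep", "a", "lot"],
    ["cats", "are", "a", "pet"],
    ["cat", "urine", "glows", "under", "a", "blacklight"],
]
idf = idf_scores(corpus)
print(round(idf["a"], 3), round(idf["cats"], 3), round(idf["urine"], 3))
print(sentence_idf(["urine", "glows"], idf) > sentence_idf(["cats", "a"], idf))
```

A sentence made of rare terms gets a higher score than one made of very common terms, which fits the idea that a low IDF score acts as a demotion.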

The topicality of a document in relation to the main entity describes the importance of the entity for the document.

“For example, if the source document for the sentence about cat urine is about cats, the entity cat will have a high document topicality score and urine will be lower. If, however, the document is about urine, the urine entity will have a higher document topicality score.”

In addition, a Semantic Importance Score can be taken into account, which measures the topicality of the entity for the sentence.

“For example, the source document may be about cats, but a sentence may compare a unique fact about a dog to cats, e.g., “a dog’s sense of smell is 100× more sensitive than a cat’s”. The semantic importance score of the The topicality score for an entity-sentence pair may be determined based on the document topicality score, the semantic importance score, or a combination of these.”

The frequency of facts mentioned in relation to an entity seems to be an indication of correctness.

“Clustering enables the system to avoid showing duplicate or near duplicate facts in a search result and enables the system to accumulate support across sentences expressing the same fact, which is an indication of a fact’s correctness and uniqueness.”

Similar terms can be assigned to a cluster via lemmatization, a sub-step of natural language processing.

The method also includes determining a representative set for each of the clusters and providing at least one of the representative sets in response to a query identifying the main entity.

Google could use these representative sentences for the knowledge panel or a kind of featured snippet.
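The clustering of entity-sentence pairs by salient terms and the choice of a representative per cluster can be illustrated with a very small sketch. Everything in it is an assumption on my part: the stopword-based term extraction, the Jaccard similarity, the greedy clustering and the rule "shortest sentence wins" are stand-ins, since the patent does not prescribe a specific algorithm.

```python
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "under", "to", "can"}


def salient_terms(sentence: str) -> frozenset[str]:
    """Very rough stand-in for salient-term extraction: lowercase non-stopwords."""
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return frozenset(w for w in words if w and w not in STOPWORDS)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of terms the two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_sentences(sentences: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: add a sentence to the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for sentence in sentences:
        terms = salient_terms(sentence)
        for cluster in clusters:
            if jaccard(terms, salient_terms(cluster[0])) >= threshold:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


def representatives(clusters: list[list[str]]) -> list[tuple[str, int]]:
    """Shortest sentence per cluster plus its support (cluster size), best supported first."""
    reps = [(min(c, key=len), len(c)) for c in clusters]
    return sorted(reps, key=lambda pair: pair[1], reverse=True)


sentences = [
    "Cat urine glows under a blacklight.",
    "The urine of a cat glows under blacklight!",
    "Cats sleep up to 16 hours a day.",
]
for sentence, support in representatives(cluster_sentences(sentences)):
    print(support, sentence)
```

The cluster size works as the accumulated support for a fact, and only one sentence per cluster is shown, which avoids near-duplicate facts.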

Another example method includes determining that a query relates to an entity in a knowledge base, determining that the entity has an associated unique fact list, and providing at least one of the unique facts in the list in response to the search query.

The patent also speaks explicitly of a document repository, which is obviously a classic search index.

“For example, document repository 195 may include an index that stores terms or phrases that appear in the documents, as well as the content of the documents or a pointer to the content. In some implementations the document repository 195 represents documents available over the Internet.”

There are also some interesting statements in the patent about the general selection of sources. This describes how blogs or forums are used less as a source for facts, as they are more about opinions than facts.

Sources that stand out due to duplicate content or replication are also not considered as sources.

“Likewise documents classified as blogs or forums may be considered low-quality. Blogs and forum are likely to include more opinions than facts and less likely to have reliable facts. The system may also consider documents classified as syndicated or plagiarized as low quality. The content of documents classified as syndicated or plagiarized is duplicated from other documents. For example, a web site may be a collection of news stories from news organizations. The system may consider such documents as lacking original content and, therefore, low-quality. Another criteria used by the system to identify low-quality documents may be blacklisting. For example, a document or a domain may be added to a list and any documents in the list (e.g., specifically identified or matching a domain) are considered low-quality. Such a list may be manually curated. The system may ignore low-quality documents so that they are never considered as unique fact sources.”

Fact triggers signal unique or unusual information. Examples of trigger terms are:

  • did you know
  • fun facts
  • Interesting Facts

Link texts with these terms can also be beneficial. While these terms can promote whitelisting of the documents, there are words like

  • Lie
  • myths

that may encourage blacklisting.
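A simple way to picture the trigger logic is a phrase check on the document text or on the anchor texts pointing to it. The sketch below is an assumption-laden illustration: the function names are hypothetical, the phrase lists only contain the examples mentioned above, and letting a blacklist term override a whitelist term is my own choice.

```python
import re

WHITELIST_TRIGGERS = ("did you know", "fun facts", "interesting facts")
BLACKLIST_TRIGGERS = ("lie", "myths")


def contains_any(text: str, phrases: tuple[str, ...]) -> bool:
    """True if any phrase occurs as whole words, ignoring case."""
    return any(re.search(rf"\b{re.escape(p)}\b", text, re.IGNORECASE) for p in phrases)


def classify_document(text: str) -> str:
    """Return 'blacklist', 'whitelist' or 'neutral' based on trigger phrases."""
    if contains_any(text, BLACKLIST_TRIGGERS):
        return "blacklist"  # assumption: blacklist terms take precedence
    if contains_any(text, WHITELIST_TRIGGERS):
        return "whitelist"
    return "neutral"


print(classify_document("10 fun facts about cats"))
print(classify_document("Common myths about cats: did you know?"))
print(classify_document("Cats believe in themselves"))
```

The word-boundary match prevents false hits such as "lie" inside "believe".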

It is interesting that the patent obviously talks about a Knowledge Graph, which assigns text or other information to entities in addition to attributes. (I described this in more detail in the article ….).

“In some implementations, the knowledge base 190 may be a data graph, where entities are stored as nodes and facts are stored as relationships between entities or attribute-value pairs for the entities. The edges may be labeled edges and the labels may represent thousands or hundreds-of-thousands of different facts. As used herein, entity may refer to a physical embodiment of a person, place, or thing or a representation of the physical entity, e.g., text, or other information that refers to an entity.”

This patent gives some interesting insights into how Google could identify and serve facts about entities, and it shows that a knowledge graph is still important for Google.

Dynamic Injection of related content in search results

This Google patent was filed on 06.08.2020 and published on 07.05.2022. For me, it is one of the most exciting patents of 2022. It is only registered in the US and China, so it is unlikely that it is currently in international use. But still exciting!

It describes how a search engine automatically suggests further links and search query alternatives within a box, based on the dwell time in the SERPs. The appearance of these suggestions is reminiscent of the “others also searched for” suggestions that appear when you return to the SERP after clicking on a search result.

The patent seems to build on this functionality and to integrate further suggestions, such as links, into the SERP. The difference from the already known functionality is that the triggering event is not a click on a search result but the dwell time.

“Implementations use a dwell signal to display related suggested items and/or to influence “next page” search results for dynamic pagination. For example, some implementations may calculate related suggestions for a search result presented in response to a query. The suggestions may include refined queries and/or links to specific items.”

If a threshold value for the dwell time is reached, a box with suggestions is automatically displayed, because it can be assumed that the user has not found what they are looking for.
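The trigger itself is a simple threshold check, as the following sketch shows. The threshold of eight seconds, the function names and the lookup table of precomputed suggestions are assumptions for illustration only; the patent does not name a concrete value.

```python
def should_show_suggestions(dwell_seconds: float, threshold_seconds: float = 8.0) -> bool:
    """Show the suggestion box once the dwell time reaches the threshold."""
    return dwell_seconds >= threshold_seconds


def suggestions_for(result_id: str, dwell_seconds: float,
                    related: dict[str, list[str]]) -> list[str]:
    """Return precomputed related suggestions for a result, or nothing below the threshold."""
    if not should_show_suggestions(dwell_seconds):
        return []
    return related.get(result_id, [])


# Hypothetical precomputed suggestions per search result.
related = {"result-1": ["jaguar car", "jaguar cat"]}
print(suggestions_for("result-1", 10.0, related))
print(suggestions_for("result-1", 2.0, related))
```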

The suggestions are intended to steer the user’s search in a slightly different direction and to suggest similar content of the same category or class.

“In addition, or instead, the suggestions may offer tangential suggestions that take the user in a slightly different direction, e.g., offering related queries, alternate interpretations of the query terms, and/or documents in a same category/classification as the particular search result but not highly similar to the result.”

Besides links and search query refinements, the suggestions can also include images, videos, PDFs, audio recordings and other content types. Entities can also be suggested.

“In finding responsive items, the query system 120 may be responsible for searching one or more indices, represented collectively as item index 140. The item index 140 may include a web document index, e.g., an inverted index that associates terms, phrases, and/or n-grams with documents. Web documents can be any content accessible over the Internet, such as web pages, images, videos, PDF documents, word processing documents, audio recordings, etc. The item index 140 may also include an index of entities, for example from a knowledge base or knowledge graph”

It is also interesting to note that suggestions can be generated based on the user journeys of other searchers.

“In some implementations, the suggested follow-on queries may be related to a specific responsive item. For example, the responsive item may be associated with one or more queries, e.g., because the responsive item has been selected often after being presented as a search result for the related queries. If the responsive item has related queries these queries may be included as suggested items for the responsive item. For example, the suggested items 135 can include parts of a topic journey that other users have taken. For instance, if the current query is “jobs in Pittsburgh” the search system may suggest “housing in Pittsburgh” or “best elementary schools in Pittsburgh” as a suggested item 135.”

For ambiguous search queries, refinement suggestions are based on other interpretations of the query. They can also take the form of terms with similar meanings or of explicit questions that open up a new perspective.

As another example, the suggested items 135 may include alternate interpretations of a query term. For instance, the query “jaguar” may result in “jaguar car,” “jaguar cat,” and/or “jaguar team” as suggestions. Similarly, suggested items 135 may include alternate possibilities. For example, a query of “washing machine” may have as suggested items 135 “new washing machine” or “washing machine repair” while a query of “university” may include “trade school” or “journey program” as a suggested item 135. Another example of suggestions tangential to a query are alternate viewpoints. For instance, a query of “How long should I foam roll after running?” may have as a suggested item “Should I foam roll after running?” or “Alternatives to foam rolling after running.”

In addition to the suggestions in a box, a “Next page” function can offer the user a refreshed set of search results, or at least a refreshed top ten, without having to load a completely new set of hundreds of results.

The next page may include another small set of results, which may include some of the original smaller set that were not included in the first page as well as results added due to the dwell score signals. Thus, implementations may support dynamic pagination of search results and use a dwell score (or scores) to determine which search results are provided next. Dynamic pagination may be utilized irrespective of manual pagination; in other words, the user may interact with a “next page” type UI element or via automatic in-line pagination, which appends new results to the existing page.
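
How a dwell score could influence the next page can be sketched roughly as follows. This is only an illustration: the patent excerpt does not give a formula, so adding a per-category dwell score to a relevance score is an assumption, as are the field names `id`, `category` and `relevance`.

```python
def next_page(candidates, shown_ids, dwell_scores, page_size=3):
    """Pick the next small page from candidates that were not shown yet.

    candidates: list of dicts with 'id', 'category', 'relevance' (hypothetical fields).
    shown_ids: ids already displayed on earlier pages.
    dwell_scores: {category: score} derived from results the user lingered on.
    The additive combination below is an assumption, not the patent's method.
    """
    remaining = [c for c in candidates if c["id"] not in shown_ids]

    def combined(candidate):
        return candidate["relevance"] + dwell_scores.get(candidate["category"], 0.0)

    return sorted(remaining, key=combined, reverse=True)[:page_size]


# Hypothetical result set for the query "washing machine".
candidates = [
    {"id": "a", "category": "review", "relevance": 0.9},
    {"id": "b", "category": "review", "relevance": 0.8},
    {"id": "c", "category": "repair", "relevance": 0.7},
    {"id": "d", "category": "repair", "relevance": 0.6},
    {"id": "e", "category": "review", "relevance": 0.5},
    {"id": "f", "category": "repair", "relevance": 0.4},
]

# The first page showed "a" and "b"; suppose the user then dwelt on repair content.
with_dwell = next_page(candidates, {"a", "b"}, {"repair": 0.5})
without_dwell = next_page(candidates, {"a", "b"}, {})
print([c["id"] for c in with_dwell])
print([c["id"] for c in without_dwell])
```

With the dwell signal, the repair result "f" displaces the review result "e" on the next page, which is the kind of reordering that dynamic pagination is meant to allow.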

The advantage would be a faster display of search results.

Accelerated large scale similarity calculation

This Google patent was first published in 2019 and republished on 2022-05-07.

The patent describes the process for determining a similarity of two entities based on the similarity of attributes. The degree of similarity is determined by a similarity score. The purpose of this process is to determine a response to a query.

“For example, the query might seek information indicating which domain names 20-year-old males in the U.K. find more interesting relative to the general population in the U.K. The system computes the correlations by executing a specific type of correlation algorithm (e.g., a jaccard similarity algorithm) to calculate correlation scores that characterize relationships between entities of the different datasets.”
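
The Jaccard similarity named in the quote is the size of the intersection of two sets divided by the size of their union. A minimal sketch in plain Python, with hypothetical cohort names and domain names as the attribute sets, could look like this. It only shows the calculation itself, not the GPU-accelerated large-scale processing the patent is about.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def rank_by_similarity(target, others):
    """Rank named attribute sets by their Jaccard similarity to a target set."""
    scored = [(name, jaccard(target, attrs)) for name, attrs in others.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical cohorts described by the domains their members visit.
males_20_uk = {"a.example", "b.example", "c.example"}
cohorts = {
    "general_population_uk": {"b.example", "c.example", "d.example"},
    "females_30_us": {"d.example", "e.example"},
}

print(jaccard(males_20_uk, cohorts["general_population_uk"]))  # 0.5
print(rank_by_similarity(males_20_uk, cohorts))
```

A score of 1.0 means that both entities have identical attribute sets, a score of 0.0 means that they share no attributes.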

Google’s machine learning platform TensorFlow is used as the basis for determining the similarity score.

“The system includes a tensor data flow interface that is configured to pre-load at least two data arrays (e.g., tensors) for storage at a memory device of the GPU.”

For example, a Knowledge Graph and/or the Knowledge Vault or any kind of semantic database can be used as the entity database accessed by the algorithm.

The following example from the patent shows possible entities and attributes for a comparison:

“For example, entities of one dataset can be persons or users of a particular demographic (e.g., males in their 20’s) that reside in a certain geographic region (e.g., the United Kingdom (U.K.)). Similarly, entities of another dataset can be users of another demographic (e.g., the general population) that also reside in the same geographic region.”

The exciting thing about the patent is that, in addition to returning search results, the method can also be used to create groups or cohorts of similar users, for example for Google Analytics. The following sections of the patent point to this:

“For situations in which the systems discussed here collect and/or use personal information about users, the users may be provided with an opportunity to enable/disable or control programs or features that may collect and/or use personal information (e.g., information about a user’s social network, social actions or activities, a user’s preferences or a user’s current location). In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information associated with the user is removed. For example, a user’s identity may be anonymized so that the no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.

System 100 is configured to analyze and process different sources of data accessible via storage device 106. For example, CPU 116 can analyze and process data sources that include impression logs that are interacted with by certain users or data sources such as search data from a search engine accessed by different users. In some implementations, entities formed by groups of users, or by groups of user identifiers (IDs) for respective users, can be divided into different groups based on age, gender, interests, location, or other characteristics of each user in the group.”

“In some implementations, hosting service 110 represents an information library that receives and processes queries to return results that indicate relationships such as similarities between entities or conditional probabilities involving different sets of entities. For example, a query (or command) can be “what are all the conditional probabilities associated with some earlier query?” Another query may be related to the conditional probabilities of all the ages of people that visit a particular website or URL. Similarly, another query can be “what are the overlapping URL’s visited by 30-year-old females living in the U.S. relative to 40-year-old males living in the U.K.?””
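
The two query types mentioned in the quote, conditional probabilities and overlapping URLs between two groups, can be sketched on a small scale. The visit records, their field names and the helper functions below are hypothetical and only illustrate the calculations, not the hosting service described in the patent.

```python
def conditional_probability(visits, condition, event):
    """Estimate P(event | condition) as a relative frequency over visit records."""
    matching = [v for v in visits if condition(v)]
    if not matching:
        return 0.0
    return sum(1 for v in matching if event(v)) / len(matching)


def overlapping_urls(visits, group_a, group_b):
    """URLs visited by members of both groups."""
    urls_a = {v["url"] for v in visits if group_a(v)}
    urls_b = {v["url"] for v in visits if group_b(v)}
    return urls_a & urls_b


# Hypothetical, already anonymized visit records.
visits = [
    {"age": 30, "gender": "f", "country": "US", "url": "news.example"},
    {"age": 30, "gender": "f", "country": "US", "url": "shop.example"},
    {"age": 40, "gender": "m", "country": "UK", "url": "news.example"},
    {"age": 40, "gender": "m", "country": "UK", "url": "sport.example"},
    {"age": 30, "gender": "f", "country": "US", "url": "news.example"},
]


def females_30_us(v):
    return v["age"] == 30 and v["gender"] == "f" and v["country"] == "US"


def males_40_uk(v):
    return v["age"] == 40 and v["gender"] == "m" and v["country"] == "UK"


# Share of 30-year-olds among the recorded visits to one URL.
p_age_30 = conditional_probability(
    visits,
    condition=lambda v: v["url"] == "news.example",
    event=lambda v: v["age"] == 30,
)
overlap = overlapping_urls(visits, females_30_us, males_40_uk)
print(p_age_30, overlap)
```

Both functions work on group attributes such as age, gender and country rather than on individual identities, which matches the cohort-based view described in the quoted sections.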

This Google patent shows that similarities, and thus relationships, between entities are important to Google and that attributes are the basis for determining them. It also shows that treating Internet users as entities can be a solution to the privacy challenges of building cohorts of similar users based on certain attributes.

About Olaf Kopp

Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO at Aufgesang GmbH. He is an internationally recognized industry expert in semantic SEO, E-E-A-T, modern search engine technology, content marketing and customer journey management. As an author, Olaf Kopp writes for national and international magazines such as Search Engine Land, t3n, Website Boosting, Hubspot, Sistrix, Oncrawl, Searchmetrics, Upload … In 2022 he was a top contributor for Search Engine Land. His blog is one of the most famous online marketing blogs in Germany. In addition, Olaf Kopp is a speaker on SEO and content marketing at SMX, CMCx, OMT, OMX, Campixx...
