Author: Olaf Kopp, 22 September 2023
Reading time: 45 Minutes

Most interesting Google Patents for SEO in 2023


In this article I would like to contribute to archiving well-founded knowledge from Google patents.

Bill Slawski

Researching Google patents is one of the smartest ways to understand modern search engines like Google. A pioneer in researching Google patents was the unforgettable Bill Slawski, who passed away in summer 2022. On his blog SEObythesea he published insights from hundreds of Google patents and thus did an essential job for the entire SEO industry. Years ago he inspired me to research Google patents myself and to write down my own thoughts and theories about them.

Are the systems and methods in the patents used by Google?

A patent application does not mean that the methods described in it will find their way into practice in Google Search. One indication of whether a methodology or technology is interesting enough for Google that it could find its way into practice is whether the patent is pending only in the US or also in other countries. A priority claim for other countries must be made within 12 months of the first filing.

Regardless of whether a patent finds its way into practice, it makes sense to deal with Google patents, as you get an indication of the topics and challenges that product developers at Google are dealing with.

Below are summaries of the most interesting Google patents from 2023 and recent years. You can find more about Google patents in my article “The most interesting Google patents and scientific papers on E-E-A-T”.

Enjoy!

Interesting Google patents of 2023

Systems and methods for machine-learned prediction of semantic similarity between documents

This patent, with the identifier US11694034B2, relates to search engines and especially to semantic search. It was first filed by Google in October 2020 and published in July 2023. It was published only for the US. The expiration date is 2041.

The patent revolves around systems and methods for predicting the semantic similarity between documents. The process involves:

  1. Obtaining Documents: The method starts by obtaining two documents.
  2. Parsing Documents: These documents are parsed into textual blocks.
  3. Processing with a Model: Each of these textual blocks is processed with a machine-learned semantic document encoding model to obtain document encodings.
  4. Determining Similarity: A similarity metric is determined based on these encodings, which describes the semantic similarity between the two documents.
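The four steps can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the bag-of-words `encode_block` function is a toy stand-in for the machine-learned encoder (it captures word overlap, not meaning), summing is an assumed pooling choice, and all function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def split_into_blocks(text, sentences_per_block=2):
    """Step 2: parse a document into textual blocks of a few sentences each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + sentences_per_block])
            for i in range(0, len(sentences), sentences_per_block)]

def encode_block(block):
    """Step 3 (toy stand-in for the learned encoder): a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", block.lower()))

def encode_document(text):
    """Pool the block encodings into one document encoding by summing them."""
    pooled = Counter()
    for block in split_into_blocks(text):
        pooled.update(encode_block(block))
    return pooled

def cosine_similarity(a, b):
    """Step 4: cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: obtain documents (made-up sample texts).
doc_a = "Google files many patents. Some patents describe search ranking. Others cover hardware."
doc_b = "Many patents filed by Google describe ranking in search. A few cover hardware."
doc_c = "The recipe needs flour and butter. Bake it for twenty minutes."

sim_ab = cosine_similarity(encode_document(doc_a), encode_document(doc_b))
sim_ac = cosine_similarity(encode_document(doc_a), encode_document(doc_c))
print(sim_ab, sim_ac)
```

In this toy setup the two patent-related texts score higher than the unrelated recipe text. A learned encoder would additionally recognise similarity when the wording differs.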

Insights:

Semantic Similarity Determination:

The patent describes a “semantic similarity determinator” which can be used to determine a metric that describes the semantic similarity between two documents. This metric is based on the document encodings.

[0089] The semantic similarity determinator 410 can be used to determine a semantic similarity metric 412 descriptive of a semantic similarity between the first document 403A and the second document 403B.

Use of Cosine Similarity:

One of the methods to determine this similarity is through cosine similarity between the pooled sequence output corresponding to the document encodings.

Quote: “As an example, a cosine similarity can be determined by the semantic similarity determinator 410 between the pooled sequence output corresponding to the two documents encodings cos(E(d_1), E(d_2)).”

Flexibility in Determining Similarity:

While cosine similarity is one method, any conventional function can be used to determine the similarity between the document encodings.

Quote: “It should be noted that although a cosine similarity can be used to determine the similarity metric 412 between the first document encoding 408A and the second document encoding 408B, any conventional function can be utilized to determine a similarity between the first and second document encodings 408A/408B.”

Hierarchical Submodel Structure:

The machine-learned semantic document encoding model uses a hierarchical submodel structure. This structure allows the model to localize dependencies between textual segments, such as sentences, within a textual block or among multiple textual blocks.

Quote: “In such fashion, by utilizing a hierarchical submodel structure, the machine-learned semantic document encoding model can localize the dependencies between textual segments (e.g., sentences) to those included in a textual block and/or among textual blocks.”

Evaluating Loss Function:

The method also involves evaluating a loss function that assesses the difference between the determined similarity metric and some ground truth data associated with the documents. This evaluation aids in adjusting parameters of the semantic document encoding model.

Quote: “evaluating a loss function that evaluates a difference between the similarity metric and ground truth data associated with the first document and the second document.”
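
As a rough illustration of such a loss, the sketch below uses binary cross-entropy between a predicted similarity (a probability between 0 and 1) and a ground-truth label (1 for similar, 0 for not similar). The choice of binary cross-entropy is my assumption for illustration; the quoted passage does not name a specific loss function.

```python
import math

def binary_cross_entropy(predicted, label, eps=1e-12):
    """Loss between a predicted similarity in [0, 1] and a 0/1 ground-truth label."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# The two documents are labelled as similar (label = 1).
print(binary_cross_entropy(0.9, 1))  # confident and correct: small loss
print(binary_cross_entropy(0.1, 1))  # confident and wrong: large loss
```

During training, the model parameters would be adjusted in the direction that lowers this value.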

In simpler terms, this patent describes a method to determine how similar two documents are in terms of their meaning or content. It uses a machine-learned model to process the documents and determine a similarity score, which can be adjusted and refined based on actual data.

The data used for determining the similarity score between two documents is based on their respective encodings. The similarity score is derived from the encoded representations of the documents, and various methods, including cosine similarity and other conventional functions, can be used to compute this score. This score can then be employed for various tasks, including clustering and classification. Here are the insights and corresponding quotes from the patent:

Document Encodings:

The similarity metric is determined based on the encodings of the two documents.

Quote: “[0039] A similarity metric can be determined based at least in part on the first document encoding and the second document encoding (e.g., a comparison between the document encodings, etc.).”

Cosine Similarity:

Cosine similarity is one method used to compare the pooled sequence outputs corresponding to the document encodings.

Quote: “As an example, a cosine similarity can be determined between the pooled sequence outputs corresponding to the two documents cos(E(d_1), E(d_2)) (e.g., the first document encoding and the second document encoding).”

Conventional Functions:

Apart from cosine similarity, any conventional function can be employed to determine the similarity between the document encodings.

Quote: “It should be noted that although a cosine similarity can be used to determine a similarity metric between the first document encoding and the second document encoding, any conventional function can be utilized to determine a similarity between the first and second document encodings.”

Binary Prediction & Percentage Metric:

The similarity metric can include a binary prediction indicating whether the two documents are semantically similar. Alternatively, it can also include a predicted level of semantic similarity between the two documents, such as a percentage metric.

Quote: “For example, the similarity metric can be or otherwise include a binary prediction as to whether the first document and second document are semantically similar. For another example, the similarity metric can be or otherwise include a predicted level of semantic similarity between the two documents (e.g., a percentage metric, etc.).”

Downstream Tasks:

The similarity metric can be further utilized for various downstream tasks, such as clustering and classifying documents.

Quote: “[0051] In some implementations, the similarity metric can be utilized for additional downstream tasks (e.g., machine learning tasks, etc.). As an example, the similarity metric can be utilized to cluster at least one of the first and second documents (e.g., k-means clustering, hierarchical clustering, etc.).”


Contextualizing knowledge panels

This patent, with the identifier US11720577B2, relates to search engines and especially to semantic search. It was first filed by Google in January 2022 and published in August 2023. It is a republication; the original was first published in 2016. It was published for the US, South Korea and WIPO, which makes it more likely to be used in practice. The expiration date is 2038.

The patent describes a system designed by Google that provides users with a “knowledge panel” when they search for entities, like singers, actors, writers, etc. This knowledge panel provides relevant and detailed information about the searched entity based on the context of the search query.

Key Insights:

Entity and Context:

  • The system can receive search requests that include the name (or identifier) of an entity and additional context terms from the user’s search query.

“A system can receive requests including identifiers of entities… as well as context terms that are referenced by a search query submitted by a user.”

Knowledge Elements:

  • The system identifies multiple “knowledge elements” related to the searched entity. Knowledge elements can be facts or pieces of content related to the entity.

“Identifying a plurality of knowledge elements that are related to the entity;”

Ranking and Selection:

  • The system will rank these knowledge elements based on their relevance to the context terms from the search query.

“Each predicted set may have an associated confidence score or probability score indicating a level of certainty that the features provided predict the particular fixed set of entities. In some implementations, the confidence score may be based on a similarity measure which may differ depending on the type of set. For example, the confidence score for a location-based sets may be based on physical distance from a specified location, e.g., the current location of a computing device. The confidence score for a topic-based set may be based on an embedding distance with a query, such as an embedding generated based on signals from a client device. Such signals can include text recently seen on the screen, the state or proximity of external devices, content of recent searches, stated user interests, an application installed or executing on the client device, a time stamp, etc.”

  • It then selects the most relevant knowledge elements to display.

“Assigning, by one or more computers, rank scores to the plurality of knowledge elements… selecting one or more of the knowledge elements from among the knowledge elements based at least on the rank scores assigned to the knowledge elements;”
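
A minimal sketch of the ranking and selection idea follows. It assumes a very simple rank score, the number of context terms a knowledge element mentions. The patent does not disclose its scoring in this form, and the entity data and function names are made up.

```python
import re

def tokenize(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_knowledge_elements(elements, context_terms, top_k=2):
    """Score each element by how many context terms it mentions; keep the top_k."""
    context = {term.lower() for term in context_terms}
    scored = [(len(tokenize(element) & context), element) for element in elements]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [element for _, element in scored[:top_k]]

# Hypothetical knowledge elements for a made-up singer entity.
elements = [
    "Born in 1985 in Hamburg",
    "Albums: First Light (2010), Night Drive (2015)",
    "Upcoming tour dates: Berlin, Vienna, Zurich",
    "Awards: Best Newcomer 2011",
]
# Context terms taken from a query such as "<singer name> tour dates".
print(rank_knowledge_elements(elements, ["tour", "dates"]))
```

With the context terms "tour" and "dates", the tour element moves to the top of the panel, while a query about albums would surface the discography instead.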

Presentation in the Knowledge Panel:

  • The system can decide how to present the selected knowledge elements in the knowledge panel. This includes determining the position of the knowledge panel on the search results page, the number of knowledge elements to show, their position within the panel, and even specific features like highlighting text or deciding on a title or subtitle.

“Providing information associated with the entity and the one or more selected knowledge elements comprises providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in a knowledge panel, the knowledge panel being presented with a search results page associated with the search query.” “In some implementations providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in the knowledge panel comprises determining, based on identifying the one or more context items that are associated with the entity that is referenced by the search query, a number of knowledge elements to select for presentation in the knowledge panel;”

Customization:

  • The system can customize the knowledge panel based on the context of the search. This could include choosing which information to highlight, or determining a suitable title or subtitle for the displayed information.

“In certain aspects, providing data that causes information associated with the entity and the one or more selected knowledge elements to be presented in the knowledge panel comprises determining, based on identifying the one or more context items that are associated with the entity that is referenced by the search query, a title or subtitle relating to one or more of the selected knowledge elements presented in the knowledge panel…”

In essence, this patent reveals how Google aims to enhance its search results by presenting users with a tailored “knowledge panel” that provides in-depth information about searched entities, adjusted and ranked based on the context of the user’s search query.

Content selection and presentation of electronic content

This patent, with the identifier US11663277B2, relates to search engines and especially to news. It was first filed by Google in May 2021 and published in August 2023. It was published for the US, Canada and WIPO, which makes it more likely to be used in practice. The expiration date is 2038.

The patent deals with a method and system for populating an interest feed with electronic news article resources. It is obviously related to Google Discover.

Method for Populating an Interest Feed:

  • The system looks for electronic news articles that reference two entities (first entity and second entity) with a significant relevance.
  • If enough articles reference both entities, it’s inferred that there is an ongoing event involving both entities.
  • A representation (like a news summary or highlight) of this event is generated using content from the articles and other resources.
  • If a user’s interest list includes the first entity but doesn’t already have the event, this representation is provided to the user’s device.
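
The logic of this method can be sketched as follows. The thresholds, the article data and the entity names are invented for illustration, and the relevance scores are assumed to be given; the patent does not specify concrete values.

```python
def detect_event(articles, first_entity, second_entity,
                 min_relevance=0.5, min_articles=3):
    """Return the supporting articles if enough of them reference both entities."""
    supporting = [
        article for article in articles
        if article["relevance"].get(first_entity, 0.0) >= min_relevance
        and article["relevance"].get(second_entity, 0.0) >= min_relevance
    ]
    return supporting if len(supporting) >= min_articles else []

def users_to_notify(interest_lists, first_entity, event_id):
    """Users who follow the first entity but do not have the event yet."""
    return [user for user, interests in interest_lists.items()
            if first_entity in interests and event_id not in interests]

# Made-up articles with per-entity relevance scores.
articles = [
    {"id": "a1", "relevance": {"Acme": 0.9, "Globex": 0.8}},
    {"id": "a2", "relevance": {"Acme": 0.7, "Globex": 0.6}},
    {"id": "a3", "relevance": {"Acme": 0.8, "Globex": 0.2}},
    {"id": "a4", "relevance": {"Acme": 0.6, "Globex": 0.9}},
]
interest_lists = {
    "u1": {"Acme"},
    "u2": {"Globex"},
    "u3": {"Acme", "acme-globex-event"},
}

supporting = detect_event(articles, "Acme", "Globex")
if supporting:
    print(users_to_notify(interest_lists, "Acme", "acme-globex-event"))
```

Only the user who follows the first entity and has not seen the event yet would receive the event representation in their feed.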

Scoring and Selection:

  • Each article is assigned scores based on how strongly it relates to the first and second entities.
  • The system can create a “superset” of articles that mention both entities. From this, a subset is selected where each article references both entities with significant relevance.

Event Determination:

  • Multiple possible events involving the two entities are identified.
  • A filtering algorithm is used to shortlist the most likely events based on the articles.
  • The type of the likely event can be determined using the articles.

Content Generation:

  • The event representation includes content that describes the ongoing event.
  • Additional content is generated based on other resources which relate to some attribute of the activity in the event.

Attributes and Additional Resources:

  • Attributes might include the type of activity, the industry of the activity, a connection to a previous event, or correlation with another user’s interest list.
  • Based on the correlation between the first and second entities, the system selects additional resources to provide more context or information.

User Search Queries:

  • The system can also identify the first entity based on search queries from multiple users.

Key Insights with Quotes:

  1. Interest Feed Generation:
    • The patent emphasizes creating a personalized feed for users based on trending events related to their interests.

      “A method for populating an interest feed with electronic news article resources…”

  2. Relevance Threshold:
    • The system relies on a threshold of relevance to determine which articles are significant.

      “…that a threshold quantity of electronic news article resources each reference both a first entity and a second entity with at least a threshold magnitude of relevance…”

  3. Event Representation:
    • The system generates a summarized or highlighted representation of the detected event for the user.

      “generating a representation that corresponds to the event…”

  4. Additional Content:
    • Content isn’t just derived from news articles. The system will seek other resources that relate to the event’s activities.

      “…generating second content that is based on an additional resource…”

  5. User’s Interest List:
    • Users receive content based on their specific interests, enhancing the personalization of their feed.

      “identifying a user account that includes an interest list that includes the first entity but that does not include the determined event…”

In essence, the patent outlines a sophisticated method for curating a personalized news feed for users, driven by electronic news articles, relevance scoring, event detection, and additional resources. To me, it sounds like a description of a system such as Google Discover.

Systems and methods that match search queries to television subtitles

This patent, with the identifier US11743522B2, relates to search engines. It was first filed by Google in May 2021 and published in August 2023. It was published only for the US. It belongs to a patent family first published in February 2017.

The patent describes a system and method for providing video program information to users.

Core Concept:

The primary goal of this system/method is to identify a spike in search queries during a specific time period and then correlate this spike to a media content item (like a video or TV show) that was presented during that time. If a user shows interest in this media content during a subsequent time period, the system will then provide search results corresponding to the original search queries to the user’s computing device.

Quote:

“A method for providing video program information, the method comprising: identifying, using a hardware processor, a search query spike from search queries during a first time period; correlating, using the hardware processor, the search query spike to a media content item being presented during the first time period…”

Use of Subtitles/Terms:

To establish the correlation between the search spike and the media content, the system matches search terms from the queries to subtitle terms associated with the media content item.

Quote: “…by matching a plurality of search terms from the search queries to a plurality of subtitle terms associated with the media content item…”
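
A simplified sketch of the two steps, spike detection and subtitle matching, could look like this. The spike rule (a query count well above its baseline), the overlap threshold and all sample data are assumptions of mine, not values from the patent.

```python
import re

def find_spiking_queries(baseline_counts, window_counts, ratio=3.0, min_count=10):
    """Queries whose count in the time window is well above their baseline."""
    spikes = []
    for query, count in window_counts.items():
        baseline = baseline_counts.get(query, 0)
        if count >= min_count and count >= ratio * max(baseline, 1):
            spikes.append(query)
    return spikes

def subtitle_overlap(query, subtitle_text):
    """Fraction of the query's terms that also occur in the subtitles."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    subtitle_terms = set(re.findall(r"[a-z0-9]+", subtitle_text.lower()))
    if not query_terms:
        return 0.0
    return len(query_terms & subtitle_terms) / len(query_terms)

def correlate(spiking_queries, subtitles_by_program, min_overlap=0.6):
    """Map each spiking query to the program whose subtitles match it best."""
    matches = {}
    for query in spiking_queries:
        best_program, best_score = None, 0.0
        for program, subtitles in subtitles_by_program.items():
            score = subtitle_overlap(query, subtitles)
            if score > best_score:
                best_program, best_score = program, score
        if best_score >= min_overlap:
            matches[query] = best_program
    return matches

# Made-up query counts and subtitles for the same time period.
baseline = {"weather": 50, "blue whale heart size": 1}
window = {"weather": 55, "blue whale heart size": 40}
subtitles = {
    "Ocean Giants": "the heart of a blue whale is the size of a small car",
    "Cooking Hour": "add the butter and stir",
}
spikes = find_spiking_queries(baseline, window)
print(correlate(spikes, subtitles))
```

The query about the whale's heart spikes while the documentary airs and its terms appear in the subtitles, so it is linked to that program; the steady "weather" query is ignored.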

Search Query Equivalence:

The system can determine if two search queries are equivalent in two ways:

  • If the sequence of search terms in one query is substantially identical to that in another.
  • If both queries express the same linguistic concept.

Quotes:

“…a first search query and a second search query from the search queries are identified as being equivalent when an ordered sequence of search terms from the first search query is substantially identical to an ordered sequence of search terms from the second search query…”

“…a first search query and a second search query from the search queries are identified as being equivalent when a linguistic concept expressed using search terms from the first search query is substantially the same linguistic concept expressed using search terms from the second search query…”
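
Both notions of equivalence can be approximated in a few lines. In this sketch, "substantially identical" is interpreted as a high `difflib.SequenceMatcher` ratio over the ordered terms, and "same linguistic concept" as equal term sets after removing stop words and mapping synonyms. The threshold, the stop word list and the tiny synonym table are hypothetical.

```python
import difflib
import re

STOP_WORDS = {"the", "a", "an", "of", "is", "in", "how", "what"}
SYNONYMS = {"film": "movie", "films": "movie", "movies": "movie",
            "actor": "cast", "actors": "cast"}

def terms(query):
    """Ordered, lowercased search terms of a query."""
    return re.findall(r"[a-z0-9]+", query.lower())

def same_sequence(q1, q2, threshold=0.9):
    """Equivalent if the ordered term sequences are (nearly) identical."""
    ratio = difflib.SequenceMatcher(None, terms(q1), terms(q2)).ratio()
    return ratio >= threshold

def same_concept(q1, q2):
    """Equivalent if both queries reduce to the same set of canonical terms."""
    def concept(query):
        return {SYNONYMS.get(t, t) for t in terms(query) if t not in STOP_WORDS}
    return concept(q1) == concept(q2)

print(same_sequence("Blue Whale heart size", "blue whale heart size?"))
print(same_sequence("blue whale heart size", "size of a whale heart"))
print(same_concept("cast of the film Heat", "heat movie actors"))
```

Grouping equivalent queries this way matters for spike detection, because many differently worded queries about the same thing should count toward one spike.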

User Interest Indicators:

The method recognizes user interest in a media content item through different methods:

  • Detecting that the media content is currently displayed on a device near the user’s computing device.
  • Receiving an audio stream from the user’s device and correlating it to the media content.

Quotes:

“…receiving the indication of user interest in the media content item from the computing device comprises receiving an indication that the media content item is currently being presented on a display device proximal to the computing device…”

“…receiving the indication of user interest in the media content item from the computing device comprises receiving an audio stream from the computing device and correlating the audio stream to the media content item…”

Second Screen Experience:

The system emphasizes the experience where the computing device (on which the user shows interest or receives information) is separate from the primary display device presenting the media content, commonly referred to as a ‘second screen’ experience.

Quote: “…the computing device is a second screen device and the media content item is presented on a display device proximal to the computing device…”

System Implementation:

The patent presents this method in various forms:

  • As a method (the process itself).
  • As a system (including a hardware processor in a server device).
  • As a non-transitory computer-readable medium containing instructions to execute the method.

Quotes:

“A system for providing video program information, the system comprising: a hardware processor of a server device…”

“A non-transitory computer-readable medium containing computer executable instructions that, when executed by a processor, cause the processor to perform a method for providing video program information…”

The patent proposes a unique way to enhance the user experience by detecting interest in specific media content based on search query spikes and providing relevant information on a secondary device. This caters to the growing trend of multi-device usage while consuming content and the curiosity-driven nature of audiences.

Television related searching

This patent, with the identifier US11758237B2, relates to search engines. It was first filed by Google in August 2022 and published in September 2023. It was published for the US and WIPO. It belongs to a patent family first published in November 2011.

The patent pertains to a method where, when a user searches for something related to the content currently being shown on their media device (e.g., a TV show or movie), the system displays an overlay on the device. This overlay provides search suggestions and search results related to the media content. The results may include links to relevant channels or applications. Some of the claims expand on the method, detailing features like extracting keywords from the media content’s metadata, recognizing when the media content changes, and determining if a suggested application is already installed on the device.

Insights:

Overlay on Media Content: The core of the patent revolves around presenting search suggestions and search results as an overlay on the media content.

“responsive to the search request, causing a first portion of search suggestions and a second portion of search results to be presented on the display device in an overlay that is positioned over the media content item”

Diverse Content Types in Search Results: The search results can include different types of content like channels, applications, or web pages.

“the second portion of search results includes (i) a first search result that is associated with a channel content type… (ii) a second search result that is associated with an application content type”

“search results include a third search result that is associated with a web page content type”

Metadata Utilization: The system can identify metadata related to the media content and extract keywords to generate relevant search suggestions.

“identifying metadata related to the media content item being presented on the display device”

“extracting keywords from the identified metadata, wherein the search suggestions are generated based on the extracted keywords”
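
As a rough sketch of the metadata step, the snippet below extracts the most frequent non-stop-word terms from a program's metadata and turns them into search suggestions. The metadata fields, the stop word list and the way suggestions are built are assumptions for illustration only.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "about", "they", "to", "is"}

def extract_keywords(metadata, max_keywords=3):
    """Most frequent meaningful terms across all metadata fields."""
    text = " ".join(str(value) for value in metadata.values())
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOP_WORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(max_keywords)]

def suggestions_for(metadata):
    """Combine the title with keywords that are not already part of it."""
    title = metadata["title"]
    return [f"{title} {keyword}" for keyword in extract_keywords(metadata)
            if keyword not in title.lower()]

# Made-up metadata of the program currently on screen.
metadata = {
    "title": "Ocean Giants",
    "genre": "Nature documentary",
    "description": "A documentary about whales and the ocean they live in",
}
print(extract_keywords(metadata))
print(suggestions_for(metadata))
```

When the program changes, the same functions would simply be called again with the metadata of the new program.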

Responsive to Media Changes: The system can recognize changes in the media content and adjust the search suggestions accordingly.

“additional search suggestions are generated in the plurality of search suggestions in response to determining that a television programming change has occurred”

Application Launching and Installation: If a search result points to an application, the system can determine if the application is installed, launch it if it is, or prompt the user to install it if it’s not.

“in response to receiving a selection of the second identifier, determining whether the application has been installed”

“in response to determining that the application has not been installed, causing a prompt to install the application to be presented”

These insights provide a comprehensive overview of the patented technology. It appears to aim at enhancing the TV viewing experience by integrating responsive and relevant search capabilities directly into the media viewing interface.

Generating and/or utilizing a machine learning model in response to a search request

This patent, with the identifier US2023273923A1, relates to search engines and generative AI. It is a continuation of earlier patents, was first filed by Google in June 2023 and was published in August 2023. It was published only for the US.

The patent discusses methods, systems, and apparatuses for utilizing machine learning models in response to user search requests. When users submit search queries that may not have definite answers, the patent proposes using trained machine learning models to predict answers. These models can be trained “on the fly” based on the search query and can be associated with content items in a search index. The patent also describes an interactive interface that allows users to interact with the trained machine learning model to obtain predicted answers.

This patent is very interesting because, in my view, it describes the technological implementation of the Snapshot AI answers and the answers in conversational mode within the framework of SGE.

Important Insights:

Machine Learning for Search Queries:

This specification is directed generally to methods, systems, and apparatus for generating and/or utilizing a machine learning model in response to a search request from a user.

Quote: “Implementations described herein relate to providing, in response to a query, machine learning model output that is based on output from a trained machine learning model.”

Predictive Nature of Search Queries:

For example, a user can submit a request that is predictive in nature and that has not been predicted and/or estimated in existing sources.

Quote: “However, it is often the case that none of the answers and/or documents can include a “good” answer to the user’s query.”

Interactive Interface for Machine Learning Models:

The machine learning model output can additionally or alternatively include an interactive interface for the trained machine learning model.

Quote: “For example, the interactive interface can be a graphical interface that a user can interact with to set one or more parameters on which the machine learning model is to be utilized to generate a prediction…”

Generating Machine Learning Models Based on Search Queries:

Some implementations described herein relate to generating a trained machine learning model “on the fly” based on a search query.

Quote: “For example, some of those implementations relate to determining training instances based on a received search query, training a machine learning model based on the determined training instances…”

Working Example of Predictive Queries:

As a working example of some implementations, assume a user interacts with a client device to submit a query of “How many doctors will there be in China in 2050?” to a search engine.

Quote: “However, none of these search results may provide a satisfactory answer to the user’s query.”

Training Instances for Machine Learning Models:

Training instances for a machine learning model can then be generated based on the variation parameters and their corresponding values, and the machine learning model trained utilizing the training instances.

Quote: “For example, a first training instance can include training instance input indicative of the year “2010” and training instance output indicative of the quantity of doctors in China in the year “2010”…”
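
To illustrate the working example, here is a minimal sketch that builds (year, quantity) training instances and fits a model "on the fly". The numbers are invented and are not real statistics, and the straight-line least-squares model is my assumption; the patent does not prescribe a specific model type.

```python
def fit_line(training_instances):
    """Least-squares fit of output = slope * input + intercept."""
    n = len(training_instances)
    mean_x = sum(x for x, _ in training_instances) / n
    mean_y = sum(y for _, y in training_instances) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training_instances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_instances)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, year):
    """Apply the fitted line to a new input year."""
    slope, intercept = model
    return slope * year + intercept

# Training instances: input = year, output = quantity (made-up values).
training_instances = [(2010, 100.0), (2012, 110.0), (2014, 120.0), (2016, 130.0)]
model = fit_line(training_instances)
print(predict(model, 2050))
```

An interactive interface as described in the patent would let the user enter a different year and call `predict` again with the same trained model.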

Interactive Interface for Predictions:

After the machine learning model is trained, machine learning model output that is based on the trained machine learning model can be provided in response to the search query.

Quote: “For instance, and continuing with the working example, the interactive interface can include an interactive field that accepts various years as input…”

Nature of Predictions:

The prediction provided as machine learning model output for presentation to the user is based on one or more predicted values generated over the machine learning model.

Quote: “In some implementations, the prediction provided for presentation is a single value, such as a single predicted quantity of doctors in China in the working example.”

This patent essentially emphasizes the potential of machine learning models to predict answers to search queries, especially when traditional search methods might not yield satisfactory results. The integration of interactive interfaces further enhances the user experience by allowing them to interact directly with the model and adjust parameters for more tailored results.

The patent discusses advanced methods and systems for utilizing machine learning models in response to search queries. Here’s a summarized breakdown:

  1. Machine Learning Model in Search Queries:
    • The patent focuses on generating machine learning models to predict answers to search queries, especially when definite answers aren’t available.
    • Instead of just providing search results, the system can predict answers or provide an interactive interface for users to get predictions based on trained machine learning models.
  2. On-the-Fly Model Training:
    • The system can create a machine learning model “on the fly” based on a search query.
    • It determines training instances from the received search query, trains a model, and then provides an output based on this newly trained model.
  3. Model Validation:
    • After training, the model is validated to ensure its predictions meet a certain quality threshold.
    • If the model doesn’t meet the threshold, its output might be suppressed, and traditional search results might be shown instead.
    • Validation can use “hold out” training instances that weren’t used during the training phase.
  4. Interactive Interfaces:
    • Users can interact with an interface to input parameters and get predictions.
    • For instance, inputting different weather conditions to predict snowcone sales.
  5. Delayed Responses:
    • Due to the “on-the-fly” training, there might be a delay between the query submission and receiving the machine learning model’s output.
    • Users might receive a prompt or notification about this delay, and the results might be “pushed” to their device later.
  6. Model Indexing and Reuse:
    • Once trained, machine learning models can be indexed by content from their training data or other related content.
    • Later, if a similar query is received, the system can identify and use the previously trained model, saving computational resources.
  7. Technical Advantages:
    • The system reduces the need for users to submit multiple varied queries.
    • It offers computational efficiency by reusing trained models.
    • The system can train models without human intervention and index them for future use.
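
Points 3 and 6 can be sketched together: validate a trained model on hold-out instances, fall back to normal search results if it fails, and index validated models for reuse. The error measure, the threshold, the overlap rule and the class name `ModelIndex` are hypothetical choices, and the lambda stands in for a model trained earlier.

```python
import re

def mean_absolute_error(model, holdout):
    """Average absolute gap between predictions and held-out true values."""
    return sum(abs(model(x) - y) for x, y in holdout) / len(holdout)

def answer_query(model, holdout, max_error, fallback_results):
    """Serve the model only if it passes validation; otherwise fall back."""
    if mean_absolute_error(model, holdout) <= max_error:
        return {"type": "model", "model": model}
    return {"type": "search_results", "results": fallback_results}

class ModelIndex:
    """Stores validated models under the terms of the query that produced them."""

    def __init__(self):
        self._models = {}

    @staticmethod
    def _key(query):
        # Letters only, so a different year in the query maps to the same key.
        return frozenset(re.findall(r"[a-z]+", query.lower()))

    def add(self, query, model):
        self._models[self._key(query)] = model

    def lookup(self, query, min_overlap=0.6):
        wanted = self._key(query)
        for key, model in self._models.items():
            union = wanted | key
            if union and len(wanted & key) / len(union) >= min_overlap:
                return model
        return None

# Stand-in for a trained model, plus made-up hold-out instances.
trained_model = lambda year: 5.0 * year - 9950.0
holdout = [(2018, 140.0), (2020, 151.0)]

response = answer_query(trained_model, holdout, max_error=2.0, fallback_results=[])
index = ModelIndex()
if response["type"] == "model":
    index.add("How many doctors will there be in China in 2050?", trained_model)

reused = index.lookup("How many doctors will there be in China in 2060")
print(reused(2060) if reused else "no indexed model, train a new one")
```

The second query differs only in the year, so the indexed model is reused and no new training run is needed.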

In essence, the patent introduces a dynamic way to use machine learning models in search engines, offering predictive answers and interactive interfaces, especially when definitive answers aren’t readily available.

This reminds me of what Google SGE is trying to do and I think this patent is closely related to SGE.

Structured entity information page

This patent, with the identifier US11706318B2, relates to search engines and specifically to semantic and entity-based search. It is a continuation of earlier patents, was first assigned in 2015 by Google and was newly published in July 2023. It was published for the US, Germany and WIPO, which makes it more likely to be used in practice. The expiration date is 2036.

The patent describes a method performed by a server system to generate and display a structured information page associated with an entity. When the server system receives a request from a client device for this information page, it identifies historical user activity related to the entity. The server system then generates the information page by formatting it according to predefined and dynamically selected information types. The relative importance of these candidate information types is determined by the server system. The information page is populated with the identified information and transmitted to the client device for display. Additionally, the structured information page may include primary and secondary colors associated with the entity.
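
To make the idea of predefined and dynamically selected information types more tangible, here is a minimal sketch. The function `build_info_page`, the list `PREDEFINED_TYPES` and the way importance is counted from an activity log are my own assumptions for illustration; the patent does not prescribe this implementation.

```python
from collections import Counter

PREDEFINED_TYPES = ["name", "description"]  # assumed to always be shown first

def build_info_page(entity, facts, activity_log, max_dynamic=2):
    """Assemble a page: predefined types, then the most-used dynamic types."""
    # Assumption: relative importance = how often past users engaged
    # with an information type for this entity.
    importance = Counter(
        info_type for ent, info_type in activity_log if ent == entity
    )
    candidates = [t for t in facts if t not in PREDEFINED_TYPES]
    dynamic = sorted(candidates, key=lambda t: (-importance[t], t))[:max_dynamic]
    page = {t: facts[t] for t in PREDEFINED_TYPES if t in facts}
    page.update({t: facts[t] for t in dynamic})
    return page

facts = {
    "name": "Example Band",
    "description": "A fictional rock band.",
    "albums": ["First", "Second"],
    "tour_dates": ["2024-05-01"],
    "members": ["A", "B"],
}
log = (
    [("Example Band", "tour_dates")] * 3
    + [("Example Band", "albums")]
    + [("Other Band", "members")] * 5
)
page = build_info_page("Example Band", facts, log)
```

In this toy example the tour dates are placed before the albums because users interacted with them more often, while activity for a different entity is ignored.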

Important Insights:

  • Structured Information Page Generation: The patent emphasizes a method where the server system generates a structured information page related to a specific entity.

Quote: “The patent describes a method performed by a server system to generate and display a structured information page associated with an entity.”

  • Historical User Activity: The system identifies historical user activity related to the entity to tailor the information page.

Quote: “The server system receives a request from a client device for the information page and identifies historical user activity related to the entity.”

  • Dynamic Formatting: The information page is formatted based on both predefined and dynamically selected information types.

Quote: “The server system then generates the information page by formatting it according to predefined and dynamically selected information types.”

  • Relative Importance of Information: The server system determines the relative importance of candidate information types to populate the information page.

Quote: “The relative importance of the candidate information types is determined by the server system.”

  • Color Association: The structured information page may have colors associated with the entity, potentially for branding or recognition purposes.

Quote: “The structured information page may also include primary and secondary colors associated with the entity.”

It is obvious this patent describes the methodology of serving Knowledge Panels at Google.

Both the Knowledge Panel and the patent emphasize presenting structured information about entities. While the Knowledge Panel provides summaries and key details about a topic, the patent’s method also focuses on creating a structured page with relevant information about an entity. The patent mentions the use of historical user activity to tailor the information page. Similarly, Google’s Knowledge Panel might prioritize certain information based on user behavior and search trends. (More on this topic in my Search Engine Land article “How Google creates knowledge panels”.)

Both systems are centered around entities. Whether it’s a person, place, organization, or thing, the goal is to provide users with a comprehensive and organized view of the topic.

The patent mentions the inclusion of primary and secondary colors associated with the entity. This is reminiscent of how Google’s Knowledge Panel sometimes includes branding or recognizable images/colors associated with the entity being searched.

In both cases, a server system processes the request and sends the structured information to the client device (usually a user’s browser) for display.

The patent US10110701B2 can be seen as a technical embodiment of some of the concepts behind Google’s Knowledge Panel. While the patent provides a method for generating structured information pages based on user activity and predefined information types, Google’s Knowledge Panel serves as a real-world application that offers users structured, relevant, and concise information about their search queries. The patent might be one of the many technological backbones that support features like the Knowledge Panel, ensuring that users receive the most relevant and structured information for their queries.

Systems and methods for using document activity logs to train Machine-Learned models for determining document relevance

This patent with the identifier US20230267277A1 relates to search engines, specifically to relevance ranking. It was first assigned to Google in April 2023 and published in August 2023. It was published for the US and WIPO, which makes it more likely to be used in practice.

The patent describes a computer-implemented method and system for training a machine-learned semantic matching model. This model is designed to determine the semantic similarity between two documents. The method involves:

  1. Obtaining two documents along with their respective activity logs.
  2. Using these activity logs to determine if the two documents are related.
  3. Inputting these documents into the semantic matching model to receive a semantic similarity value.
  4. Evaluating a loss function based on the difference between the determined relation (from the activity logs) and the semantic similarity value.
  5. Modifying the parameters of the semantic matching model based on this loss function.

The model can determine the semantic similarity by generating content embeddings for each document and then comparing these embeddings. The content of the documents can include text, images, videos, and files. The activity logs describe various access events related to the documents, such as opening, sharing, editing, and more. The relation between documents can also be influenced by the time and type of access events. The trained model can be used to rank search results based on their semantic similarity to a user’s search query.
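
The five training steps can be illustrated with a deliberately tiny model. Everything concrete here is an assumption for the sake of the example: bag-of-words vectors stand in for learned content embeddings, the model has only two parameters, and `related` uses a simple "same user within 60 seconds" rule that the patent does not specify.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in content embedding: a bag-of-words vector (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related(log_a, log_b, window=60):
    """Label from activity logs: same user accessed both within `window` seconds."""
    return float(any(ua == ub and abs(ta - tb) <= window
                     for ua, ta in log_a for ub, tb in log_b))

def similarity(params, doc_a, doc_b):
    w, b = params
    return 1 / (1 + math.exp(-(w * cosine(embed(doc_a), embed(doc_b)) + b)))

def train(pairs, epochs=500, lr=1.0):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for doc_a, log_a, doc_b, log_b in pairs:
            target = related(log_a, log_b)            # step 2
            pred = similarity((w, b), doc_a, doc_b)   # step 3
            # step 4: squared loss (pred - target)^2, gradient through the sigmoid
            grad = 2 * (pred - target) * pred * (1 - pred)
            # step 5: update the model parameters
            w -= lr * grad * cosine(embed(doc_a), embed(doc_b))
            b -= lr * grad
    return w, b

# (document, activity log as (user, timestamp) events), twice per pair
pairs = [
    ("budget report q1", [("ann", 0)], "budget report q2", [("ann", 30)]),
    ("holiday photos beach", [("bob", 0)], "budget report q1", [("ann", 500)]),
]
params = train(pairs)
score_related = similarity(params, "budget report q1", "budget report q2")
score_unrelated = similarity(params, "holiday photos beach", "budget report q1")
```

After training, the pair that the activity logs mark as related gets the higher similarity value, which is the behavior the loss function is meant to enforce.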

Important Insights:

Semantic Matching Model Training:

The patent focuses on training a semantic matching model to determine the relatedness of two documents.

“A computer-implemented Method for training a machine-learned semantic matching model…”

Use of Activity Logs:

The method uses activity logs associated with documents to determine if they are related.

“…a first document activity log associated with the first document, a second document, and a second document activity log associated with the second document…”

Semantic Similarity Evaluation:

The model provides a semantic similarity value, which is an estimation of how similar the two documents are in terms of meaning.

“…a semantic similarity value representing an estimated semantic similarity between the first document and the second document…”

Content Embeddings:

The model determines the semantic similarity by generating content embeddings for each document. These embeddings are essentially numerical representations of the content. By comparing the embeddings of two documents, the model can estimate their semantic similarity.

“…determining a first content embedding for the first document based on at least a portion of content of the first document; determining a second content embedding for the second document based on at least a portion of content of the second document; and generating, based on the first content embedding and the second content embedding, the semantic similarity value representing the estimated semantic similarity between the first document and the second document.”

Diverse Content Types:

The documents can contain different types of content, including text, images, videos, and files. The model is capable of generating embeddings for these diverse content types, which means it can determine the similarity between documents containing any of these types of content.

“…content of the first document and the content of the second document respectively comprise at least one of: first image data and second image data; first video data and second video data; first textual data and second textual data; or first file data and second file data…”

Textual Embedding Process:

For textual content specifically, the patent describes a method to determine embeddings. This involves selecting character subsets from the text based on their appearance frequency. These subsets are then averaged to determine the textual embedding. By comparing the textual embeddings of two documents, the model can estimate the semantic similarity between them.

“…selecting one or more character subsets from the textual data of the corresponding document based at least in part on an appearance frequency for each character subset of a plurality of character subsets of the textual data of the document; and averaging the one or more character subsets to determine the textual embedding.”
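
A rough sketch of this textual embedding step could look as follows. I am assuming that "character subsets" can be read as character n-grams and that each n-gram already has a vector; since no trained vectors are available here, `ngram_vector` derives a deterministic placeholder vector from a hash. Neither choice is taken from the patent.

```python
import hashlib
from collections import Counter

DIM = 8  # embedding size, chosen arbitrarily for the example

def ngram_vector(ngram):
    """Deterministic placeholder for a learned n-gram vector (assumption)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [byte / 255 - 0.5 for byte in digest[:DIM]]

def text_embedding(text, n=3, top_k=5):
    """Select the most frequent character n-grams and average their vectors."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    selected = [gram for gram, _ in grams.most_common(top_k)]
    vectors = [ngram_vector(gram) for gram in selected]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

embedding = text_embedding("semantic search for semantic documents")
```

Two documents can then be compared by computing, for example, the cosine similarity of their averaged vectors.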

Activity Log Details:

  • The activity logs capture various events related to the documents, such as sharing, opening, renaming, and more.

“…the access type comprises: a document sharing event; a document opening event; a document renaming event…”

Search Result Ranking:

  • The trained model can be used to rank search results based on their semantic similarity to a given search query.

“…ranking, by the one or more computing devices, based on the second semantic similarity value, the search result document among a plurality of ranked search result documents corresponding to the search query…”

System Implementation:

  • The patent also describes a computing system that can determine semantic similarity between documents using the trained model.

“A computing system for determining semantic similarity between documents…”

Query composition system

The patent “Query composition system” with the identifier US20230244657A1 relates to search engines, specifically to search query processing. It was published by Google in August 2023 for the US, Russia, China, Spain and WIPO, which makes it more likely to be used in practice.

The patent specification pertains to the selection and provision of query suggestions to a user device. Instead of waiting for a user to type in a search query, the system can present context clusters based on the user’s context (like location, date, time, preferences) when they initiate a search. These context clusters group related queries (e.g., related to movies) and can be shown to the user even before they type anything. The user can then select a context cluster, and a list of related queries from that cluster will be presented as options. The system determines the probability of a user selecting a particular context cluster based on various factors, ensuring that the most relevant queries are presented to the user.

Important Statements:

  1. Context Clusters: “This specification relates to selecting and providing query suggestions to a user device… one or more context clusters (e.g., “movies”) may be presented to the user for selection input prior to the user entering any query input.”
  2. Creation of Context Clusters: “In general, one innovative aspect… storing, in a data storage system accessible by the data processing apparatus, data describing the context clusters and the context cluster probabilities.”
  3. Receiving User Context: “Another innovative aspect of the subject matter… display a context cluster selection input that indicates the selected context cluster for user selection.” [Citation: [0005]]
  4. Advantages: “Implementations of the subject matter described below allows for the user to provide a search query without being required to input any characters… selectable by the user without a query input by the user.”
  5. Details of Context Clusters: “Each context cluster includes a group of one or more queries… users’ informational needs are more likely to be satisfied.”

This patent essentially introduces a method to streamline the search process by presenting users with relevant query suggestions based on their context, even before they start typing, enhancing the user experience and making the search process more efficient.
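
How context clusters and their probabilities could be stored and used can be shown with a small sketch. The context labels, the cluster names and the simple frequency estimate in `cluster_probabilities` are my own assumptions; the patent leaves open how the probabilities are actually computed.

```python
from collections import Counter, defaultdict

def cluster_probabilities(history):
    """history: (context, cluster) pairs -> estimated P(cluster | context)."""
    counts = defaultdict(Counter)
    for context, cluster in history:
        counts[context][cluster] += 1
    return {
        context: {c: n / sum(counter.values()) for c, n in counter.items()}
        for context, counter in counts.items()
    }

def suggest_clusters(probs, context, top_n=2):
    """Return the most probable clusters before the user types anything."""
    ranked = sorted(probs.get(context, {}).items(),
                    key=lambda item: (-item[1], item[0]))
    return [cluster for cluster, _ in ranked[:top_n]]

# Invented log of which cluster users picked in which context
history = (
    [("friday evening", "movies")] * 3
    + [("friday evening", "restaurants")] * 2
    + [("friday evening", "weather")]
    + [("monday morning", "traffic")] * 2
)
CLUSTER_QUERIES = {
    "movies": ["movies near me", "showtimes tonight"],
    "restaurants": ["restaurants open now"],
}

probs = cluster_probabilities(history)
clusters = suggest_clusters(probs, "friday evening")
queries = CLUSTER_QUERIES[clusters[0]]  # shown after the user taps a cluster
```

In this example a user opening the search box on a Friday evening would first see the clusters "movies" and "restaurants" and only then the queries inside the selected cluster.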

Query completions

The patent “Query completions” with the identifier US11693863B1 relates to search engines, specifically to the generation of query completions. It was first filed by Google in 2020 and published in July 2023. It was filed only in the US.

The patent describes a system that uses a general-purpose action prediction engine to rank query completions based on how likely the query completions are to co-occur, in records of user activity of many users, with a query previously entered by the user. A reference query can be used to search the records of user activity to identify likely query completions.

“Ranking query completions based on queries that are highly likely to co-occur with a previous query can provide users with more relevant and more personalized query completions. Users may also see useful queries that they would not have otherwise seen.”

The system receives a query prefix from a user, obtains a reference parameter for the user, identifies one or more likely queries that are likely to co-occur with the reference parameter in user activity sessions, determines a ranking of the one or more likely queries according to the prediction scores, and provides the ranking of the one or more likely queries in response to receiving the query prefix.

In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query prefix from a user; obtaining a reference parameter for the user; identifying one or more likely queries that are likely to co-occur with the reference parameter in user activity sessions, wherein each likely query has an associated prediction score; determining a ranking of the one or more likely queries according to the prediction scores; and providing the ranking of the one or more likely queries in response to receiving the query prefix.

“The ranking factor R is given by R = P(x|q) / P(x), wherein P(x|q) is a measure of a likelihood of the likely query x occurring in an activity session given that the reference parameter q also occurred in a same activity session, and P(x) is a measure of the likelihood of the likely query x appearing in an activity session.”
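
The ranking factor from the quote can be estimated directly from session counts. In this sketch, sessions are modeled as sets of queries and both probabilities are plain relative frequencies; the function name `ranking_factor` and the sample sessions are my own illustration.

```python
def ranking_factor(sessions, x, q):
    """Estimate R = P(x|q) / P(x) from activity sessions (sets of queries)."""
    sessions_with_q = [s for s in sessions if q in s]
    p_x = sum(x in s for s in sessions) / len(sessions)
    if not sessions_with_q or p_x == 0:
        return 0.0
    p_x_given_q = sum(x in s for s in sessions_with_q) / len(sessions_with_q)
    return p_x_given_q / p_x

sessions = [
    {"flights to rome", "rome hotels"},
    {"flights to rome", "rome hotels", "weather"},
    {"weather", "news"},
    {"rome hotels", "news"},
]
previous_query = "flights to rome"
candidates = ["weather", "news", "rome hotels"]
ranked = sorted(candidates,
                key=lambda x: ranking_factor(sessions, x, previous_query),
                reverse=True)
```

A value of R above 1 means the candidate query appears more often in sessions that contain the previous query than it does in general, so "rome hotels" ends up first here.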

The patent also describes a system architecture that includes a search system front-end, a search engine, a query completion engine, a verification engine, and a prediction engine. The prediction engine identifies likely queries by analyzing a large collection of activity sessions in a session database. The query completion engine then ranks the query completions that will be provided to the user device by combining scores computed by the prediction engine and by the initial scores received from the verification engine.

For example, the query completion engine can promote likely queries that have high initial scores. Thus, the query completions provided to the user device are more likely to include query completions that are likely to co-occur with the user’s previous query in user activity sessions.

“The query completion engine 260 can provide the previous query 217 to a prediction engine 280 in a request to obtain likely queries 218. The prediction engine 280 may also receive the previous query 217 from another module in the search system 230, e.g., from the search system front-end 240, from the search engine 250, or from the query database 262.”

This system aims to provide users with more relevant and more personalized query completions, and users may also see useful queries that they would not have otherwise seen.

The publication of this patent shows how Google could create autosuggest completions, both for queries in general and personalized to a user, based on historical user data and prediction.

Surfacing unique facts for entities

This patent was first filed by Google in 2016 and renewed in January 2023. Since the patent has been filed in the USA, Europe, China and worldwide, it is likely that Google will use it in practice.

The patent describes systems and methods for identifying and providing interesting facts about an entity. The inventors are Akash Nanavati, Aniket Ray and Torsten Rohlfing, and the applicant is Google Inc. The patent was published on January 31, 2023.

The patent is about extracting facts from unstructured data (more about this in the articles “How Google can identify and interpret entities from unstructured content” and “Natural Language Processing to build a semantic database”) and serving facts about entities in the SERPs (more about this in the article “How does Google understand search terms by search query processing?”).

An example method includes selecting documents associated with at least one unique fact trigger from a document repository.

The method also includes generating entity-sentence pairs from the documents and, for a main entity of the entities represented by the entity-sentence pairs, clustering the entity-sentence pairs for the main entity using salient terms that occur in the sentence.

This means that the entity-sentence pairs for a given entity are clustered based on the salient terms that occur in the sentences. The goal of clustering is to group similar entity-sentence pairs together to identify the most relevant and interesting facts about the entity.
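
A simple way to picture this clustering step is a greedy grouping by overlap of salient terms. The stopword list, the Jaccard threshold and the function names are assumptions of mine; the patent does not name a specific clustering algorithm here.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "can",
             "and", "up", "for", "have", "has", "their"}

def salient_terms(sentence, entity):
    """Assumption: salient terms = non-stopword words other than the entity."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS and w != entity}

def cluster_pairs(pairs, threshold=0.4):
    """Greedy single-pass clustering by Jaccard overlap of salient terms."""
    clusters = []  # each cluster: {"terms": set, "sentences": list}
    for entity, sentence in pairs:
        terms = salient_terms(sentence, entity)
        for cluster in clusters:
            union = terms | cluster["terms"]
            if union and len(terms & cluster["terms"]) / len(union) >= threshold:
                cluster["sentences"].append(sentence)
                cluster["terms"] |= terms
                break
        else:
            clusters.append({"terms": set(terms), "sentences": [sentence]})
    return clusters

pairs = [
    ("cats", "Cats sleep up to 16 hours a day."),
    ("cats", "Cats can sleep for 16 hours every day."),
    ("cats", "Cats have whiskers on their legs."),
]
clusters = cluster_pairs(pairs)
```

The two sentences about sleeping land in the same cluster, so only one of them would need to be shown, while the sentence about whiskers forms its own cluster.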

“In some implementations, the unique fact finder 115 may filter out sentences that are likely already represented in the knowledge base 190 as structured facts. For example, sentences that match certain patterns, such as “X is friends with,” “X is married to,” or “X was born on” where X represents the entity mention, may be removed from the entity-sentence pairs because these sentences do not likely represent unique facts. Rather, such sentences represent structured facts. The patterns for identifying sentences that are likely structured facts may be hand curated and stored as part of the system 100.”

Which entity-sentence pairs are selected can be based on the topicality of the source document, PageRank of the document, length of the sentence, number of characters or a promotion factor of the source document that is based on links, among other things.

“Another factor may be a promotion factor that measures the fun-quotient of the source document. For example, the more inbound links for the source document that include whitelisted trigger phrases or synonyms of the whitelisted trigger phrases, the higher this promotion factor is.”

A low IDF score or rating by a rater can also be a factor.

“The IDF score represents how rare a term is across a corpus of documents, thus terms that occur less frequently across the corpus have a higher IDF score than very common terms. The IDF score for a sentence may be a demotion factor where low IDF scores represent a demotion. In some implementations, one or more of the entity-sentence pairs may be rated by an external rater (a human) for an interestingness factor.”
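
The IDF idea from the quote can be sketched as follows. The per-term formula is the common log(N / document frequency) variant; averaging the term scores to get a sentence-level score is my own assumption, as the patent does not state how the sentence score is aggregated.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def idf_table(corpus):
    """IDF per term: log(N / number of documents containing the term)."""
    n = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}

def sentence_idf(sentence, idf, default):
    """Assumption: sentence score = mean IDF of its terms."""
    terms = tokenize(sentence)
    if not terms:
        return 0.0
    return sum(idf.get(t, default) for t in terms) / len(terms)

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat purrs at a frequency that heals bones",
]
idf = idf_table(corpus)
unseen = math.log(len(corpus))  # treat unknown terms as maximally rare
common = sentence_idf("the cat sat on the mat", idf, unseen)
rare = sentence_idf("cat purrs heals bones", idf, unseen)
```

The sentence made of rare terms receives the higher score, whereas a sentence built from very common terms would be a candidate for demotion.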

The topicality of a document in relation to the main entity describes the importance of the entity for the document.

“For example, if the source document for the sentence about cat urine is about cats, the entity cat will have a high document topicality score and urine will be lower. If, however, the document is about urine, the urine entity will have a higher document topicality score.”

In addition, a Semantic Importance Score can be taken into account, which measures the topicality of the entity for the sentence.

“For example, the source document may be about cats, but a sentence may compare a unique fact about a dog to cats, e.g., “a dog’s sense of smell is 100× more sensitive than a cat’s”. The semantic importance score of the […] The topicality score for an entity-sentence pair may be determined based on the document topicality score, the semantic importance score, or a combination of these.”

The frequency of facts mentioned in relation to an entity seems to be an indication of correctness.

“Clustering enables the system to avoid showing duplicate or near duplicate facts in a search result and enables the system to accumulate support across sentences expressing the same fact, which is an indication of a fact’s correctness and uniqueness.”

Similar terms can be assigned to a cluster via lemmatization, a sub-step of natural language processing.

The method also includes determining a representative set for each of the clusters and providing at least one of the representative sets in response to a query identifying the main entity.

Google could use these representative sentences for the knowledge panel or a kind of featured snippet.

Another example method includes determining that a query relates to an entity in a knowledge base, determining that the entity has an associated unique fact list, and providing at least one of the unique facts in the list in response to the search query.

The patent also speaks explicitly of a document repository, which is obviously a classic search index.

“For example, document repository 195 may include an index that stores terms or phrases that appear in the documents, as well as the content of the documents or a pointer to the content. In some implementations the document repository 195 represents documents available over the Internet.”

There are also some interesting statements in the patent about the general selection of sources. This describes how blogs or forums are used less as a source for facts, as they are more about opinions than facts.

Sources that stand out due to duplicate content or replication are also not considered as sources.

“Likewise documents classified as blogs or forums may be considered low-quality. Blogs and forum are likely to include more opinions than facts and less likely to have reliable facts. The system may also consider documents classified as syndicated or plagiarized as low quality. The content of documents classified as syndicated or plagiarized is duplicated from other documents. For example, a web site may be a collection of news stories from news organizations. The system may consider such documents as lacking original content and, therefore, low-quality. Another criteria used by the system to identify low-quality documents may be blacklisting. For example, a document or a domain may be added to a list and any documents in the list (e.g., specifically identified or matching a domain) are considered low-quality. Such a list may be manually curated. The system may ignore low-quality documents so that they are never considered as unique fact sources.”

Fact triggers are phrases that signal unique or unusual information. Examples of trigger terms are:

  • did you know
  • fun facts
  • Interesting Facts

Link texts with these terms can also be beneficial. While these terms can promote whitelisting of the documents, there are words like

  • Lie
  • myths

that may encourage blacklisting.
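
As a rough illustration, selecting candidate fact sources by trigger terms could look like this. Treating the lists as a hard filter is a simplification on my part, since the patent only says that such terms may promote whitelisting or blacklisting; the function names and example URLs are hypothetical.

```python
import re

WHITELIST = ("did you know", "fun facts", "interesting facts")
BLACKLIST = ("lie", "myths")

def has_phrase(text, phrases):
    """Whole-word phrase match on lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    padded = " " + " ".join(tokens) + " "
    return any(f" {phrase} " in padded for phrase in phrases)

def select_fact_sources(documents):
    """Keep documents with a whitelisted trigger and no blacklisted term."""
    return [
        doc["url"] for doc in documents
        if has_phrase(doc["text"], WHITELIST)
        and not has_phrase(doc["text"], BLACKLIST)
    ]

documents = [
    {"url": "https://example.com/a",
     "text": "Fun facts about owls: did you know they cannot move their eyes?"},
    {"url": "https://example.com/b",
     "text": "Five myths about owls, debunked"},
    {"url": "https://example.com/c",
     "text": "Interesting facts and myths about bats"},
    {"url": "https://example.com/d",
     "text": "Owls belie their size"},
]
sources = select_fact_sources(documents)
```

Only the first document survives: the second and fourth contain no trigger phrase, and the third contains a trigger phrase but also a blacklisted term.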

It is interesting that the patent obviously talks about a Knowledge Graph, which assigns text or other information to entities in addition to attributes. (I described this in more detail in the article ….).

“In some implementations, the knowledge base 190 may be a data graph, where entities are stored as nodes and facts are stored as relationships between entities or attribute-value pairs for the entities. The edges may be labeled edges and the labels may represent thousands or hundreds-of-thousands of different facts. As used herein, entity may refer to a physical embodiment of a person, place, or thing or a representation of the physical entity, e.g., text, or other information that refers to an entity.”

This patent gives some interesting insights into how Google could identify and serve facts about entities, and shows that a knowledge graph is still important for Google.

Most Interesting Google patents of the last years

Here are further interesting Google patents of the last years:

Distance based search ranking demotion

The Google patent “Distance based search ranking demotion” was filed in 2018 and published in September 2022. There were various prior versions dating back to 2018 and 2020. The original patent is from 2015. The patent has a scheduled expiration date of 2035. It has been filed in the US, Spain, Germany and China. This makes it very likely that the patent will be used.

The patent is about ranking local documents in relation to local search queries. More precisely, it is about downgrading documents whose location is far away from the location of the device on which the search is performed.

A local search result document is a “distant” search result document when the location associated with the local search result document is determined to not meet a proximity threshold. A proximity threshold may be met, for example, when the location for the local search result document and the location for the user device are within a same geographic region (e.g., a same state), or within a threshold distance (e.g., 100 miles).
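
The proximity threshold from the quote can be expressed as a small check. The haversine formula for great-circle distance is standard; the field names `region` and `latlon`, the default of 100 miles and the example coordinates are only illustrative assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance between two (lat, lon) points in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def is_distant(doc, device, max_miles=100):
    """Distant = neither in the same region nor within the threshold distance."""
    same_region = doc["region"] == device["region"]
    within = haversine_miles(doc["latlon"], device["latlon"]) <= max_miles
    return not (same_region or within)

device = {"region": "CA", "latlon": (37.7749, -122.4194)}       # San Francisco
oakland = {"region": "CA", "latlon": (37.8044, -122.2712)}
los_angeles = {"region": "CA", "latlon": (34.0522, -118.2437)}
reno = {"region": "NV", "latlon": (39.5296, -119.8138)}
```

With these values only the document located in Reno counts as distant: it lies in another state and more than 100 miles away, while Los Angeles is far away but still in the same state.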

Documents that are too far away from the user’s location or do not serve a local search intention or do not have a sufficient ranking score are downgraded.

Two ranking components are described in the patent: an information retrieval score and an authority score.

The search results are ranked based on scores related to the resources identified by the search results, such as information retrieval (“IR”) scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score).

Possible criteria for classifying a document as local are addresses included in the document or a high rate of selection by users from the region compared with users outside the region. The classification is made by a subsystem for local search results.

 For example, the local result subsystem 120 may determine a document is a local document if the document includes an address; or if search results for the document have a high rate of selection from user devices in a given location relative to user devices outside of the particular location; or if the local document has been specified by the publisher as being local to a particular location; etc.

Interestingly, the patent describes that for local search queries that include a geographic identifier such as a city, zip code or similar, the distance to the user’s location is not as important.

For such queries, search result documents that are local to the location specified by the location phrase may be determined to be more relevant than search result documents that are not local to the location. In particular, the location of the user device may be determined to be of little, if any, relevance, as the user has explicitly specified a location.

As soon as the search query has an implicit local search intention, i.e. no geographic identifier explicitly appears in the search query, but there is still a local search intention, the distance to the user plays an important role.

However, if the query does have an implicit local intent, and is not an explicitly local query, e.g., such as the query “coffee shops,” then the local result subsystem 120 performs a distance adjustment process 122.

An implicit local search intent can be determined via user behavior.

A downgrading of results can take place if the ranking score does not reach a certain threshold or the object described in the document is too far away from the user. Depending on the degree of locality of the search query, non-local documents can also rank, but can also be pushed down a few places by documents with better local relevance.

The process 200 adjusts the search score of the local document eligible for demotion to demote its ranking in the first order so that the rank of the demoted local document relative to the rank of the sufficiently ranked non-local document is decreased. In some implementations, the demotion can be such that the demoted local document is ranked at least one position below sufficiently ranked non-local document.
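
The demotion described in this passage can be sketched like this. The score threshold `min_score`, the `margin` and the result fields are assumptions for illustration; the patent only states that the demoted local document ends up at least one position below the sufficiently ranked non-local document.

```python
def demote_distant(results, min_score=0.5, margin=0.01):
    """Re-rank results so distant local documents fall below the best
    sufficiently ranked non-local document. Returns document ids."""
    anchor = max(
        (r["score"] for r in results
         if not r["local"] and r["score"] >= min_score),
        default=None,
    )
    adjusted = []
    for r in results:
        score = r["score"]
        if (anchor is not None and r["local"] and r["distant"]
                and score >= anchor):
            score = anchor - margin  # push just below the non-local document
        adjusted.append((score, r["id"]))
    return [doc_id for _, doc_id in sorted(adjusted, key=lambda t: -t[0])]

results = [
    {"id": "A", "score": 0.9, "local": True, "distant": True},
    {"id": "B", "score": 0.8, "local": True, "distant": False},
    {"id": "C", "score": 0.7, "local": False, "distant": False},
    {"id": "D", "score": 0.4, "local": False, "distant": False},
]
order = demote_distant(results)
```

Document A originally ranks first, but because it is local and distant it is moved directly below the non-local document C, giving the order B, C, A, D.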

If the local object described in a document is too far away, it can also lead to a downgrade. The distance from the user plays a role here. If local objects are too far away from the user, they are not ranked. The maximum distance to the user differs depending on the object. When searching for a restaurant, the distance will be smaller than when searching for a hospital.

Search result ranking and presentation

The Google patent “Search result ranking and presentation” was filed in 2019 and published in August 2022. There were various prior versions dating back to 2012. The patent has a scheduled expiration date of Aug. 16, 2032. The patent describes the basic features of a semantic or entity-based search.

„In some implementations, a computer implemented method for providing search results comprises determining, using one or more processors, an entity reference from a search query. A ranked list of properties associated with a type of the entity reference is identified based on a knowledge graph. A property for generating a presentation of search results from the ranked list of properties is identified, based at least in part on the search query and on the type of the entity reference.“

„In some implementations, a computer implemented search method for search comprises identifying a modifying concept based on a search query. A rule for ranking search results is determined based at least in part on the modifying concept and on a knowledge graph from which at least one of the search results was obtained. Search results are ranked based at least in part on the rule.“

The patent addresses the fact that search queries require different display formats in addition to a list of links as a response.

„In some implementations, it may be desired to present search results using a technique that reflects the content of the search query, the content of the search results, or both. For example, it may be useful for the search system to present search results that include geographic locations on a map, and to present search results that include chronological dates on a timeline. For example, a search results for the search query “Cities in California” may automatically be presented on a map, while search results for the search query “Paintings by Van Gogh” may be presented in an image gallery view.“

For search queries such as “the tallest building”, it is useful to output a listing of the entities with the largest values for the “height” property.

In an example, where search query block 102 includes the search query “Tallest Building,” the search system may retrieve a collection of buildings from data structure block 104 and/or webpages block 110, determine that the sorting property is “Height,” and may output a ranked list of buildings by height to ranked search results block 108.
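
The “Tallest Building” example boils down to mapping a modifier in the query to a sorting property. The mapping `SORT_PROPERTY`, the function `answer` and the fictional buildings below are my own illustration and not part of the patent.

```python
# Hypothetical mapping: query modifier -> (property to sort by, descending?)
SORT_PROPERTY = {
    "tallest": ("height_m", True),
    "oldest": ("opened", False),
}

def answer(query, entities):
    """Rank entities by the property implied by a modifier in the query."""
    words = query.lower().split()
    for modifier, (prop, descending) in SORT_PROPERTY.items():
        if modifier in words:
            ranked = sorted(entities, key=lambda e: e[prop], reverse=descending)
            return [e["name"] for e in ranked]
    return [e["name"] for e in entities]  # no modifier: keep the given order

# Fictional entities with invented values
buildings = [
    {"name": "Tower A", "height_m": 300, "opened": 1990},
    {"name": "Tower B", "height_m": 450, "opened": 2012},
    {"name": "Tower C", "height_m": 120, "opened": 1931},
]
tallest = answer("Tallest Building", buildings)
oldest = answer("oldest building", buildings)
```

The same collection of entities produces different rankings depending on the modifier, which is the role of the sorting property in the patent.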

In addition, it is pointed out that it is partly necessary to access data from a structured database in order to create the search results. This can be a knowledge graph, for example.

„Data structure block 104 includes a data structure including piece of information defined in part by the relationships between them. In some implementations, data structure block 104 includes any suitable data structure, data graph, database, index, list, linked list, table, any other suitable information, or any combination thereof. In an example, data structure block 104 includes a collection of data stored as nodes and edges in a graph structure. In some implementations, data structure block 104 includes a knowledge graph.“

The graphic from the patent has similarities to a graphic I created showing the interaction between the classic search index and the Knowledge Graph. In the patent’s graphic, the interface between the two databases is called “Processing Block”. In my graphic I call it Entity Processing, which could be built on the foundation of Hummingbird.

The Processing Block is used to derive entity references from the search query. This is done by Natural Language Processing.

„The search system determines an entity reference from the search query by parsing, by partitioning, by using natural language processing, by identifying parts of speech, by heuristic techniques, by identifying root words, by any other suitable technique, or any combination thereof. In some implementations, the entity reference includes text or other suitable content referencing any suitable topic, subject, person, place, thing, or any combination thereof.“

The modifiers shown in the graph can be e.g. superlatives like best, oldest, highest ….

More about this in my article HOW DOES GOOGLE UNDERSTAND SEARCH TERMS BY SEARCH QUERY PROCESSING?

An entity reference is the concept that refers to a real-world thing. The Processing Block creates a ranked list of the entity's properties.
Additionally, the entity’s properties can be enriched with other formats such as links, images, and videos.

„In some implementations, ranked search results, presentation techniques, or both, are output to ranked search results block 108. In some implementations, search results include, for example, entities from data structure 104, other data from data structure 104, a link to a web page, a brief description of the target of the link, contextual information related to the search result, an image related to the search result, video related to the search result, any other suitable information, or any combination thereof.“

If a search query can refer to multiple entity references, a Popularity Score per entity is taken into account. The most popular entity is prioritized in the delivery of search results.

„In some implementations, the search system selects one of the more than one identified entity references based on a global popularity score of that entity reference, a relevance and/or closeness to some or all elements of a search query, user input, user history, user preferences, relationships between the entity references as described in a data structure, any other suitable information, or any combination thereof.“
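How such a selection could work can be sketched in a few lines. This is an assumption-laden illustration, not the patent's method: the `EntityCandidate` class, the score ranges, the weighting and the example values are all hypothetical, and only two of the signals named in the quote (popularity and relevance) are modeled.

```python
from dataclasses import dataclass


@dataclass
class EntityCandidate:
    name: str
    popularity: float  # hypothetical global popularity score in [0, 1]
    relevance: float   # hypothetical closeness to the query in [0, 1]


def select_entity(candidates, popularity_weight=0.5):
    """Pick the candidate with the best weighted mix of popularity and relevance."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: popularity_weight * c.popularity
        + (1 - popularity_weight) * c.relevance,
    )


# Illustrative values only; they are not taken from the patent or from Google.
candidates = [
    EntityCandidate("Jaguar (animal)", popularity=0.6, relevance=0.5),
    EntityCandidate("Jaguar (car brand)", popularity=0.8, relevance=0.5),
]
print(select_entity(candidates).name)
```

With equal relevance, the more popular entity wins, which matches the idea that the most popular entity is prioritized for an ambiguous query.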

Read more in my articles How Google creates knowledge panels (SEL) and KNOWLEDGE PANELS & SERPS FOR AMBIGUOUS SEARCH QUERIES

It is exciting that the patent describes that not only entities, but also complete lists can be stored in the Knowledge Graph, which can then be delivered directly upon search query.

„In some implementations, the ranked list of properties is stored in a data structure such as a knowledge graph, in a database, in any other suitable data storage arrangement, or any combination thereof. In some implementations, a schema table is preprocessed. In some implementations, the ranked list is predetermined, is based on the received search, or any combination thereof.“

The ranking of the lists can be based on the following:

  • Popularity
  • Search history
  • User habits
  • Input from developers
  • Trends in general search behavior
  • Recent search patterns
  • Content
  • Domain related ranking

This ranking takes place in the Processing Block, or Entity Processing in my words.

The relationships between entities can be established using a “phrase tree”. The phrase tree is a theoretical construct that represents the relationships between entities.

Dynamic Injection of related content in search results

This Google patent was filed on 06.08.2020 and published on 07.05.2022. For me, it is one of the most exciting Google patents of 2022. It is only registered in the US and China, so it is unlikely that it is currently in international use. But still exciting!

It describes how a search engine automatically suggests further links and alternative search queries in a box, based on the dwell time in the SERPs. The appearance of these suggestions is reminiscent of the “others also searched for” suggestions shown when you return to the SERP after clicking on a search result.

The patent seems to build on this functionality and to integrate more suggestions, such as links, into the SERP. The difference from the already known functionality is that the triggering event is not a click on a search result but the dwell time.

“Implementations use a dwell signal to display related suggested items and/or to influence “next page” search results for dynamic pagination. For example, some implementations may calculate related suggestions for a search result presented in response to a query. The suggestions may include refined queries and/or links to specific items.”

If a threshold value for the dwell time is reached, a box with suggestions is automatically displayed, because it can be assumed that the user has not found what they are looking for.
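The trigger logic can be sketched very simply. The threshold value, the function name and the parameters below are assumptions for illustration; the patent does not state a concrete number of seconds here.

```python
# Hypothetical threshold; the value is not taken from the patent.
DWELL_THRESHOLD_SECONDS = 5.0


def suggestions_to_show(dwell_seconds, related_items,
                        threshold=DWELL_THRESHOLD_SECONDS, limit=3):
    """Return the suggestions to display once the dwell time reaches the threshold."""
    if dwell_seconds < threshold:
        return []  # user has not dwelled long enough, show nothing extra
    return related_items[:limit]


related = [
    "housing in Pittsburgh",
    "best elementary schools in Pittsburgh",
    "cost of living in Pittsburgh",
    "public transport in Pittsburgh",
]
print(suggestions_to_show(2.0, related))
print(suggestions_to_show(6.5, related))
```

The first two example suggestions are the ones named in the patent for the query “jobs in Pittsburgh”; the other two are made up to show the `limit` parameter.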

The suggestions are intended to steer the user's search in a slightly different direction or to suggest similar content of the same category or class.

In addition, or instead, the suggestions may offer tangential suggestions that take the user in a slightly different direction, e.g., offering related queries, alternate interpretations of the query terms, and/or documents in a same category/classification as the particular search result but not highly similar to the result.

Besides links and search query refinements, the suggestions can also include images, videos, PDFs, audio recordings … Entities can also be suggested.

In finding responsive items, the query system 120 may be responsible for searching one or more indices, represented collectively as item index 140. The item index 140 may include a web document index, e.g., an inverted index that associates terms, phrases, and/or n-grams with documents. Web documents can be any content accessible over the Internet, such as web pages, images, videos, PDF documents, word processing documents, audio recordings, etc. The item index 140 may also include an index of entities, for example from a knowledge base or knowledge graph

It is also interesting to note that suggestions can be generated based on the user journeys of other searchers.

In some implementations, the suggested follow-on queries may be related to a specific responsive item. For example, the responsive item may be associated with one or more queries, e.g., because the responsive item has been selected often after being presented as a search result for the related queries. If the responsive item has related queries these queries may be included as suggested items for the responsive item. For example, the suggested items 135 can include parts of a topic journey that other users have taken. For instance, if the current query is “jobs in Pittsburgh” the search system may suggest “housing in Pittsburgh” or “best elementary schools in Pittsburgh” as a suggested item 135.

For ambiguous search queries, refinement suggestions are issued based on other interpretations of the search query, in the form of terms with similar meanings, or in the form of explicit questions that open up a new perspective.

As another example, the suggested items 135 may include alternate interpretations of a query term. For instance, the query “jaguar” may result in “jaguar car,” “jaguar cat,” and/or “jaguar team” as suggestions. Similarly, suggested items 135 may include alternate possibilities. For example, a query of “washing machine” may have as suggested items 135 “new washing machine” or “washing machine repair” while a query of “university” may include “trade school” or “journey program” as a suggested item 135. Another example of suggestions tangential to a query are alternate viewpoints. For instance, a query of “How long should I foam roll after running?” may have as a suggested item “Should I foam roll after running?” or “Alternatives to foam rolling after running.”

In addition to the suggestions in a box, a “Next page” function can offer the user a refresh of the search results, or at least of the first ten, without having to load a completely new set of hundreds of results.

The next page may include another small set of results, which may include some of the original smaller set that were not included in the first page as well as results added due to the dwell score signals. Thus, implementations may support dynamic pagination of search results and use a dwell score (or scores) to determine which search results are provided next. Dynamic pagination may be utilized irrespective of manual pagination; in other words, the user may interact with a “next page” type UI element or via automatic in-line pagination, which appends new results to the existing page.

The advantage would be a faster display of search results.

Accelerated large scale similarity calculation

This Google patent was first published in 2019 and republished on 2022-05-07.

The patent describes the process for determining a similarity of two entities based on the similarity of attributes. The degree of similarity is determined by a similarity score. The purpose of this process is to determine a response to a query.

“For example, the query might seek information indicating which domain names 20-year-old males in the U.K. find more interesting relative to the general population in the U.K. The system computes the correlations by executing a specific type of correlation algorithm (e.g., a jaccard similarity algorithm) to calculate correlation scores that characterize relationships between entities of the different datasets.”
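The Jaccard similarity named in the quote is the size of the intersection of two sets divided by the size of their union. The following minimal sketch uses plain Python sets; the patent describes a GPU-accelerated computation at large scale, which this does not attempt to reproduce. The domain names are placeholders.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder attribute sets, e.g. domains visited by two groups of users.
cohort_domains = {"a.example", "b.example", "c.example"}
population_domains = {"b.example", "c.example", "d.example"}

print(jaccard_similarity(cohort_domains, population_domains))  # 2 shared / 4 total = 0.5
```

A score of 1.0 means the two attribute sets are identical, and 0.0 means they share nothing.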

Google's machine learning platform TensorFlow is used as the basis for determining the similarity score.

“The system includes a tensor data flow interface that is configured to pre-load at least two data arrays (e.g., tensors) for storage at a memory device of the GPU.”

For example, a Knowledge Graph and/or the Knowledge Vault or any kind of semantic database can be used as the entity database accessed by the algorithm.

The following example from the patent shows possible entities and attributes for a comparison:

“For example, entities of one dataset can be persons or users of a particular demographic (e.g., males in their 20’s) that reside in a certain geographic region (e.g., the United Kingdom (U.K.)). Similarly, entities of another dataset can be users of another demographic (e.g., the general population) that also reside in the same geographic region.”

The exciting thing about the patent is that in addition to outputting search results, it can also be used to create groups or cohorts of similar users for Google Analytics, for example. You can find these sections in the patent:

“For situations in which the systems discussed here collect and/or use personal information about users, the users may be provided with an opportunity to enable/disable or control programs or features that may collect and/or use personal information (e.g., information about a user’s social network, social actions or activities, a user’s preferences or a user’s current location). In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information associated with the user is removed. For example, a user’s identity may be anonymized so that the no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.

System 100 is configured to analyze and process different sources of data accessible via storage device 106. For example, CPU 116 can analyze and process data sources that include impression logs that are interacted with by certain users or data sources such as search data from a search engine accessed by different users. In some implementations, entities formed by groups of users, or by groups of user identifiers (IDs) for respective users, can be divided into different groups based on age, gender, interests, location, or other characteristics of each user in the group.”

“In some implementations, hosting service 110 represents an information library that receives and processes queries to return results that indicate relationships such as similarities between entities or conditional probabilities involving different sets of entities. For example, a query (or command) can be “what are all the conditional probabilities associated with some earlier query?” Another query may be related to the conditional probabilities of all the ages of people that visit a particular website or URL. Similarly, another query can be “what are the overlapping URL’s visited by 30-year-old females living in the U.S. relative to 40-year-old males living in the U.K.?”

This Google patent shows that similarities, and thus relationships between entities, are important to Google and that attributes are the basis for determining them. It also shows that treating Internet users as entities and building cohorts of similar users based on certain attributes can be a solution to privacy challenges.

Methods, systems and media for providing a media search engine

This Google patent was first published in 2011 and republished on 2022-08-02 under a new patent number. The status is active and the anticipated expiration is January 2031. It is classified under “Operations research or analysis”, “Machine learning” and “Marketing, e.g. market research and analysis, surveying, promotions, advertising, buyer profiling, customer management or rewards; Price estimation or determination”.

This patent describes how an algorithm is trained via supervised machine learning using various methods (logistic regression, support vector machines, Bayesian approaches, decision trees, etc.) in order to classify the content. Content is labeled by people in order to then make it available to a learning algorithm as sample training data. It should be noted that these learning approaches are useful in situations when the classes considered are significantly biased (pornography or adult content, children’s content, hate speech, bombs, weapons especially, ammunition, alcohol, offensive language, tobacco, spyware, unwanted code, illegal drugs, downloading music, certain types of entertainment, illegality, profanity, etc.) and where there are limited resources to get information from people.

In addition, the approaches can be used to prevent the display of advertising on critical pages or content. Classification can be based on URL, text, anchor texts, DMOZ categories, third party classification, images on a page…
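The supervised approach described above can be made tangible with a toy example. The sketch below is a tiny naive Bayes text classifier (one of the Bayesian approaches the patent lists) written in plain Python. The labels, the training sentences and the function names are invented for illustration and say nothing about the features or models Google actually uses.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label from human-labeled (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            count = word_counts[label][token]  # 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-labeled training samples.
training_data = [
    ("buy cheap tobacco online", "sensitive"),
    ("discount alcohol and tobacco shop", "sensitive"),
    ("healthy recipes for children", "safe"),
    ("garden tips for spring", "safe"),
]
model = train(training_data)
print(classify(model, "cheap alcohol shop"))
print(classify(model, "spring recipes for children"))
```

A real system would be trained on far more labeled samples and would use additional signals such as URLs, anchor texts or images.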

This patent describes approaches that Google could use to evaluate E-A-T or to classify websites as spam, scam, or other content that Google does not want indexed.

About Olaf Kopp

Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO at Aufgesang GmbH. He is an internationally recognized industry expert in semantic SEO, E-E-A-T, modern search engine technology, content marketing and customer journey management. As an author, Olaf Kopp writes for national and international magazines such as Search Engine Land, t3n, Website Boosting, Hubspot, Sistrix, Oncrawl, Searchmetrics, Upload … . In 2022 he was a top contributor for Search Engine Land. His blog is one of the most famous online marketing blogs in Germany. In addition, Olaf Kopp is a speaker on SEO and content marketing at SMX, CMCx, OMT, OMX, Campixx...
