Author: Olaf Kopp
Reading time: 3 Minutes

Query generation using structural similarity between documents

The method generates new search queries (synthetic queries) by analyzing well-performing seed queries and structured documents. It derives patterns from these documents, applies them to other, similarly structured documents to produce new queries, evaluates how well those queries perform, and stores the best-performing ones to improve future search results.

  • Patent ID: US9436747B1
  • Assignee: Google LLC
  • Countries: United States
  • Last Publishing Date: September 6, 2016
  • Inventors: Paul Haahr, Steven D. Baker, Michael E. Flaster, Nitin Gupta, Srinivasan Venkatachary, Yonghui Wu
  • Expiration Date: November 9, 2030


The patent, titled “Query generation using structural similarity between documents,” describes methods for generating synthetic queries from the structural similarity between documents. It involves identifying coding fragments within structured documents, creating query templates from these fragments, and generating candidate synthetic queries by applying the templates to other documents on the same website. Candidates whose measured performance exceeds a predefined threshold are then designated as synthetic queries. The goal is to generate better search queries and, in turn, improve search engine performance.


  • Identifying coding fragments within structured documents.
  • Generating query templates based on these coding fragments.
  • Applying these templates to other documents on the same website.
  • Identifying terms in those documents that fit the query templates as candidate synthetic queries.
  • Measuring the performance of each candidate synthetic query.
  • Designating as synthetic queries those candidates whose performance measurements exceed a threshold.
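
The first four steps can be sketched in Python. This is a minimal illustration under assumptions that do not come from the patent: documents are raw HTML strings keyed by URL, a "coding fragment" is taken to be the pair of HTML tags that directly wraps a seed query, the query template is a regular expression built from that tag pair, and "same website" means the same host name. The function names `make_template` and `extract_candidates` are hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse


def make_template(html: str, seed_query: str) -> re.Pattern | None:
    """Turn the tag pair that directly wraps the seed query into a reusable regex template."""
    # Assumed notion of a "coding fragment": an opening tag immediately before the
    # seed query and a closing tag immediately after it, e.g. <h1 class="title">...</h1>.
    match = re.search(r"(<[^<>]+>)" + re.escape(seed_query) + r"(</[^<>]+>)", html)
    if match is None:
        return None
    opening, closing = match.group(1), match.group(2)
    # The template keeps the exact markup and replaces the query text with a capture group.
    return re.compile(re.escape(opening) + r"([^<>]+)" + re.escape(closing))


def extract_candidates(template: re.Pattern, seed_url: str, documents: dict[str, str]) -> dict[str, str]:
    """Apply the template to other pages on the seed page's host; map each candidate to its source URL."""
    host = urlparse(seed_url).netloc
    candidates: dict[str, str] = {}
    for url, html in documents.items():
        if url == seed_url or urlparse(url).netloc != host:
            continue  # only *other* documents on the *same* website
        for match in template.finditer(html):
            term = match.group(1).strip()
            if term:
                candidates.setdefault(term, url)  # keep the first document a term was found in
    return candidates


seed_url = "https://shop.example/blue"
site = {
    seed_url: '<html><body><h1 class="title">Blue Widget</h1><p>Buy now</p></body></html>',
    "https://shop.example/red": '<html><body><h1 class="title">Red Gadget</h1></body></html>',
    "https://shop.example/green": '<h1 class="title">Green Gizmo</h1><h2>Green Gizmo</h2>',
    "https://other.example/x": '<h1 class="title">Off Site</h1>',
}
template = make_template(site[seed_url], "Blue Widget")
print(extract_candidates(template, seed_url, site))
# {'Red Gadget': 'https://shop.example/red', 'Green Gizmo': 'https://shop.example/green'}
```

The `<h2>` text and the off-site page are ignored because the template only matches the exact markup around the seed query and the host filter excludes other websites. Recording the source URL of each candidate keeps the information a later performance check would need.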

The patent also describes how the method is applied in practice:

  • Seed queries, which are queries known to perform well, are associated with documents in a structured document corpus.
  • The synthetic query subsystem uses these seed queries to perform augmented search operations.
    • Synthetic queries are computer-generated search queries created with the method described in the patent. They aim to improve search engine performance by better capturing user intent.
    • The patent describes a process in which coding fragments and seed queries are used to create query templates. These templates then generate candidate synthetic queries when applied to other documents on the same website.
    • Candidate synthetic queries are evaluated on their performance in search operations, and those exceeding a performance threshold are designated as synthetic queries.
    • These synthetic queries can be used to enhance user searches, refine queries, or suggest related searches, leading to better search results.
  • The query generation subsystem analyzes data from a structured document corpus to generate synthetic queries.
  • These synthetic queries are generated based on the structural characteristics of query terms as they appear in the documents.
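
The evaluation and designation step can be sketched as well. The summary above does not say how performance is measured, so this sketch assumes a hypothetical metric: the reciprocal rank of the candidate's source document when the candidate is issued as a query against a toy term-count ranker. `toy_search`, `reciprocal_rank`, `designate_synthetic_queries`, and the default threshold of 0.5 are illustrative inventions, not Google's implementation.

```python
from __future__ import annotations


def toy_search(query: str, corpus: dict[str, str]) -> list[str]:
    """Hypothetical stand-in for a search engine: rank URLs by how often the query's terms occur."""
    terms = query.lower().split()
    scored = []
    for url, text in corpus.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((-score, url))  # negative score so sort() puts the best match first
    scored.sort()  # ties are broken alphabetically by URL
    return [url for _, url in scored]


def reciprocal_rank(query: str, target_url: str, corpus: dict[str, str]) -> float:
    """Assumed performance measure: 1/rank of the source document, or 0.0 if it is not retrieved."""
    results = toy_search(query, corpus)
    if target_url not in results:
        return 0.0
    return 1.0 / (results.index(target_url) + 1)


def designate_synthetic_queries(
    candidates: dict[str, str], corpus: dict[str, str], threshold: float = 0.5
) -> dict[str, float]:
    """Keep (and store) candidates whose measured performance exceeds the threshold."""
    designated: dict[str, float] = {}
    for query, source_url in candidates.items():
        score = reciprocal_rank(query, source_url, corpus)
        if score > threshold:  # strictly greater, i.e. "exceeding" the threshold
            designated[query] = score
    return designated


corpus = {
    "https://shop.example/a": "blue widget blue widget",
    "https://shop.example/b": "red gadget",
    "https://shop.example/c": "blue sky",
}
candidates = {
    "blue widget": "https://shop.example/a",
    "red gadget": "https://shop.example/b",
    "sky": "https://shop.example/a",
}
print(designate_synthetic_queries(candidates, corpus))
# {'blue widget': 1.0, 'red gadget': 1.0}
```

The candidate "sky" is dropped because it never retrieves the page it was attributed to. A real system would presumably rely on much richer signals than term counts, but the shape of the step stays the same: score each candidate, compare against a threshold, keep the winners.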

