What can we learn from the DOJ trial and the API leak for SEO?
With the help of Google Leak Analyzer, I have compiled the insights from the DOJ antitrust trial documents and the 2024 API leak in this article and summarized the findings for SEO.
The Google API Leak and the documents from the DOJ Google Antitrust trial have provided an unprecedented look into the inner workings of Google’s search algorithms. These revelations offer crucial insights into how Google evaluates search quality, and assesses factors like authority, expertise, and trust (often summarized as E-E-A-T).
By examining internal data structures, ranking systems, and testimony from Google executives, we can gain a clearer understanding of the signals and metrics that influence search visibility.
Contents
- 1 Insights about the role of user signals in search quality evaluation and ranking
- 2 Insights about the Use of Click Data for Rankings
- 3 Insights about Key Systems and Metrics
- 4 Google is not using Cosine similarity for determining semantic similarity between queries and documents
- 5 Signals and Metrics Related to Authority, Expertise, Quality, and Trust (E-E-A-T)
- 6 Insights about General Methodologies and Metrics
- 7 Insights about Algorithm development at Google
- 8 Insights about Search Index Data Disclosure and its Impact
- 9 Insights about Generative AI (GenAI) in Search
- 10 Third parties get lower-quality results for grounding via the Vertex Search API
- 11 Insights about Privacy Considerations for User Data
- 12 Conclusions for SEO
Insights about the role of user signals in search quality evaluation and ranking
Google’s continuous improvement in search ranking is heavily reliant on learning from user interactions. As stated in the DOJ trial documents, “Learning from this user feedback is perhaps the central way that web ranking has improved for 15 years.” Every user interaction, such as a click, provides training data, indicating that “for this query, a human believed that result would be most relevant.”
Based on the testimony of Pandu Nayak (a Google Search executive) in the US v. Google antitrust trial, I’ve extracted these valuable SEO insights about Google’s search systems:
Google Search Architecture
- Google’s search system works by first retrieving many matching documents from its index, then using core algorithms to narrow down to several hundred results, before applying machine learning to finalize rankings
- The core ranking system includes navboost (captures user clicks) and other traditional signals like page quality, topicality, and localization
- Major deep learning systems implemented since 2015 include RankBrain, DeepRank, and RankEmbed BERT
Ranking Signals
- Google uses more than 100 signals for ranking, combining traditional and machine learning approaches
- Core traditional signals include topicality, page quality/reliability, and localization
- User data (clicks and engagement) play a significant role in ranking via navboost and machine learning models
Mobile vs. Desktop
- Google differentiates between mobile and desktop searches, recognizing different user intents
- Location data is crucial for both mobile and desktop search relevance
- For the same query term, Google may show different results on mobile vs. desktop based on common usage patterns
- Example: “Bank of America” on mobile might prioritize maps/locations, while desktop might prioritize the homepage

Insights about the Use of Click Data for Rankings
Contrary to previous public statements by Google spokespeople, Nayak’s testimony confirmed that Google does use clicks for rankings through systems like NavBoost. This system analyzes user interaction data, including clicks, to refine search results.
User Data and Machine Learning
- Google’s machine learning models are trained on vast amounts of user data (clicks and queries)
- Models like RankBrain, DeepRank and RankEmbed BERT all use user feedback data
- Google periodically retrains these models to keep them updated with fresh data and new events
- Machine learning helps with both language understanding and incorporating “world knowledge”


Insights about Key Systems and Metrics
- Navboost: This system is explicitly used for ranking web results (the “10 blue links”) and heavily relies on click data. It helps distinguish between different search intents for the same query based on device or location, as highlighted by Pandu Nayak’s testimony (e.g., “football” in the US vs. UK, “Bank of America” on desktop vs. mobile). Navboost utilizes various click-driven metrics to boost, demote, or adjust rankings.
- Craps Module: This module is related to click and impression signals and includes metrics such as:
- badClicks: Measures unsuccessful search results.
- goodClicks: Measures successful search results.
- lastLongestClicks: Indicates the duration a user spent on a page after clicking from search results before returning to the SERP.
- unsquashed clicks and unsquashed last longest clicks: These, along with “squashed” clicks, are used to fight manual and automated click spam, suggesting Google analyzes cookie history and logged-in Chrome data.
- Glue: While often confused with Navboost, Glue deals with other search features on the page. It analyzes diverse user interactions like clicks, hovers, scrolls, and swipes to determine when and where these features should appear.
- RealTime Boost: This system uses data from the Chrome browser to influence search rankings. Metrics like total Chrome views for a site and Chrome transition clicks (chrome_trans_clicks) are considered, emphasizing the importance of optimizing for Chrome user behavior. This contradicts previous public statements from Google representatives denying the use of Chrome browsing data for ranking purposes. More details about Realtimeboost.
A sustained negative click trend could lead to a reassessment of a page’s quality or relevance, influencing its future visibility. Because of their importance, I will discuss the following systems in detail.
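To make the Craps click metrics concrete, here is a minimal sketch of how such attributes could feed into a single click-quality value. The attribute names (goodClicks, badClicks, lastLongestClicks, impressions) come from the leak; the weights and formula are purely hypothetical, since the documents name the attributes but not how they are combined.

```python
# Hypothetical sketch: combining leaked Craps click attributes into one
# click-quality value. Attribute names are from the leak; the weighting
# and formula are invented for illustration only.

def click_quality_score(good_clicks: float, bad_clicks: float,
                        last_longest_clicks: float, impressions: float) -> float:
    """Return a value in [0, 1] estimating click-based result quality."""
    if impressions == 0:
        return 0.0
    # Good clicks and "last longest" clicks (the user stayed on the page)
    # count positively; bad clicks (quick returns to the SERP) negatively.
    raw = good_clicks + 2.0 * last_longest_clicks - bad_clicks
    return max(0.0, min(1.0, raw / impressions))

print(click_quality_score(good_clicks=80, bad_clicks=10,
                          last_longest_clicks=40, impressions=200))  # 0.75
```

The clamping to [0, 1] loosely mirrors the idea that a sustained run of bad clicks can only push a page so far down before other signals take over.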

Information Satisfaction (IS) Score
The IS score is a fundamental metric for Google’s continuous improvement of its search algorithm and for ensuring user satisfaction.
- Definition: Information Satisfaction (IS) is a critical metric Google uses to evaluate content quality and user satisfaction. It serves as a primary indicator of search quality.
- Key Variants: The IS score, particularly IS4 and its derivative IS4@5, plays a crucial role in measuring and ensuring the quality of search results.
- Determination: IS scores are derived from human evaluator ratings. These evaluators assess the relevance, comprehensiveness, and user satisfaction provided by search results. They follow structured guidelines emphasizing originality, comprehensiveness, relevance, user experience, and E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).
- Role in Algorithm Refinement: IS scores are crucial training data for machine learning models, guiding these systems to better predict and identify high-quality content. By benchmarking IS scores against those of competitors or previous algorithm versions, Google can identify performance gaps and target specific areas for improvement.
- Correlation with User Engagement: High IS scores often correlate with longer dwell times and higher click-through rates (CTR), indicating that users find content with high IS scores more useful and relevant.
- Interaction with Ranking Systems: Data and insights from IS scores feed into various ranking systems like RankBrain, SpamBrain, the Helpful Content System, and MUM (Multitask Unified Model).
- Page Quality Ratings and IS Value: Human evaluators assign Page Quality Ratings, which directly influence the IS value:
- Lowest Quality Pages: Receive IS scores typically in the range of 0 to 10.
- Low Quality Pages: Receive IS scores typically in the range of 10 to 30.
- Medium Quality Pages: Receive IS scores typically in the range of 30 to 50.
- High Quality Pages: Receive IS scores typically in the range of 50 to 70.
- Highest Quality Pages: Receive IS scores typically in the range of 70 to 100.
- Benchmarking: IS scores help Google compare its performance against other search engines or previous versions of its own algorithms, guiding further refinements.
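The tier-to-score mapping above can be expressed as a simple lookup. The ranges reflect the article’s summary of the quality tiers; the function itself is just an illustration, not a Google implementation.

```python
# Illustrative mapping from human Page Quality Ratings to the IS score
# ranges summarized above, on a 0-100 scale.
IS_RANGES = {
    "lowest":  (0, 10),
    "low":     (10, 30),
    "medium":  (30, 50),
    "high":    (50, 70),
    "highest": (70, 100),
}

def quality_tier(is_score: float) -> str:
    """Return the page-quality tier an IS score falls into."""
    for tier, (low, high) in IS_RANGES.items():
        if low <= is_score <= high:
            return tier
    raise ValueError(f"IS score out of range: {is_score}")

print(quality_tier(42))  # medium
```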

NavBoost
Navboost is a crucial system within Google’s search architecture designed to enhance the search experience by utilizing data on user interactions with search results, specifically clicks. It plays a significant role in determining the order of search results.
Here are more details about Navboost:
- Primary Function: Navboost compiles user click data and incorporates feedback from human evaluators to refine and improve the ranking of search results. It focuses on learning from user behavior to determine which search results are most relevant and should be ranked higher, directly influencing the effectiveness of search result rankings (Trial Exhibit-UPXD105: U.S.v. Plaintiff States v. Google LLC).
- User Interaction Data: Navboost collects user interaction data, specifically how users click and engage with search results. This data helps infer the relevance and usefulness of the results based on real user behavior. If a snippet’s actual Click-Through Rate (CTR) significantly falls short of the expected rate, Navboost registers this discrepancy and adjusts the ranking of the DocIDs accordingly. Conversely, if the CTR is significantly higher, the ranking often rises (Figure 6, Figure 7: Slide from a Google presentation, Trial Exhibit – UPX0228, U.S. and Plaintiff States v. Google LLC).
- Click Signals and Attributes: The API documentation confirms that Navboost has a specific module entirely focused on click signals. This module defines “click and impression signals for Craps,” one of the ranking systems. Key attributes considered include:
- badClicks (type: float())
- clicks (type: float())
- goodClicks (type: float())
- impressions (type: float())
- lastLongestClicks (type: float())
- unicornClicks (type: float()) – subset of clicks associated with an event from a Unicorn user
- unsquashedClicks (type: float())
- unsquashedImpressions (type: float())
- unsquashedLastLongestClicks (type: float())
(Source: Google-Leak_API-Module: QualityNavboostCrapsCrapsClickSignals module)
- Squashing: Navboost utilizes “squashing,” a function that prevents one large signal from dominating others, to normalize click data and prevent manipulation based on click signals (Google’s “Scoring local search results based on location prominence” patent US8046371B2/en).
- Data Window: Navboost has used a rolling 18-month window of click data since around 2005. It was updated to use a 13-month data window for all queries received (DOJ antitrust case, Pandu Nayak testimony).
- Integration with Ranking Systems: The data and insights from Navboost feed into various ranking systems like RankBrain, SpamBrain, Helpful Content System, and MUM (Multitask Unified Model). These systems use the data to adjust how they process and prioritize information (Source: Google-Leak_API).
- Core Updates and Retraining: Navboost interacts with periodic core updates. During an update, algorithms may be adjusted, and new or modified algorithms are retrained on the latest datasets, which include newly gathered data from Navboost. This retraining allows algorithms to “learn” from the most recent user behaviors (Source: Google-Leak_API).
- Distinction from Glue: According to Pandu Nayak’s testimony in the DOJ antitrust case, Navboost primarily handles web search results, while “Glue” is another name for Navboost that includes all other features on the page and handles ranking for other universal search verticals (DOJ antitrust case, Pandu Nayak testimony).
- Slices: Navboost uses “slices” to manage different data sets for mobile, desktop, and local searches (Source: Google-Leak_API).
- Filtering Clicks: Google appears to have ways to filter out clicks they don’t want to count in their ranking systems and include ones they do. They also measure the length of clicks (e.g., pogo-sticking) and impressions (Source: Google-Leak_API).
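Two of the Navboost mechanics above, the expected-vs-actual CTR comparison and “squashing”, can be sketched together. The real squashing function is not public; a sigmoid-style curve is one plausible shape for a function that keeps any single large signal from dominating, and everything here is an illustration under that assumption.

```python
import math

def squash(value: float) -> float:
    """Dampen a raw signal so no single large value dominates.
    A sigmoid rescaled to (-1, 1) is one plausible shape; Google's
    actual squashing function is not public."""
    return 2.0 / (1.0 + math.exp(-value)) - 1.0

def navboost_adjustment(actual_ctr: float, expected_ctr: float) -> float:
    """Positive when a result earns more clicks than expected for its
    position, negative when it underperforms (illustrative only)."""
    return squash(actual_ctr - expected_ctr)

print(navboost_adjustment(0.30, 0.10) > 0)  # True: overperforming, boosted
print(navboost_adjustment(0.05, 0.20) < 0)  # True: underperforming, demoted
```

The key property to notice is saturation: doubling an already-large click surplus barely moves the squashed output, which is exactly what makes click-spam manipulation less rewarding.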
Glue
Glue is a significant component within Google’s search architecture, closely related to Navboost, and plays a key role in collecting and organizing user interaction data.
Here are more details about Glue:
- Definition: Glue is essentially a “super query log” that collects a wide array of data about a user’s query and their interaction with the search results. It can be thought of as a giant table of data.
- Relationship with Navboost: An important component of the Glue data is Navboost data. As stated by Allan, “Glue contains . . . Nav[b]oost information.” Pandu Nayak also noted that “Glue is just another name for [N]avboost that includes all of the other features on the page”. While Navboost primarily focuses on web search results, Glue handles ranking for other universal search verticals.
- Data Collected: The data underlying Glue consists of information relating to:
- The query: Such as its text, language, user location, and user device type.
- Ranking information: Including the 10 blue links and any other triggered search features that appear on the Search Engine Results Page (SERP), such as images, maps, Knowledge Panel, People also ask, etc.
- SERP interaction information: Such as clicks, hovers, and duration on the SERP.
- Query interpretation and suggestions: Including spelling correction and salient query terms.
- Function within Tangram: Glue functions within the framework of Google’s Tangram system, which assembles the SERPs. Glue is tasked with the organization and presentation of data, ensuring that the search results are not only relevant but also well-structured and user-friendly (Source: Trial Exhibit-UPXD105: U.S.v. Plaintiff States v. Google LLC).
- Role in Training Models: User-side Data, which includes Glue data, is used to train Google’s ranking and retrieval components, as well as GenAI models used for Google’s GenAI Products.
- DOJ Antitrust Trial Context: Under the proposed remedy in the DOJ antitrust trial, Google must make available to Qualified Competitors, “at marginal cost” and on a “periodic basis,” the User-side Data used to build, create, or operate the GLUE statistical model(s). Importantly, the remedy does not force Google to disclose any models or signals built from Glue data, only the underlying data itself.
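The “super query log” described above might look roughly like the following record. The four data categories mirror the trial testimony; all field names are my own invention, not leaked identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for one Glue log entry. The four groups of
# fields mirror the trial testimony; every field name is invented.
@dataclass
class GlueLogEntry:
    # The query
    query_text: str
    language: str
    user_location: str
    device_type: str
    # Ranking information: blue links plus triggered SERP features
    blue_links: list = field(default_factory=list)
    serp_features: list = field(default_factory=list)  # e.g. maps, PAA
    # SERP interaction information
    clicks: list = field(default_factory=list)
    hovers: list = field(default_factory=list)
    serp_duration_ms: int = 0
    # Query interpretation and suggestions
    spell_corrected_query: str = ""
    salient_terms: list = field(default_factory=list)

entry = GlueLogEntry("bank of america", "en", "US", "mobile",
                     serp_features=["maps"], salient_terms=["bank"])
print(entry.device_type)  # mobile
```

Thinking of Glue as one wide table per query-session like this makes the DOJ remedy easier to parse: competitors would receive rows of this kind of data, not the models trained on them.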

RankEmbed & RankEmbedBERT
RankEmbed and its later iteration, RankEmbed BERT, are advanced AI-based deep-learning ranking models used by Google to understand language and improve search results.
Here are more details about RankEmbed:
- Core Function: RankEmbed and RankEmbed BERT are ranking models designed to identify the most relevant documents for a given query, even if the query lacks specific terms. They achieve this through a strong understanding of natural language and semantic matching.
- Evolution: RankEmbed was launched first. RankEmbed BERT is an augmented version that incorporates the BERT algorithm and structure, making it even better at understanding language (Source: DOJ vs Google trial, Pandu Nayak testimony, Page 56).
- Impact: These models have directly contributed to Google’s quality edge over competitors. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
- Training Data Sources: These models rely on two primary sources of data:
- Search Logs: They are trained on a sample of 70 days of search logs, which include click and query data (Source: DOJ vs Google trial, Pandu Nayak testimony, Page 56).
- Human Rater Scores: They also use scores generated by human quality raters. These raters follow detailed guidelines to assess the quality of organic search results, providing examples of helpful and unhelpful results that teach the AI models. The models are fine-tuned on this human rater data (Source: DOJ vs Google trial, Pandu Nayak testimony, Page 56).
- Efficiency and Quality: RankEmbed is notably efficient, trained on 1/100th of the data used for earlier ranking models, yet it provides higher quality search results.
- Natural Language Understanding: As a deep learning system, RankEmbed BERT has a strong understanding of natural language, which allows it to semantically match documents and queries more effectively.
- Handling Long-Tail Queries: RankEmbed particularly helped Google improve its answers to long-tail queries.
- Input Information: Among the underlying training data is information about the query, including:
- Salient terms: Terms that Google has derived as important from the query.
- Resultant web pages: Information about the web pages that were recommended to the user.
- Output Signals: RankEmbed BERT generates signals that are considered top-level ranking signals, alongside quality and popularity, and these can be aggregated to create even more signals for the final ranking score (Google’s ranking signals – from the DOJ vs Google trial).
- API Leak Attributes (Video Content Search): The API leak data includes a module VideoContentSearchRankEmbedNearestNeighborsFeatures, which contains attributes related to RankEmbed similarity for video content search:
- anchorReSimilarity: RankEmbed similarity between the rankembed neighbor and the video anchor.
- navQueryReSimilarity: RankEmbed similarity between the rankembed neighbor and the top navboost query of the video.
- reSimilarity: RankEmbed similarity between the rankembed neighbor and the original query candidate.
(Source: GoogleApi.ContentWarehouse.V1.Model.VideoContentSearchRankEmbedNearestNeighborsFeatures)
In summary, RankEmbed BERT is a sophisticated AI model that leverages vast amounts of user interaction data and human quality assessments to deeply understand queries and content, leading to more relevant and higher-quality search results.
Google is not using Cosine similarity for determining semantic similarity between queries and documents
Various measurement methods can be used to determine the distance of vectors in semantic spaces in order to determine semantic similarity. Here are the most popular ones:
- Cosine Similarity: Most common method for semantic similarity. Measures the cosine of the angle between two vectors.
- Euclidean Distance: Measures straight-line (geometric) distance between two vectors.
- Dot Product (Inner Product): Often used in neural nets and attention mechanisms. Measures alignment + magnitude.
Many SEOs use cosine similarity between search queries and documents as an analysis approach. According to the information from the antitrust trial, however, Google uses the dot product for this use case, not cosine similarity.
Therefore, focusing on cosine similarity for short-head queries does not seem to be the right approach.
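The practical difference matters: cosine similarity normalizes away vector magnitude, while the dot product keeps it, so two embeddings pointing in the same direction can have identical cosine similarity but very different dot products. A small sketch with toy vectors:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product of the normalized vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm

query = [1.0, 2.0]
doc_a = [2.0, 4.0]    # same direction as the query, small magnitude
doc_b = [20.0, 40.0]  # same direction, 10x the magnitude

# Cosine treats both documents identically...
print(round(cosine(query, doc_a), 4), round(cosine(query, doc_b), 4))  # 1.0 1.0
# ...but the dot product rewards the larger-magnitude embedding.
print(dot(query, doc_a), dot(query, doc_b))  # 10.0 100.0
```

If embedding magnitude encodes something like confidence or popularity, a dot-product-based system can prefer doc_b even though a cosine analysis would call the two documents equally similar, which is why the two methods can lead SEO analyses to different conclusions.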
Signals and Metrics Related to Authority, Expertise, Quality, and Trust (E-E-A-T)
The concept of E-E-A-T, as outlined in the Search Quality Raters Guidelines, is deeply integrated into Google’s evaluation processes. The DeepRank neural network, used at later stages of the ranking pipeline, learns to make judgments similar to human evaluators, implying it must process similar information.
Expertise and Author:
- Authors Module: Google stores information about document authors in a string format. While the exact usage is not fully detailed, its presence suggests potential importance, as indicated by Paul Haahr’s resume mentioning coordination with an “Authors” team.
- bylineDate: Google looks at dates in the byline to assess freshness, which can be a component of expertise for timely content.
- smallPersonalSite: This feature identifies small personal sites or blogs. While its direct impact on ranking is not specified, it suggests Google categorizes sites by type, potentially influencing how expertise is evaluated or if a Twiddler (re-ranking function) might boost or demote such sites.
Authority and Trust:
- PageRank Variants: The documents reveal multiple types of PageRank, including deprecated versions, confirming its continued role as a signal, albeit one of many.
- hostAge (PerDocData Module): This attribute is used to “sandbox fresh spam in serving time,” confirming that the age of a host can impact the visibility of new content, acting as a trust signal.
- RegistrationInfo: Google stores domain registration information, which could be used to assess the legitimacy and history of a domain.
- SpamBrain: This system is explicitly mentioned as a feature for spam detection and demotion, directly impacting trust and quality signals.
- Quality Rater Feedback: Human quality ratings, including E-E-A-T evaluations, are used to test algorithm changes before deployment. DeepRank learns from these evaluations.
- Whitelists: During critical events like the Covid-19 pandemic or democratic elections, Google employed whitelists for websites that could appear high in results for sensitive searches. This indicates a direct intervention to promote authoritative and trustworthy sources.
- siteRadius and siteFocusScore: These metrics are used to determine whether a document is a core topic of the website by comparing page embeddings to site embeddings, indicating an assessment of topical authority and relevance.
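A rough sketch of how a siteFocusScore-style check might compare a page embedding with a site-level embedding. The metric names come from the leak; the computation below (centroid of page embeddings, compared via dot product, in line with the trial’s note that Google favors dot products) is my own illustration.

```python
# Illustrative siteFocusScore-style computation: compare each page
# embedding to the site's centroid embedding. Only the metric names are
# from the leak; the math is an invented approximation.

def centroid(embeddings):
    """Element-wise mean of a list of equal-length vectors."""
    dims = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dims)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

site_pages = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]  # tightly themed site
site_embedding = centroid(site_pages)

on_topic_page = [0.9, 0.1]
off_topic_page = [0.1, 0.9]

# A page far from the site centroid signals an off-topic document.
print(dot(on_topic_page, site_embedding) > dot(off_topic_page, site_embedding))  # True
```

Under this reading, siteRadius would describe how widely a site’s pages scatter around that centroid, so a tight radius plus high page-to-site similarity would indicate strong topical focus.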
Quality:
- QualityBoost (Twiddler): This is a re-ranking function that operates after the primary search algorithm to enhance quality signals.
- titlematchScore: This feature is believed to measure how well a page title matches a query, indicating relevance and quality.
- avgTermWeight: Google measures the average weighted font size of terms in documents and anchor text, which could be a subtle signal related to content emphasis and quality.
- BabyPanda score: This is mentioned as a minor factor considered during the quality evaluation process, alongside other spam signals.
- Google employs 16,000 human raters globally to evaluate search quality
- Uses “IS score” (Information Satisfaction) as a primary quality metric
- Runs hundreds of thousands of quality tests annually, including side-by-side comparisons
- Machine learning models are often initially trained on user data, then fine-tuned based on human rater evaluations
The trial documents also touched upon “quality measures including authoritativeness” associated with each DocID.
- PageRank: A key quality signal mentioned is PageRank, which captures a web page’s quality and authoritativeness based on the frequency and importance of the links connecting to it. It is described as a single signal relating to distance from a known good source and is used as an input to the Quality score.
- User Data in Quality Signals: While most of Google’s quality signal is derived from the webpage itself, it was acknowledged that some minor sub-components of quality signals do rely on user data. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
E-E-A-T as a quality classification system also plays a crucial role for Gemini and AI Overviews
E-E-A-T seems to play a significant role in pretraining Gemini: authority signals appear to be used to filter for trustworthy sources in the pretraining of the Gemini base model.


I firmly believe that quality classification systems such as Google E-E-A-T also play an important role in the selection of sources for AIOverviews.

Search engines differentiate between relevance and quality in the sequential evaluation. The quality rating is used as a kind of filter mechanism to separate and prioritize trustworthy sources from e.g. spam before applying relevance scoring.
During grounding, the prompt is broken down into different search queries, which makes relevance scoring per search query very complex. Fewer resources are needed to filter out sources by quality class, which makes it more suitable for grounding.
This could also be a reason why you often see many sources in AI Overviews that do not rank in the top 10 search results.
The antitrust trial documents don’t talk about E-E-A-T, but they do talk a lot about quality systems.

Popularity Ranking and Chrome Visit Data
The trial discussed “popularity as measured by user intent and feedback systems including Navboost/Glue.” Exhibits suggested that popularity is based on “Chrome visit data” and “the number of anchors” (a measure quantifying links between pages). Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
PageRank as a Core Quality Signal
PageRank is explicitly identified as a key quality signal. It captures a web page’s quality and authoritativeness based on the frequency and importance of the links connecting to it. It is described as a “single signal relating to distance from a known good source, and it is used as an input to the Quality score.” PageRank was a foundational innovation for Google. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
Nature of Google’s Quality Signals
While some of Google’s quality sub-signals are scale-dependent, the document notes that “most of Google’s quality signal is derived from the webpage itself.” This indicates a strong reliance on on-page factors for quality assessment. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
Insights about General Methodologies and Metrics
The documents and leaks also shed light on the broader architecture and methodologies Google employs:
- Mustang: Identified as the primary scoring, ranking, and serving system, encompassing various scoring algorithms.
- SuperRoot: The central system that coordinates queries and manages the post-processing system for re-ranking and presenting search results.
- Twiddlers: These are re-ranking functions that adjust information retrieval scores or change document rankings just before presentation (e.g., FreshnessTwiddler for document freshness, QualityBoost for quality). More details about Twiddler.
- Alexandria and TeraGoogle: Core indexing systems, with Alexandria handling primary indexing and TeraGoogle managing long-term document storage.
- Trawler: The web crawling system responsible for maintaining crawl rates and understanding page change frequency.
- HtmlrenderWebkitHeadless: A rendering system for JavaScript pages, which transitioned from Webkit to Headless Chrome, underscoring the importance of rendering JavaScript for indexing.
- Tangram (formerly Tetris): This system assembles SERPs, determining which search features (Top stories, Images, Videos, People Also Ask, etc.) to display and their placement.
- DeepRank: Google uses deep neural networks like DeepRank at later stages of the ranking pipeline, which learn complex relationships between ranking factors, moving beyond simple linear dependencies.
- Vertical Optimization: Google identifies different site business models (e.g., news, e-commerce, personal blogs) for vertical optimization, suggesting tailored ranking approaches.
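Conceptually, a Twiddler is just a function that takes an already-scored result list and nudges scores before presentation. The sketch below shows that pattern with two hypothetical twiddlers loosely modeled on the FreshnessTwiddler and QualityBoost mentioned above; the names of those systems are from the leak, everything else is invented.

```python
# Hypothetical Twiddler pipeline: each twiddler adjusts scores of
# already-ranked results just before the SERP is assembled.

def freshness_twiddler(results):
    """Boost documents marked fresh (illustrative FreshnessTwiddler)."""
    return [{**r, "score": r["score"] * (1.2 if r.get("fresh") else 1.0)}
            for r in results]

def quality_boost_twiddler(results):
    """Boost documents with a high quality signal (illustrative QualityBoost)."""
    return [{**r, "score": r["score"] + r.get("quality", 0.0)} for r in results]

def rerank(results, twiddlers):
    """Apply each twiddler in order, then re-sort by the adjusted score."""
    for twiddler in twiddlers:
        results = twiddler(results)
    return sorted(results, key=lambda r: r["score"], reverse=True)

ranked = rerank(
    [{"url": "a", "score": 1.0, "fresh": False, "quality": 0.0},
     {"url": "b", "score": 0.9, "fresh": True, "quality": 0.3}],
    [freshness_twiddler, quality_boost_twiddler],
)
print([r["url"] for r in ranked])  # ['b', 'a']
```

The design point is separation of concerns: the primary scoring system stays untouched, while late-stage adjustments (freshness, quality, diversity) are composed as independent functions, which matches how the leak describes Twiddlers operating after the main algorithm.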
Insights about Algorithm development at Google
Google uses various metrics and methods in iterative steps to further develop its algorithm. User signals play a significant role.



Insights about Search Index Data Disclosure and its Impact
The court mandated Google to share specific search index data with Qualified Competitors as a remedy for its anti-competitive behavior. This data includes:
- The unique DocID for each document, with a notation for duplicates.
- A DocID-to-URL map.
- The first time a URL was seen.
- When the URL was last crawled.
- Spam score.
- Device-type flag.
The purpose of this disclosure is to enable rivals to more quickly build a competitive search index that is robust in volume, freshness, and utility. For instance, the DocID, DocID to URL map, and duplication information help competitors identify and crawl more valuable web pages efficiently. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
Insights about Generative AI (GenAI) in Search
A document discusses the integration of GenAI features, including AI Overviews, into Search.
- AI Overviews Trigger: They are not triggered for every query but appear “whenever we think that it is both high quality information and a net add to the page,” depending on relevance and other signals.
- Impact on User Behavior: AI Overviews have led to increased consumer satisfaction and query volume, with users asking “longer” and “more complex” questions.
- Effect on Organic Results: The placement of AI Overviews on the SERP has reduced user interactions with traditional “10 blue links.” However, pages appearing as “corroborating links” within AI Overviews receive more clicks than if they appeared as traditional blue links for the same query.
- Observations show that more and more AI Overviews do not appear in position 1. But why is that? My guess is that Google is collecting more and more user signals via CTR to position the new AIO SERP feature via Tangram/Glue. If users click on classic search results more than they interact with the AI Overviews, the classic results will move above the AIOs.


- LLM Training: Large Language Models (LLMs) are mainly pre-trained on vast amounts of text from the web, then fine-tuned on specialized data for tasks like answering questions. Notably, Google does not use click-and-query data to train GenAI models used in Search or GenAI products. Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
- MAGIT is the system used to generate AI Overviews. It is based on the Gemini base model and was fine-tuned on search query data to produce AI Overview output.

- The Fastsearch architecture is probably responsible for retrieving the relevant documents from the Docjoin index. In addition to documents from the Google Common Corpus (GCC), the Docjoin index also contains metadata and ranking-relevant signals such as user signals.

Third parties get lower-quality results for grounding via the Vertex Search API
According to the documents, third parties like Anthropic receive lower-quality search results when they use the Vertex Search API to ground their models in Google search results.

Insights about Privacy Considerations for User Data
Google raised valid concerns about user privacy regarding the disclosure of User-side Data. Experts agreed that revealing sensitive user information is a risk without adequate anonymization and privacy-enhancing techniques (e.g., adding noise, generalization, k-anonymity). Source: https://storage.courtlistener.com/recap/gov.uscourts.dcd.223205/gov.uscourts.dcd.223205.1436.0_4.pdf
These insights from the DOJ trial documents provide a deeper understanding of Google’s internal mechanisms, the importance of various signals, and the strategic decisions that have shaped its search dominance.
Conclusions for SEO
Based on the insights from the Google API Leak and the DOJ Antitrust trial documents, here are key conclusions for SEOs:
- Prioritize Genuine User Engagement (Clicks & Satisfaction):
- Optimize for CTR & Dwell Time: Since NavBoost and Glue use click data (good clicks, bad clicks, last longest clicks) for ranking and feature display, focus on creating compelling titles, meta descriptions, and rich snippets to encourage clicks. Once on your site, ensure content is highly satisfying to reduce “pogo-sticking” (returning to SERP) and increase dwell time.
- Monitor User Behavior: Use analytics to understand how users interact with your content. If users quickly bounce back to the SERP, it’s a strong signal that your content isn’t meeting their needs, which could negatively impact rankings.
- Build Comprehensive Site-Wide Authority and Trust (E-E-A-T):
- Holistic Quality: Google’s algorithmic approximation of E-E-A-T, using modules like PerDocData, QualityNsrNsrData, and CompressedQualitySignals, means you need to focus on overall site quality, not just individual pages.
- Demonstrate Expertise & Authorship: Highlight author credentials (authorObfuscatedGaiaStr), especially for YMYL topics. Ensure content is factually accurate, well-researched, and clearly attributed to experts.
- Site Authority is Real: The siteAuthority metric confirms that building a strong, reputable domain is crucial. This involves consistent high-quality content, positive brand mentions, and a strong backlink profile from authoritative sources.
- Address YMYL with Caution: For sensitive topics (health, finance, news), Google uses whitelists and specific quality scores (ymylHealthScore, ymylNewsScore). If you operate in these areas, strive to be an established, highly authoritative source.
- Links (PageRank) Remain Fundamental:
  - Quality Backlinks: PageRank is explicitly named as a core quality and authoritativeness signal. Continue to pursue high-quality, relevant backlinks from authoritative websites.
  - Internal Linking: A strong internal linking structure helps distribute PageRank and topical authority throughout your site, improving the visibility of important pages.
- Focus on Semantic Understanding and User Intent (BERT/RankEmbed):
  - Topic-Centric Content: With models like RankEmbed and RankEmbedBERT, Google excels at understanding the semantic meaning of queries and documents. Create comprehensive content that covers topics in depth, addressing various facets of a user’s intent rather than just targeting exact keywords.
  - Long-Tail Opportunities: These models are particularly effective for long-tail queries. Ensure your content naturally answers complex and specific questions users might ask.
- Be Mindful of New Content and Sandboxing:
  - Patience for New Sites/Content: The `HostAge` attribute and the concept of sandboxing fresh spam suggest that new websites or significantly new content might face an initial period of limited visibility. Focus on building trust and quality signals over time.
- Adapt to Generative AI (AI Overviews):
  - Concise & Authoritative Answers: AI Overviews are becoming more prevalent. Aim to provide clear, concise, and authoritative answers within your content that an LLM could easily extract and cite. Being a “corroborating link” within an AI Overview can drive significant traffic.
  - Content Structure: Structure your content with clear headings, summaries, and direct answers to common questions to make it easily digestible for both users and AI models.
- On-Page Quality is Paramount:
  - Content is King (Still): The documents state that “most of Google’s quality signal is derived from the webpage itself.” This reinforces the importance of high-quality, original, relevant, and well-structured content. Avoid gibberish (`GibberishScore`), spam (`spamtokensContentScore`, `SpamWordScore`), and low-quality content (`lowQuality`).
  - User Experience: Ensure your site is mobile-friendly, fast, and free of intrusive elements (`desktopInterstitials`).
- Optimize for Quality and Relevance: Google heavily weighs relevance and quality signals. Ensure your content is comprehensive, accurate, and provides genuine value. The `titlematchScore` and `avgTermWeight` attributes suggest that on-page content and its presentation remain important.
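The NavBoost-style click categories mentioned above (good clicks, bad clicks, last longest clicks) can be sketched as a simple classifier over dwell time. The category names come from the leaked documents, but the thresholds and the exact definition of a “last longest click” below are illustrative assumptions, not Google’s actual logic:

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only; the leak names the
# click categories but not the dwell-time cutoffs Google uses.
BAD_CLICK_MAX_SECONDS = 10
GOOD_CLICK_MIN_SECONDS = 60

@dataclass
class Click:
    url: str
    dwell_seconds: float

def classify_clicks(session):
    """Label each click in a search session; treat the final click
    as the 'last longest click' when it also has the longest dwell
    time, i.e. the result the user apparently settled on."""
    labels = {}
    for click in session:
        if click.dwell_seconds < BAD_CLICK_MAX_SECONDS:
            labels[click.url] = "bad"  # quick pogo-stick back to the SERP
        elif click.dwell_seconds >= GOOD_CLICK_MIN_SECONDS:
            labels[click.url] = "good"
        else:
            labels[click.url] = "neutral"
    if session:
        longest = max(session, key=lambda c: c.dwell_seconds)
        if session[-1].url == longest.url:
            labels[longest.url] = "last_longest"
    return labels
```

Under this sketch, a result that users abandon within seconds accumulates “bad” labels, while a result that ends the session with a long dwell accumulates the strongest positive signal.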
In summary, SEOs should move beyond a purely keyword-centric approach to a more holistic strategy that prioritizes genuine user satisfaction, builds demonstrable authority and trust across the entire website, leverages the power of quality backlinks, and creates semantically rich content designed to answer user intent comprehensively, while also adapting to the evolving landscape of AI-powered search results.
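The internal-linking point above rests on how PageRank flows through a link graph. A minimal power-iteration sketch of the classic algorithm (ignoring dangling-node mass redistribution, and certainly not Google’s production implementation) shows why well-linked hub pages accumulate authority:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over an internal-link graph.
    `links` maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Each page keeps a baseline (1 - damping) share ...
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        # ... and passes the damped remainder evenly to its outlinks.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"home": ["a", "b"], "a": ["home"], "b": ["home"]})` ranks `home` above the leaf pages, because both leaves return their share to it — the same reason a deliberate internal-linking structure concentrates authority on your most important pages.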