Author: Olaf Kopp
Reading time: 17 Minutes

What we can learn about Google's AI Search from the official Vertex & Cloud documentation


As an SEO professional, understanding the intricate mechanisms behind Google’s search and generative AI systems is crucial for effective optimization. While the official Google Cloud documentation primarily details Google Cloud’s Vertex AI Search and AI Applications for developers, we can assume that the underlying principles and processes for Retrieval Augmented Generation (RAG), Grounding, Retrieval, and Ranking operate in a similar fashion within the broader Google Search ecosystem. This article will demystify these concepts and draw actionable conclusions for SEO and ranking optimization.

Unlocking Google Search: A Deep Dive into Retrieval, Ranking, RAG, and Grounding for SEOs

The landscape of search is constantly evolving, with artificial intelligence playing an increasingly central role. Google’s sophisticated search models, particularly those demonstrated in Vertex AI Search and AI Applications, offer a window into how the search giant identifies, evaluates, and presents information. For SEOs, comprehending these mechanisms—Retrieval, Ranking, Retrieval Augmented Generation (RAG), and Grounding—is no longer optional but essential for future-proofing strategies.

How Google Search Works: Retrieval and Ranking Explained

Google’s approach to serving search results is a two-phase process: first, retrieving a broad set of potentially relevant documents, and then ranking that subset for final presentation. Ranking every available document would be computationally prohibitive, hence the sequential design.

  • Retrieval: Finding the Candidates The initial step involves the search model understanding the user’s query (and rewriting it where necessary), then identifying a large subset of documents (potentially thousands) from its vast data stores that are relevant. This process relies on various signals to assign an initial relevance score:
    • Topicality: This includes traditional keyword matching, insights from knowledge graphs, and broader web signals.
    • Embeddings: Advanced models use embeddings to find conceptually similar content to the query, moving beyond exact keyword matches.
    • Cross-attention: This allows the model to analyze the intricate relationship between a query and a document to assign a relevance score, capturing deeper contextual connections.
    • Freshness: The age of documents is a significant factor, ensuring up-to-date information is prioritized when relevant.
    • User Events: Conversion signals, indicating how users interact with content, are incorporated for personalization.
  • Ranking: Ordering for Relevance Once documents are retrieved, a ranking model takes this subset and reorders them, assigning a new relevance score based on several conditions. From the thousands initially retrieved, the model typically serves the top 400 ranked results. Key ranking methods include:
    • Boost: This mechanism allows for the promotion or demotion of certain results based on custom attributes (e.g., star ratings, popularity) or freshness.
    • Search Tuning: This process specifically impacts how the model perceives the semantic relevance of documents and adjusts embedding relevance scores. It’s particularly useful for refining search for industry-specific or company-specific queries.
    • Event-based Reranking: Personalized results are delivered by updating rankings at the time of serving, leveraging user-events-based personalization models.
Source: https://cloud.google.com/generative-ai-app-builder/docs/ranking-overview?hl=en
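The retrieve-then-rank workflow described above can be sketched in a few lines. This is a deliberately simplified toy, assuming made-up scoring functions (keyword overlap plus a freshness bonus); it is not Google's actual implementation, only the two-phase shape of it:

```python
# Illustrative two-phase search: a cheap retrieval pass over the whole corpus,
# then a more expensive reranking pass over only the retrieved subset.
# All scoring here is a toy stand-in, not Google's algorithms.

def retrieve(query, corpus, limit=1000):
    """Phase 1: cheap keyword-overlap score; keep the top `limit` candidates."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:limit]]

def rerank(query, candidates, corpus, freshness, serve=400):
    """Phase 2: reorder only the candidates, blending relevance and freshness."""
    q_terms = set(query.lower().split())
    def score(doc_id):
        overlap = len(q_terms & set(corpus[doc_id].lower().split()))
        return overlap + 0.5 * freshness.get(doc_id, 0.0)
    return sorted(candidates, key=score, reverse=True)[:serve]

corpus = {
    "a": "luxury hotel with rooftop pool",
    "b": "budget hotel near airport",
    "c": "rooftop bar guide",
}
freshness = {"a": 0.2, "b": 0.9, "c": 0.1}
candidates = retrieve("hotel rooftop pool", corpus)
top = rerank("hotel rooftop pool", candidates, corpus, freshness)
```

The point of the split is visible even in the toy: the expensive scoring function only ever runs over the small candidate set, never the full corpus.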

Generative AI, RAG, and Grounding: The New Frontier

The integration of Large Language Models (LLMs) means search is no longer just about lists of links; it’s also about generating direct, accurate answers. This is where Retrieval Augmented Generation (RAG) and Grounding become critical.

  • Retrieval Augmented Generation (RAG) RAG is a methodology that empowers LLMs to produce responses firmly grounded in specific data sources, preventing them from “hallucinating” or generating factually inaccurate information. It involves two stages:
    • Retrieval: Quickly identifying the most relevant facts from a data source.
    • Generation: The LLM uses these retrieved facts to generate a coherent and grounded response.
  • The Power of Grounding Grounding ensures that every claim in an LLM-generated answer is supported by one or more reference texts (facts). This drastically reduces hallucinations and builds trust in AI-generated content. Grounding sources can include:
    • Google Search: For answers requiring world knowledge, a wide range of topics, or up-to-date information from the internet.
    • Inline Text: User-provided statements considered factual for a given request.
    • Vertex AI Search data stores: For grounding answers in enterprise-specific documents.
  • Dynamic Retrieval: Smart Grounding A new capability, dynamic retrieval, allows the model to intelligently decide when to use Google Search for grounding and when to rely on its intrinsic training data. This balances response quality with cost efficiency. For example, a query about the “latest F1 grand prix winner” would likely trigger Google Search grounding, while “tell me the capital of France” might use the model’s existing knowledge. The model assigns a prediction score to queries, determining if grounding with Google Search is necessary for up-to-date information.
  • Grounding for Factual Accuracy The Check Grounding API provides an overall support score (0-1) for how well an “answer candidate” (e.g., an LLM response) aligns with “facts” (reference texts). It also provides citations to the supporting facts for each claim made in the answer. A “perfectly grounded” claim is wholly entailed by the facts; partial correctness renders the entire claim ungrounded. There’s also an experimental “Helpfulness score” measuring how well an answer addresses the prompt’s core intent comprehensively, concisely, and clearly.

Deep Dive into Ranking Signals at Google

The various signals influencing retrieval and ranking offer clear insights into Google’s priorities:

  • Topicality & Keyword Relevance: Keyword matching (using algorithms like BM25) remains a signal, alongside understanding topics through knowledge graphs and other web signals. This emphasizes the continued importance of keyword research and comprehensive content.
  • Semantic Understanding (Embeddings & Cross-Attention): Beyond keywords, Google seeks to understand the conceptual similarity between a query and content, as well as the deep semantic relationship between them. This is powered by embeddings and cross-attention models, where documents numerically closest to the query’s embedding are ranked higher.
  • Freshness: The Value of Timeliness: The age of content is a direct ranking signal. Custom ranking can specifically boost results based on datetime attributes like publication_date or Google-inferred properties like datePublished and dateModified.
  • User Engagement and Personalization: User events and conversion signals are leveraged for personalization, influencing how results are ranked for individual users. Unique pseudonymized identifiers (userPseudoId) track user behavior without PII, enhancing model performance and personalization.
  • Structured Data and Custom Attributes: Structured data plays a crucial role. Custom numerical attributes (e.g., star_rating, distance_from_airport) can directly influence boost values and custom ranking formulas, allowing for granular control over relevance. Google also infers page data using properties that apply to content, such as datePublished and dateModified, which are indexed automatically.
  • Boosting & Demoting Factors: Explicit control mechanisms exist to promote or demote content. This highlights that certain attributes or conditions can override default relevance, pushing content higher or lower in results. However, care must be taken to calibrate boost amounts to avoid unintended consequences, like less relevant content outranking highly relevant pages.
  • Snippets & Extracted Content: Google actively extracts snippets, extractive answers, and extractive segments from documents to provide quick, concise information. These extractions range from brief previews to verbatim paragraphs, tables, or bulleted lists.
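The embedding signal mentioned above can be made concrete with a small sketch: documents whose vectors lie numerically closest to the query's vector rank highest. The vectors here are hand-made three-dimensional toys; real embedding models produce hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" for a query and two documents.
query_vec = [0.9, 0.1, 0.0]
docs = {
    "soccer gift guide": [0.8, 0.2, 0.1],
    "tax form help":     [0.0, 0.1, 0.9],
}
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
```

Note that no keyword from the query needs to appear in the winning document; proximity in the embedding space is what counts.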

The two stages of RAG

Retrieval Augmented Generation (RAG) is a methodology that enables Large Language Models (LLMs) to produce responses that are grounded in specific data sources, thereby reducing the tendency of models to “hallucinate” or generate inaccurate information.
RAG operates in two distinct stages:

  1. Retrieval: This initial stage focuses on efficiently identifying and collecting the most relevant facts from a chosen data source. The goal is to quickly get the facts that are important for generating an answer. This is essentially a search problem, aiming to augment the model’s knowledge.
  2. Generation: In this subsequent stage, the Large Language Model (LLM) utilizes the facts that were retrieved in the first stage to generate a coherent and grounded response. The LLM generates an output that is supported by these relevant search results.
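The two stages above can be sketched end to end. Stage 1 is a toy term-overlap retriever; stage 2 only builds the grounded prompt, since the actual LLM call is outside the scope of this sketch (any real API would do at that point):

```python
import re

# Minimal RAG sketch: stage 1 retrieves facts, stage 2 packs them into a
# grounded prompt that would then be sent to an LLM. The retrieval scoring
# is a toy stand-in, not a production retriever.

def retrieve_facts(query, facts, k=2):
    """Stage 1: rank facts by term overlap with the query, keep the top k."""
    q = set(re.findall(r"\w+", query.lower()))
    def overlap(fact):
        return len(q & set(re.findall(r"\w+", fact.lower())))
    return sorted(facts, key=overlap, reverse=True)[:k]

def build_grounded_prompt(query, facts):
    """Stage 2 (input side): constrain the LLM to the retrieved facts."""
    context = "\n".join(f"- {f}" for f in facts)
    return f"Answer using ONLY these facts:\n{context}\n\nQuestion: {query}"

facts = [
    "The Eiffel Tower is 330 metres tall.",
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
top = retrieve_facts("what is the capital of France", facts)
prompt = build_grounded_prompt("what is the capital of France", top)
# `prompt` would now be sent to the model for the generation stage.
```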

How does Boost impact ranking?

Boost is a powerful ranking method within Google’s search ecosystem that allows for the explicit promotion or demotion of search results. It is a crucial component of the second phase of the search workflow, where a ranking model reorders documents that have already been retrieved based on initial relevance scores.

Here’s a detailed breakdown of how Boost impacts ranking:

1. Direct Influence on Ranking Order:

  • When boost values are applied, they directly influence the order in which documents are presented in the search results.
  • A positive boost value (a floating point number in the range of 0 to 1) will promote results, causing them to appear higher in the rankings.
  • A negative boost value (a floating point number in the range of -1 to 0) will demote results, making them appear lower in the rankings.

2. Scope of Application:

  • Boost impacts documents that have been selected during the initial retrieval phase. Specifically, it impacts the first 1,000 retrieved documents and then ranks the top 400 to be served.

3. Types of Boost Specifications: Boost can be applied through different specifications, primarily based on conditions or attributes:

  • Boost with a fixed condition: You can specify a boolean filter expression (BOOST_CONDITION) to select documents, and if they satisfy this condition, a fixed boost value is applied. For example, a condition like star_rating >= 3.0 can promote all hotels with three stars or more by a set amount (e.g., 0.7).
  • Boost using custom numerical attributes: This allows for boosting results in a piecewise linear manner based on a document’s custom numerical attributes (e.g., star_rating, distance_from_airport). You define “control points” that map specific attribute values to corresponding boost amounts. The actual boost applied to a document is calculated using linear interpolation if its attribute value falls between two control points.
  • Boost according to freshness: This boosts results based on datetime attributes, such as publication_date or Google-inferred datePublished and dateModified. The boost amount can be adjusted based on the document’s age (duration since the datetime attribute). Similar to numerical attributes, control points define boost values for different durations, with linear interpolation determining the boost for intermediate durations. Google recommends that domain owners update these date properties for web pages and manually refresh data stores to ensure accuracy.
Source: https://cloud.google.com/generative-ai-app-builder/docs/boost-search-results?hl=en
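The piecewise-linear mechanism described above can be written out directly: "control points" map attribute values to boost amounts, and values in between are linearly interpolated. The control points below are illustrative, not taken from Google's documentation:

```python
# Sketch of piecewise-linear boosting via control points, as described above.
# A control point is an (attribute_value, boost) pair; values between two
# control points get a linearly interpolated boost.

def interpolated_boost(value, control_points):
    """control_points: list of (attribute_value, boost) pairs."""
    points = sorted(control_points)
    if value <= points[0][0]:
        return points[0][1]   # clamp below the first control point
    if value >= points[-1][0]:
        return points[-1][1]  # clamp above the last control point
    for (x0, b0), (x1, b1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            frac = (value - x0) / (x1 - x0)
            return b0 + frac * (b1 - b0)

# e.g. boost hotels by star_rating: 3 stars -> 0.1, 5 stars -> 0.7
points = [(3.0, 0.1), (5.0, 0.7)]
boost_for_4_stars = interpolated_boost(4.0, points)  # halfway, so about 0.4
```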

4. Significant Impact and Calibration:

  • Boost conditions can significantly influence the final ranking of a result.
  • It is crucial to carefully calibrate the boost amount. Over-boosting can lead to unintended consequences, such as less relevant content being ranked higher than highly relevant pages.
  • Google recommends starting with a low and precise boost amount (e.g., 0.1 or less) and adjusting it based on the observed search output. An example illustrates how a large boost value combined with a relevance filter could cause documents with “lowest relevance” to be ranked as top results, pushing more relevant results lower.

5. Integration with Custom Ranking Expressions:

  • The overall effect of all custom boosts applied to a document can be represented as a boosting_factor signal, which can then be incorporated into a custom ranking formula alongside other signals like semantic relevance or keyword similarity.

6. Limitations:

  • Boost specifications are not applicable to healthcare search apps.

In summary, Boost provides a direct and powerful mechanism to fine-tune search result rankings by promoting or demoting documents based on various configurable conditions, including custom attributes and freshness. Its impact is significant and requires careful calibration to ensure optimal and relevant search experiences.

What is the Ranking API?

The Ranking API is a component within Google Cloud’s AI Applications designed to improve the quality of search and Retrieval Augmented Generation (RAG) experiences by reordering documents based on their relevance to a given query.

Here’s a breakdown of the Ranking API:

  •  Purpose and Functionality
    • The primary function of the Ranking API is to rank a set of documents (referred to as “records”) based on a user’s query.
    • It takes an initial list of documents and reranks them to provide more precise scores indicating how well each document answers the specific query.
    • This API is stateless, meaning documents do not need to be indexed by Vertex AI Search before calling it. This makes it suitable for reranking documents retrieved from other search solutions, such as Vector Search.
  • Key Differentiator
    • Unlike methods that rely solely on embeddings, which primarily assess the semantic similarity between a document and a query, the Ranking API provides precise scores for how effectively a document addresses the query.
  • Use Cases The Ranking API is valuable for various scenarios where content relevance to a user’s query needs to be precisely determined, including:
    • Improving existing search experiences.
    • Finding the right content to provide to a Large Language Model (LLM) for grounding in RAG workflows.
    • Identifying the most relevant sections within a longer document.
  • Input Data When making a request to the Ranking API, the following inputs are required:
    • Query: The user’s search query for which the documents are being ranked.
    • Records: A list of documents relevant to the query, provided as an array of JSON objects. Each record must include a unique id and either a title, content, or both.
    • The maximum number of tokens supported per record varies by model version (e.g., 512 tokens for models up to version 003, and 1024 tokens for version 004 and later). Content exceeding this limit is truncated.
    • Up to 200 records can be included per request.
    • Optional Parameters:
      • topN: Specifies the maximum number of ranked records to return. All records are still ranked internally, even if fewer are returned.
      • ignoreRecordDetailsInResponse: If set to true, only the record ID is returned to reduce the response payload size.
      • model: Specifies the model to be used for ranking. The default is semantic-ranker-default@latest, which automatically points to the latest available model.
  • Output Data The API returns a list of records, ranked by relevance, with the following information for each:
    • Score: A floating-point value between 0 and 1, indicating the relevance of the record to the query. A higher score means greater relevance.
    • ID: The unique identifier of the record.
    • Full Object: If ignoreRecordDetailsInResponse was not set to true, the title and content of the record are also returned.
  • Implementation Developers use the rankingConfigs.rank method to interact with the Ranking API. This typically involves providing the PROJECT_ID, QUERY, and the records array in a POST request. Example code snippets are provided in various languages (Python, cURL) to demonstrate its usage.
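A request to the rankingConfigs.rank method, using the input fields described above (query, records, topN, model), might look roughly like the following. The project ID and record contents are placeholders, and the endpoint path and auth handling are simplified; consult the official reference for the exact REST call:

```python
# Sketch of a rankingConfigs.rank request body. Field names follow the inputs
# described above; PROJECT_ID and the records are illustrative placeholders.

PROJECT_ID = "my-project"
ENDPOINT = (
    "https://discoveryengine.googleapis.com/v1/projects/"
    f"{PROJECT_ID}/locations/global/rankingConfigs/default_ranking_config:rank"
)

body = {
    "model": "semantic-ranker-default@latest",
    "query": "pet-friendly hotel near the airport",
    "records": [
        {"id": "1", "title": "Hotel A", "content": "Pets welcome, airport shuttle."},
        {"id": "2", "title": "Hotel B", "content": "Downtown luxury, rooftop pool."},
    ],
    "topN": 2,
}
# An authenticated POST of `body` to ENDPOINT returns the records reordered,
# each with a relevance score between 0 and 1, e.g.:
#   requests.post(ENDPOINT, json=body, headers={"Authorization": f"Bearer {token}"})
```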

In essence, the Ranking API provides a sophisticated mechanism to refine search results by deeply understanding the relationship between a query and a document, offering a more nuanced relevance assessment than basic similarity measures.

How does custom ranking work?

Custom ranking is a method that allows users to control, tune, and override the default ranking logic in search results with a formula-based algorithm to meet specific requirements. This feature is available for structured, unstructured, and website data.
Here’s how custom ranking works:

1. Mathematical Expression: Custom ranking relies on a mathematical expression (or formula) that combines various signals to compute a new score for each search result. This expression is then used to rank the retrieved documents.

2. Signals Used in the Formula: The formula integrates two main types of signals:

  • Model-computed signals: These are generated by Google’s models and include:
    • default_rank: The default rank assigned by the standard Vertex AI Search (VAIS) ranking algorithm.
    • semantic_similarity_score: A score based on the similarity between the query and the document’s content embeddings, computed using a proprietary Google algorithm.
    • relevance_score: A score from a deep-relevance model that understands complex query-document interactions, determining the meaning and intention of a query in context.
    • keyword_similarity_score: A score emphasizing keyword matching, using the Best Match 25 (BM25) ranking function.
    • document_age: The age of the document in hours (supports floating-point values).
    • pctr_rank: A rank indicating predicted conversion rates, based on user event data (predicted Click-through rate, pCTR).
    • topicality_rank: A rank denoting keyword similarity adjustment using a proprietary Google algorithm.
    • boosting_factor: A combination of all custom boosts applied to the document.
  • Document-based signals: These are derived directly from your document data, such as a custom field. Any custom field marked as “retrievable” can be used as a signal by adding a c. prefix (e.g., c.date_approved for a custom field named date_approved).

3. Formula Syntax and Components: The custom ranking formula is a mathematical expression composed of:

  • Numbers (double): Positive or negative floating-point values used to weight signals or expressions.
  • Signals: The names of the standard or custom signals.
  • Arithmetic operators: + (addition) and * (multiplication).
  • Mathematical functions:
    • log(expression): Natural logarithm.
    • exp(expression): Natural exponent.
  • Reciprocal rank transformation function (rr): rr(expression, k) sorts documents by the expression value in descending order, assigns a rank, and then calculates 1 / (rank_i + k), where rank_i is the document’s position and k is a positive floating-point number. This function normalizes scores to the same scale.
  • Not a number (NaN) handling functions:
    • is_nan(expression): Returns 1 if the expression evaluates to NaN (e.g., a missing signal), otherwise 0.
    • fill_nan(arg_expression, fill_with_expression): Returns fill_with_expression if arg_expression is NaN, otherwise returns arg_expression. This helps handle documents missing certain signals.

4. How it Modifies Ranking (Example Scenario): Consider a query for a “luxury hotel with a large rooftop pool in Vancouver, pet-friendly and close to airport”. A purely embedding-based ranking might prioritize strong semantic matches like “luxury” and “rooftop pool” even if a hotel is not pet-friendly. With custom ranking, a formula can be created (e.g., rr(semantic_similarity_score, 32) * 0.4 + rr(keyword_similarity_score, 32) * 0.3 + rr(c.distance_from_airport * -1, 32) * 0.8). This formula assigns weights to signals such as semantic similarity, keyword similarity, and distance_from_airport (a custom field). By transforming the scores to a common scale with rr() and weighting them, a final custom ranking score is calculated for each hotel. The result is a more considered ranking that better matches user needs, because specific criteria (like “pet-friendly” or “close to airport”) gain greater influence on the final rank.

Source: https://cloud.google.com/generative-ai-app-builder/docs/custom-ranking?hl=en
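The hotel formula above can be worked through with toy numbers. The scores and weights here are invented for illustration; only the rr() transformation itself follows the definition given in section 3:

```python
# Working through rr() and the weighted hotel formula above with toy scores.
# rr(expression, k): sort by the expression descending, then 1 / (rank + k).

def rr(scores, k=32.0):
    """Map each doc to its reciprocal-rank value, normalizing to one scale."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {doc: 1.0 / (rank + k) for rank, doc in enumerate(order, start=1)}

semantic = {"hotel_a": 0.91, "hotel_b": 0.88, "hotel_c": 0.52}
keyword  = {"hotel_a": 0.40, "hotel_b": 0.75, "hotel_c": 0.60}
# Negate distance_from_airport so that *closer* to the airport sorts higher.
neg_dist = {"hotel_a": -12.0, "hotel_b": -2.5, "hotel_c": -4.0}

sem_rr, kw_rr, dist_rr = rr(semantic), rr(keyword), rr(neg_dist)
final = {
    d: 0.4 * sem_rr[d] + 0.3 * kw_rr[d] + 0.8 * dist_rr[d]
    for d in semantic
}
ranking = sorted(final, key=final.get, reverse=True)
```

With these toy numbers, hotel_a has the strongest semantic match but ends up last: the 0.8 weight on the distance signal lets "close to airport" outvote pure semantic similarity, which is exactly the kind of override custom ranking is for.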

5. Implementation: To use custom ranking, a search request is made with two specific fields:

  • rankingExpressionBackend: This specifies the ranking mechanism.
    • RANK_BY_EMBEDDING: The default, using a predefined embedding-based or relevance-based expression.
    • RANK_BY_FORMULA: Overrides the default and uses the custom formula provided in rankingExpression.
  • rankingExpression: Contains the mathematical formula that determines the ranking of retrieved documents.

6. Tuning the Formula: Finding the optimal weights for a custom formula can be complex. Google provides an open-source Python library for ranking tuning within Vertex AI Search, which helps programmatically discover a suitable formula based on a dataset of queries with corresponding “golden labels” (relevant documents or IDs).

In essence, custom ranking offers a highly flexible and transparent way to tailor search result ordering beyond generic relevance, allowing specific business logic and attribute priorities to directly influence how content is presented.

What is a support score?

In the context of Retrieval Augmented Generation (RAG) and the “Check Grounding API” within Google Cloud’s AI Applications, a support score is a metric that indicates how well an “answer candidate” (a piece of text, often an LLM-generated response) is grounded in a provided set of “facts” (reference texts).

Here are the key characteristics of a support score:

  • Numerical Range The support score is a floating-point value from 0 to 1.
  • Indication of Grounding: It loosely approximates the fraction of claims within the answer candidate that are found to be supported by one or more of the given facts. A higher score signifies greater agreement and grounding between the answer candidate and the facts.
  • Claim-Level Support: In addition to an overall answer-level support score, the Check Grounding API can also provide a claim-level support score for each individual claim (typically a sentence) in an answer candidate. This claim-level score also ranges from 0 to 1 and indicates how grounded that specific claim is in the provided facts.
  • Relationship with Citations: The support score is returned along with citations to the specific facts that support each claim in the answer candidate. A “perfectly grounded” claim is one that is wholly entailed by the facts; if a claim is only partially correct, it is considered ungrounded and will not receive citations.
  • Use in RAG: The Check Grounding API, which returns this score, is designed to be fast (latency less than 500ms), allowing chatbots to call it during each inference without significant slowdown. This helps users understand which parts of a generated response are reliable. Chatbots can also use a configurable citationThreshold (a float value from 0 to 1) to filter out responses that are likely to contain “hallucinated claims” (unsupported claims) based on the support score.
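The citationThreshold logic described above can be sketched as client-side filtering. The response shape below is a simplified, illustrative stand-in for what the Check Grounding API returns (an overall support score plus per-claim scores), not its exact schema:

```python
# Sketch: filter out likely-hallucinated claims using claim-level support
# scores and a configurable citation threshold, as described above.
# The response dict is a simplified illustration of the API's output.

CITATION_THRESHOLD = 0.6  # float from 0 to 1

def filter_claims(check_grounding_response, threshold=CITATION_THRESHOLD):
    """Keep only claims whose claim-level support score clears the threshold."""
    return [
        claim["claimText"]
        for claim in check_grounding_response["claims"]
        if claim.get("score", 0.0) >= threshold
    ]

response = {
    "supportScore": 0.5,  # overall answer-level score
    "claims": [
        {"claimText": "Paris is the capital of France.",  "score": 0.98},
        {"claimText": "Paris has 12 million inhabitants.", "score": 0.20},
    ],
}
reliable = filter_claims(response)
```

In a chatbot, a check like this would run on every inference; the sub-500ms latency of the API is what makes that feasible.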

What is an extractive segment?

An extractive segment is a section of verbatim text extracted directly from a search result document. It is returned with each search response to enhance results.

Here are its key characteristics and uses:

  • Content and Length
    • Extractive segments are usually more complete and verbose than extractive answers.
    • They can consist of multiple paragraphs, including formatted text such as tables and bulleted lists.
    • For search tuning training data, an extractive segment needs to be sufficiently long, typically 250 to 500 words, containing enough context for training; a sentence or two is insufficient.
  • Purpose and Use Cases
    • Extractive segments can be displayed as an answer to a query.
    • They can be used to perform post-processing tasks.
    • They serve as input for Large Language Models (LLMs) to generate answers or new text.
    • In the context of search tuning, extractive segments are snippets taken verbatim from documents in a data store. These are used as training data, including segments that positively match training queries and at least 10,000 additional “random negative” segments not associated with queries, to tune the model.
  • Availability
    • Extractive segments are available for data stores with unstructured data and with advanced website indexing.
    • To enable extractive segments for unstructured apps, Enterprise edition features must be turned on. For website apps, both Enterprise edition features and advanced website indexing are required.
  • Options for Retrieval
    • Number of segments: You can specify up to 10 extractive segments to be returned for each search result.
    • Relevance scores: Extractive segments can be returned with relevance scores, which are based on the similarity of the query to the extracted segment. These scores range from -1.0 (less relevant) to 1.0 (more relevant). Turning on relevance scores can increase latency.
    • Adjacent segments: You can request up to 3 segments from immediately before and after the relevant segment (numPreviousSegments and numNextSegments) to add context and accuracy. This option can also increase latency.
  • Example For a query like “what is ai applications?”, an extractive segment might provide a detailed explanation covering how AI Applications allow developers to quickly ship new experiences, use API access to Google’s foundation models, combine organizational data, and integrate natural conversations with structured flows.
Source: https://cloud.google.com/generative-ai-app-builder/docs/snippets?hl=en
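The retrieval options above map onto fields of the search request's content spec. The sketch below follows the field names in the linked snippets documentation, but treat the exact request shape as illustrative rather than authoritative:

```python
# Sketch of a search request enabling extractive segments, mirroring the
# options described above. Values chosen here are examples, not defaults.

search_request = {
    "query": "what is ai applications?",
    "contentSearchSpec": {
        "extractiveContentSpec": {
            "maxExtractiveSegmentCount": 5,        # up to 10 per result
            "returnExtractiveSegmentScore": True,  # -1.0 .. 1.0; adds latency
            "numPreviousSegments": 1,              # up to 3 before the match
            "numNextSegments": 1,                  # up to 3 after the match
        }
    },
}
```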

What is hybrid search?

Hybrid search is a search technique that combines both vector-based and keyword-based search methods to deliver more relevant and accurate responses to users.

This approach is an expansion of Vector Search, which is the engine that powers embeddings-based Retrieval Augmented Generation (RAG). Vector Search, in turn, uses embeddings—numerical representations that capture semantic relationships across diverse data—to find nearest neighbors.

The motivation for hybrid search stems from the limitations of traditional word matching, which works well for simple queries (e.g., “Samsung TV”) but struggles with more complex or nuanced requests (e.g., “a gift for my daughter who loves soccer and is a fan of Messi”). By integrating vector search technology, which finds items semantically relevant to a user’s intent, hybrid search aims to provide precise results that best match a user’s query, thereby enhancing the customer experience and potentially increasing conversion rates.
Hybrid search is currently available in public preview.
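One common way to fuse a keyword ranking and a vector ranking is reciprocal rank fusion (RRF). Google does not publish its exact fusion method, so the sketch below shows the general technique, not Vertex AI Search's implementation:

```python
# Illustrative hybrid search: fuse a keyword-based ranking and a vector-based
# ranking with reciprocal rank fusion (RRF). Each list is already ordered
# best-first; a document scores 1/(k + rank) in every list it appears in.

def rrf(rankings, k=60):
    """rankings: list of ordered doc-id lists. Returns fused best-first order."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["samsung-tv-55", "samsung-tv-65", "soccer-jersey"]
vector_ranking  = ["messi-jersey", "soccer-jersey", "samsung-tv-55"]
fused = rrf([keyword_ranking, vector_ranking])
```

Documents that appear high in both lists rise to the top, which is the intuition behind hybrid search: exact matches for "Samsung TV"-style queries survive, while semantically relevant items for "gift for a Messi fan"-style queries still get surfaced.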

What is dynamic retrieval?

Dynamic retrieval is a novel capability within Google Cloud’s AI Applications, specifically offered as part of Grounding with Google Search. Its primary purpose is to help balance response quality with cost efficiency by intelligently determining when to use Google Search results for grounding and when to rely on the Large Language Model’s (LLM) intrinsic training data.

Here’s how dynamic retrieval works:

1. Prediction Score: When a request to generate a grounded answer is sent, AI Applications assigns a prediction score (a floating-point value from 0 to 1) to the user’s prompt. This score indicates how much the prompt could benefit from grounding the answer with the most up-to-date information from Google Search.

  • Higher prediction scores are assigned to prompts that require answers grounded in recent facts from the web (e.g., “Who won the latest F1 grand prix?” might get a 0.97 score).
  • Lower prediction scores are assigned to prompts where the model’s intrinsic knowledge is sufficient, and external grounding with Google Search is not strictly required (e.g., “Write a poem about peonies” might get a 0.13 score, or “Tell me the capital of France” can use the model’s knowledge).
Source: https://cloud.google.com/generative-ai-app-builder/docs/grounded-gen?hl=en

2. Dynamic Retrieval Threshold: Users can specify a dynamic retrieval threshold, which is also a floating-point value from 0 to 1, with a default value of 0.7 if not specified. This threshold dictates the behavior of grounding:

  • If the prediction score is greater than or equal to the threshold, the answer is grounded with Google Search.
  • If the prediction score is less than the threshold, the model may still generate an answer, but it will not be grounded with Google Search.
  • If the threshold value is set to zero, the response is always grounded in Google Search. Similarly, if the dynamicRetrievalConfig field is not set in the request, the answer is always grounded.
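The threshold logic above is simple enough to write out. The prediction scores below mirror the examples in the documentation; the decision function itself is just the documented comparison:

```python
# The dynamic retrieval decision described above: ground with Google Search
# when the prompt's prediction score clears the threshold (default 0.7).

def should_ground(prediction_score, threshold=0.7):
    """True -> ground the answer with Google Search.
    A threshold of 0 means every answer is grounded."""
    return prediction_score >= threshold

f1_winner_score = 0.97  # "Who won the latest F1 grand prix?" needs fresh facts
poem_score = 0.13       # "Write a poem about peonies" needs no grounding
decisions = (should_ground(f1_winner_score), should_ground(poem_score))
```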

3. Benefits:

  • Cost Efficiency: Google Search grounding incurs additional processing costs. Dynamic retrieval helps manage these costs by only invoking Google Search when necessary.
  • Response Quality: It ensures that answers for time-sensitive or world-knowledge-dependent queries (e.g., “latest movies”) are up-to-date and accurate by leveraging Google Search.
  • Reduced Latency: For queries where the LLM’s intrinsic knowledge is sufficient, skipping external grounding can reduce latency.

By leveraging dynamic retrieval, Gemini can intelligently choose the most appropriate grounding strategy, balancing quality, latency, and cost for generative AI experiences.

What triggers dynamic retrieval in answer generation?

Dynamic retrieval in answer generation is triggered by an intelligent evaluation process that determines whether a user’s prompt would benefit from being grounded in up-to-date information from Google Search, or if the Large Language Model’s (LLM) intrinsic knowledge is sufficient.
Specifically, the triggers are based on the following:

1. Prediction Score for the Prompt: AI Applications assigns a prediction score (a floating-point value from 0 to 1) to the user’s prompt. This score reflects how much the prompt could benefit from grounding its answer with the most current information available through Google Search.

  • Higher prediction scores are associated with prompts that require answers based on recent facts from the web (e.g., “Who won the latest F1 grand prix?”).
  • Lower prediction scores are given to prompts where the LLM can adequately generate a response using its existing knowledge, without needing external grounding (e.g., “Write a poem about peonies” or “Tell me the capital of France”).

2. Dynamic Retrieval Threshold: A dynamic retrieval threshold can be specified in the grounded answer generation request. This is a floating-point value ranging from 0 to 1, with a default of 0.7 if not explicitly set.

Dynamic retrieval is triggered when the prompt’s prediction score is greater than or equal to the specified dynamic retrieval threshold. In this scenario, the answer will be grounded using Google Search. If the prediction score falls below the threshold, the model may still generate an answer, but it will not use Google Search for grounding.

If the dynamicRetrievalConfig field is entirely omitted from the request, the answer is always grounded with Google Search. Similarly, if the threshold value is explicitly set to zero, the response will also always be grounded in Google Search. This mechanism helps balance response quality with cost efficiency.

About Olaf Kopp

Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO & AI Search (GEO) at Aufgesang GmbH. He is an internationally recognized industry expert in semantic SEO, E-E-A-T, LLMO & Generative Engine Optimization (GEO), AI and modern search engine technology, content marketing, and customer journey management. Olaf Kopp is one of the first pioneers worldwide to have demonstrably worked on Generative Engine Optimization (GEO) and Large Language Model Optimization (LLMO); his first publications on these topics date back to 2023. As an author, he writes for national and international magazines such as Search Engine Land, t3n, Website Boosting, Hubspot, Sistrix, Oncrawl, Searchmetrics, Upload … . In 2022 he was a top contributor for Search Engine Land. His blog is one of the best-known online marketing blogs in Germany. In addition, Olaf Kopp speaks on SEO and content marketing at conferences such as SMX, SERP Conf., CMCx, OMT, OMX, and Campixx.
