Author: Olaf Kopp
Reading time: 7 Minutes

Becoming quotable for LLMs: Should I create comprehensive holistic content or short specialized content?

After my presentation at Campixx in Berlin, there were lively discussions about my opinion that, in most cases, holistic long-form content on a specific topic is no longer useful in terms of AI search.

In this article, I would like to explain in more detail why I see it this way.

As the landscape shifts towards Generative Engine Optimization (GEO) and Large Language Model Optimization (LLMO), content creators face a crucial question: should we focus on creating comprehensive, long-form content that covers a topic exhaustively, or prioritize shorter, specialized pieces designed for easy consumption by AI models?

This article is based on my research into research papers and patents related to LLM Readability and Chunk Relevance in the SEO research suite database.

How do AI search systems work?

AI search systems have two options for generating a response. They can generate the response from the initially trained underlying foundation model, such as GPT or Gemini, or they can use a grounding process as part of retrieval-augmented generation to enrich the initially learned “knowledge” with information from a search index. In addition, newer models use reasoning processes that allow the systems to expand the context or perspective and draw conclusions.

Grounding reduces the likelihood of hallucinations and incorrect information in the responses. It also enables more topic-specific responses that a foundation model cannot adequately provide on its own due to limited initial training data.

Fundamentally, it must be said that foundation models were not originally trained for knowledge storage, but for understanding natural language.

Grounding

Grounding means that the responses generated by a generative AI model are anchored in trustworthy external sources. Instead of relying solely on the knowledge from the original training dataset, the model is provided with additional context.

  • Without grounding: AI can generate plausible but potentially incorrect responses based on probabilities learned during training. Foundation models are trained with a focus on understanding (natural language understanding, or NLU) and outputting (natural language generation, or NLG) natural language, not on generating knowledge.
  • With grounding: AI bases its answers on concrete, verifiable sources, which increases the likelihood that the output is factually correct.

Retrieval Augmented Generation (RAG)

Grounding allows AI to link sources to sections, chunks, or information nuggets and to reference them. For more information, see the Google Research Paper “GINGER: Grounded Information Nugget-Based Generation of Responses”.

This principle creates the basis for content to appear as a cited source in AI responses, which in turn creates visibility and authority.

Factors for document and chunk selection

As already described, the grounding process involves a sequential retrieval procedure:

  • Query fan-out
  • Document selection
  • Chunk selection
  • Generation of the response by the LLM

Document selection is based on classic information retrieval factors for relevance assessment, such as keywords in the page title, BM25, backlinks, user signals, etc. Classic SEO therefore continues to be useful for optimization.
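To make one of these classic factors more tangible, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the common default parameters k1 = 1.5 and b = 0.75, and the non-negative IDF variant with +1 inside the logarithm. The function name bm25_scores and the sample documents are invented for illustration. This is not how any particular search engine implements document selection.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumptions: lowercase whitespace tokenization and the IDF
    variant log((N - df + 0.5) / (df + 0.5) + 1), which never goes negative.
    """
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Length normalization; falls back to 1.0 if every document is empty.
        length_ratio = len(tokens) / avg_len if avg_len else 1.0
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # terms missing from the document add nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length_ratio))
        scores.append(score)
    return scores


# Hypothetical mini corpus, for illustration only
docs = [
    "chunk relevance for ai search",
    "recipes for apple pie",
    "ai search and llm readability",
]
scores = bm25_scores("llm readability", docs)
best_index = max(range(len(docs)), key=lambda i: scores[i])
```

In this toy corpus only the third document contains the query terms, so it is the only one with a score above zero. Real systems combine a lexical score like this with many other signals, such as the backlinks and user signals mentioned above.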

However, it is no longer so crucial whether documents rank in the top 3 or in position 6 or 9. Documents beyond the first page can also be taken into account.
The ranking results represent a kind of relevance set of documents for potential sources.

Which of these documents the LLM draws passages or chunks from then depends on two things: how easily the content can be processed, which I call LLM readability, and how relevant the existing passages are to the respective topic aspects, which I call chunk relevance.

The weighting of these factors can vary depending on the AI system. Perplexity, for example, seems to weight search engine rankings more heavily than Google does.

AI search systems are focusing more on passages or chunks than on documents

The introduction of passage-based indexing by Google in 2020 makes even more sense today than it did in the past. If you look at the structure of AI responses, they are organized into individual passages that focus on specific aspects of a topic, including references to other chunks in the source documents.

This clearly shows that, ultimately, individual passages are more important than the document itself.

However, a document must be perceived as relevant enough for fan-out queries and its author as trustworthy enough to be considered for a relevant set of possible sources.

The usual SEO factors are taken into account here. If a document is among the top n documents of this relevant set, chunks are selected from it that match the different themes of the fan-out queries (see the patent “Thematic Search”).

The semantic similarity is probably determined via the chunk embeddings and the query embeddings (via cosine similarity or dot product).
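A minimal sketch of what such a comparison could look like, assuming the query and each chunk have already been converted into embedding vectors by some model. The three-dimensional vectors, the chunk names, and the function names are invented for illustration. Real embeddings have hundreds or thousands of dimensions, and it is not publicly documented which similarity measure a given AI system actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_embedding, chunk_embeddings):
    """Return (chunk_id, score) pairs sorted by descending similarity."""
    scored = [
        (chunk_id, cosine_similarity(query_embedding, embedding))
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, invented for illustration
query = [1.0, 0.0, 1.0]
chunks = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk_c": [1.0, 1.0, 0.0],  # partial overlap with the query
}
ranking = rank_chunks(query, chunks)
```

Here chunk_a ranks first, chunk_c second, and chunk_b last. If the embeddings are normalized to unit length, the dot product alone gives the same ordering as cosine similarity.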

What is the lost middle problem with LLMs?

The “lost middle” refers to a challenge LLMs face when processing long contexts.

Specifically, it means that when content is very long, LLMs can struggle to effectively utilize all the information provided. Important facts or details located within the middle of the lengthy text might be overlooked or not given adequate attention by the model.

The research suggests presenting important facts in a way that avoids this “lost middle” problem. This implies that structuring content so that key information is easy to extract is crucial when dealing with longer documents.

Longform holistic content vs. Short specialised content

Content that thoroughly addresses users' informational needs tends to perform better, and my past SEO experience shows that a holistic approach often performs well in document ranking. This aligns with the idea of creating comprehensive content that covers the “intent space” – thinking about all the variations in user intent, including comparisons, concerns, and preferences.

  • Addressing Diverse and Long-Tail Queries: Comprehensive content can be particularly effective for managing diverse and long-tail queries. By integrating both internal expertise (e.g., in-house data) and curated external information, content can answer highly specific questions that competitors may not fully address. This strategy helps capture niche search queries and perform well on complex topics.
  • Covering the “Next Natural Need”: Entity-rich content that tackles the “next natural need” of the searcher can increase engagement and dwell time. Comprehensive content is better positioned to guide the user through their entire search journey by addressing related questions and topics within a single piece.

Challenges with Overly Long or Unstructured Comprehensive Content

While comprehensive content has benefits, my research points to challenges if it’s not structured correctly:

  • The “Lost Middle” Problem: LLMs can struggle with very long contexts. Important facts or details located within the middle of lengthy, unstructured text might be overlooked by the model.
  • Processing Difficulty: Very long blocks of text can be harder for AI to process and extract information from effectively. Such content lacks LLM readability.

The Case for Short, Specialized, and Chunked Content

The insights from my research strongly emphasize the benefits of breaking down content into smaller, digestible segments for LLMs:

  • Chunk Relevance: AI models often look for individual paragraphs or sections that answer a specific question well. Content should be “passage-ready,” meaning each chunk of text should ideally work on its own.
  • Optimizing for Snippet Features: Focusing on content that can easily be sectioned into snippets (like FAQs or How-To guides) is crucial. Concise, clear, and informative content in snippet format has a higher chance of being selected as the response by AI systems.
  • Facilitating Processing and Extraction: Well-structured content with clear headings, bullet points, and short paragraphs (recommended 2-4 sentences max) is easier for AI to process and extract. Chunking large blocks of text into smaller, semantically coherent segments makes it easier for LLMs to process, retrieve, and cite your content.
  • Answer Span Length: Optimizing content to create succinct answers that fit within specific length limits can improve visibility and ranking potential, as the system selects text spans based on numeric representations. Writing in 40-60-word blocks, each answering one micro-question, and leading with the answer mirrors how AI selects snippets.
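The length guidelines in this list can be checked automatically. The following is a minimal sketch, assuming a naive sentence split on ., ! and ? followed by whitespace. The function name audit_passage and the sample passages are hypothetical. The default thresholds come from the 40-60 words and 2-4 sentences mentioned above. They are editorial rules of thumb, not documented limits of any AI system.

```python
import re


def audit_passage(text, min_words=40, max_words=60, max_sentences=4):
    """Report whether a passage fits the word and sentence guidelines."""
    words = text.split()
    # Naive split: sentence-ending punctuation followed by whitespace.
    # Abbreviations such as "e.g." will be miscounted.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "fits_word_range": min_words <= len(words) <= max_words,
        "fits_sentence_limit": len(sentences) <= max_sentences,
    }


# Hypothetical passages for illustration
short_passage = "Grounding anchors answers in sources. It reduces errors."
short_report = audit_passage(short_passage)

# A 50-word, single-sentence dummy passage
dummy_passage = " ".join(["word"] * 50) + "."
dummy_report = audit_passage(dummy_passage)
```

The first passage has only 8 words and is flagged as too short for the word range, while the 50-word dummy passage fits both guidelines. A check like this only measures length. Whether a passage actually leads with the answer and works on its own still has to be judged by a human editor.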

Challenges with Exclusively Short, Specialized Content

While beneficial for AI processing, relying only on fragmented short content might not fully satisfy complex user needs or cover the entire “intent space” for a topic if the pieces are not connected or part of a larger, well-organized structure.

Finding the Balance: The Hybrid Approach

The most effective strategy, based on my research, appears to be a hybrid approach that combines the benefits of both:

  1. Be Comprehensive in Scope: Address the full “intent space” and cover the “next natural need” by including information relevant to diverse and long-tail queries. Combine internal expertise with curated external data where necessary.
  2. Be Specialized and Chunked in Structure: Organize this comprehensive information into short, self-contained, passage-ready sections. Use clear, descriptive, and semantically-rich headings (avoiding generic ones like “Overview”) and subheadings that match real questions. Employ formatting like bullet points, numbered lists, and short paragraphs (2-4 sentences max) to make content easily scannable and processable by LLMs.
  3. Optimize for Answer Spans: Within these chunks, ensure answers are concise yet comprehensive, designed to fit within the typical length limits of AI-selected answer spans. Lead with the answer in your paragraphs.
  4. Leverage Multimedia: Include diagrams, flowcharts, infographics, tables, and videos, as AI can extract insights from these, enhancing understanding and improving chances of inclusion.
  5. Maintain Quality and Relevance: Critically evaluate external data sources used in comprehensive content to ensure accuracy and relevance, just as Astute RAG aims to overcome imperfect retrieval.
  6. Encourage Engagement & Monitor Performance: While structure helps AI, user engagement metrics (dwell time, click-through rates) are still valuable signals. Allow for user feedback and monitor content performance to identify areas for improvement.

In conclusion, the goal for LLMO/GEO is not necessarily to choose strictly between long or short content, but to create content that is comprehensive in its coverage of user intent and topic scope, while being highly structured and chunked into short, easily digestible, passage-ready segments that LLMs can efficiently process, extract, and utilize for generating responses and snippets.

Why I would favor shorter specialised content

Based on my research, creating overly long content for LLMO/GEO is not recommended for a few reasons:

  1. Avoiding the “Lost Middle”: LLMs can struggle with long contexts, a phenomenon sometimes referred to as the “lost middle,” where important facts within lengthy text might be overlooked.
  2. Context Management: While context is important, too much context risks diluting the important facts you want the LLM to pick up on. A balanced, structured approach is optimal.
  3. Processing and Extraction: Concise paragraphs (e.g., 3-5 sentences) and shorter sentences are generally easier for LLMs to process and understand. This helps them extract key information and formulate responses or summaries more effectively.
  4. Optimizing for Answer Spans: LLMs often select specific text spans for answers. Optimizing content to create succinct answers that fit within specific length limits can improve visibility and ranking potential in generative AI responses.

However, one aspect should not be overlooked. With regard to source document selection, specialized content often enjoys greater relevance than generic content, as elements such as page titles or introductory passages can be created much more precisely to match the respective user intent. This increases the likelihood of being included in the relevant set of source documents. In addition, AI systems ensure that a document is not cited too often. Ultimately, I would therefore recommend opting for shorter, specialized content when in doubt.

About Olaf Kopp

Olaf Kopp is an online marketing expert for Generative Engine Optimization (GEO) and SEO. He has over 15 years of experience in Google Ads, SEO, and content marketing. Olaf Kopp is one of the early pioneers in the fields of Generative Engine Optimization (GEO) and digital brand building, and the inventor of modern GEO and marketing concepts such as LLM readability, brand context optimization, and digital authority management. Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO & AI Search (GEO) at Aufgesang GmbH. He is an internationally recognized industry expert in semantic SEO, E-E-A-T, LLMO & Generative Engine Optimization (GEO), AI and modern search engine technology, content marketing and customer journey management. Olaf Kopp is one of the first pioneers worldwide to have demonstrably worked on the topics of Generative Engine Optimization (GEO) and Large Language Model Optimization (LLMO). His first publications date back to 2023. As an author, Olaf Kopp writes for national and international magazines such as Search Engine Land, t3n, Website Boosting, Hubspot, Sistrix, Oncrawl, Searchmetrics, Upload … . In 2022 he was a top contributor for Search Engine Land. His blog is one of the most famous online marketing blogs in Germany. In addition, Olaf Kopp is a speaker for SEO and content marketing at SMX, SERP Conf., CMCx, OMT, OMX, Campixx...
