Author: Olaf Kopp
Reading time: 8 Minutes

LLMO: How do you optimize for the answers of generative AI systems?


As more and more people prefer to ask ChatGPT rather than Google when searching for products, companies need to ask themselves: how do we show up at the top of the list?

In this article, I shed light on the technical background. It shows that many questions in this area still have no clear answers. However, established branding methods appear to work here too.

A look at the current shift to generative AI

Applications based on generative AI are taking the world by storm. Especially for researching information and answering questions quickly, services such as ChatGPT, Gemini and others could represent serious competition for search engines such as Google – at least in part.

Basically, we should differentiate between:

  1. Website clicks from the search results (SERPs) or traffic that could be lost,
  2. and a possible reduction in general usage or a reduction in search queries.

According to a study by Gartner, search engine usage will fall by 25% by 2026 in favor of AI chatbots and virtual assistants.

Personally, I don’t believe that we will see such a shift by 2026. Nevertheless, I believe that future generations will increasingly rely on AI chatbots for researching information and products.

So I think 25% and more is quite realistic, but in five to ten years rather than two. It will be a slower but steady development. User habits remain habits!

I see the reduction in search engine traffic on websites coming our way much faster. With the introduction of SGE (now “AI Overviews”), I expect a reduction in search engine traffic of up to 20% on average in the first two years of introduction. Depending on the search intention, it could be more or less. However, I am sure that “no-click searches” will increase because generative AI is already providing the solutions and answers.

This will shorten the time in the research journey and customer journey or messy middle.

To generate awareness in the user journey, it would also be negligent to focus solely on search engine rankings and clicks or website traffic as a result.

If you ask ChatGPT today for a car that should fulfill certain characteristics, it will suggest specific models:

If you ask Gemini the same question, certain car models are also suggested, including pictures:

Interestingly, the example above shows that different car models are recommended depending on the application.


ChatGPT suggested:

  • Tesla Model Y
  • Toyota Highlander Hybrid
  • Hyundai Ioniq 5
  • Volvo XC90 Recharge
  • Ford Mustang Mach-E
  • Honda CR-V Hybrid


Gemini suggested:

  • Chrysler Pacifica Hybrid
  • Toyota Sienna
  • Mid-Size SUVs in general
  • Toyota RAV4 Hybrid
  • Row SUVs
  • Toyota Highlander Hybrid

This makes it clear that the underlying language model (Large Language Model, LLM) works differently depending on the AI application.

In the future, it will become increasingly important for companies to be named in recommendations like these in order to be included in the relevant set of possible solutions.

But why exactly are these models proposed by generative AI?

In order to answer this question, we need to understand a little more about how generative AI and LLMs work technologically.

Excursus: How LLMs work

Modern transformer-based LLMs such as GPT or Gemini are based on a statistical analysis of the co-occurrence of tokens or words.

For this purpose, texts and data are broken down into tokens for machine processing and positioned in semantic spaces with the help of vectors. Vectors can also represent whole words (Word2Vec), entities (Node2Vec) and attributes.

In semantics, a semantic space is also described as an ontology. Since LLMs are based more on statistics than on real semantics, they are not ontologies. However, thanks to the highly scalable amounts of data they can process, AI comes closer to semantic understanding.

The semantic proximity can be determined by the Euclidean distance or the cosine similarity in the semantic space:

In this way, the relationship between products and attributes can be established.
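The two proximity measures just mentioned can be sketched in a few lines. The vectors below are made-up three-dimensional toy embeddings for illustration; real models use hundreds or thousands of dimensions.

```python
import math

# Toy embeddings; the vectors and terms are placeholders, not real model output.
embeddings = {
    "suv": [0.9, 0.8, 0.1],
    "hybrid": [0.7, 0.9, 0.2],
    "sneaker": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two points in the semantic space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# "suv" and "hybrid" sit close together in the space...
print(cosine_similarity(embeddings["suv"], embeddings["hybrid"]))
# ...while "sneaker" is far away from both.
print(cosine_similarity(embeddings["suv"], embeddings["sneaker"]))
print(euclidean_distance(embeddings["suv"], embeddings["hybrid"]))
```

A high cosine similarity (close to 1) or a small Euclidean distance both indicate that two terms are semantically related.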

As part of encoding, LLMs determine these relationships using natural language processing: texts broken down into tokens can thus be divided into entities and attributes.

The more often certain tokens co-occur, the higher the probability of a relationship between them.
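The co-occurrence idea can be illustrated with a minimal counting sketch. The mini-corpus below is hypothetical; counting pairs of tokens within the same sentence stands in for the far more sophisticated statistics inside a real LLM.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus; the sentences are placeholders.
sentences = [
    "the family suv offers seven seats and hybrid drive",
    "this hybrid suv is ideal for a large family",
    "the sneaker comes in three colors",
]

def co_occurrences(sentences):
    """Count how often two distinct tokens appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sorted(set(sentence.split()))
        for pair in combinations(tokens, 2):
            counts[pair] += 1
    return counts

counts = co_occurrences(sentences)
# "hybrid" and "suv" co-occur twice, hinting at a relationship;
# "sneaker" and "suv" never co-occur.
print(counts[("hybrid", "suv")])
print(counts[("sneaker", "suv")])
```

Frequent co-occurrence across many texts is what lets a model associate, say, a product with an attribute.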

Language models are first pre-trained on large text corpora and later fine-tuned with human-labeled data. The initial training data consists of crawl databases of the internet, other databases, books, Wikipedia and more. The vast majority of the data used to train state-of-the-art LLMs is therefore text from publicly accessible internet resources (e.g. the latest “Common Crawl” dataset, which contains data from more than three billion pages).

It is not clear exactly which sources are used for the initial crawl.

In order to reduce hallucinations and give the LLM access to deeper subject-specific knowledge, modern LLM applications are additionally supported with content from domain-specific sources. This process takes place as part of Retrieval Augmented Generation (RAG).


Graph databases such as the Google Knowledge Graph or Shopping Graph can also be used in the context of RAG to develop a better semantic understanding.
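The RAG principle can be sketched very simply: retrieve a relevant passage for the user's prompt and prepend it to the model input. The documents below are placeholders, and plain word overlap stands in for the embedding-based vector search real systems use.

```python
# Minimal RAG sketch under simplifying assumptions:
# retrieval by word overlap instead of vector similarity.
documents = [
    "The March 2024 core update targeted low-quality, unhelpful content.",
    "The Shopping Graph links products, brands, sellers and reviews.",
]

def retrieve(prompt, documents):
    """Return the document sharing the most words with the prompt."""
    prompt_words = set(prompt.lower().split())
    return max(documents, key=lambda d: len(prompt_words & set(d.lower().split())))

def build_augmented_prompt(prompt):
    context = retrieve(prompt, documents)
    # The retrieved passage grounds the answer and reduces hallucination.
    return f"Context: {context}\n\nQuestion: {prompt}"

print(build_augmented_prompt("What did the march 2024 core update change?"))
```

The LLM then answers on the basis of the retrieved context rather than relying solely on what it memorized during training.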

LLMO, GEO, GAIO as a new discipline for influencing generative AI

The big challenge for companies will be to play a role not only in the familiar search engines but also in the output of language models, be it in the form of source references including links or through mentions of their own brands and products.

Influencing the output of generative AI is a previously unexplored field of research. There are several theories and many names for it, such as Large Language Model Optimization (LLMO), Generative Engine Optimization (GEO) and Generative AI Optimization (GAIO).

Reliable evidence for optimization approaches from practice is still scarce. This leaves only the derivation from the technological understanding of LLMs.

Establishment as a thematically trustworthy and relevant source for non-commercially driven prompts
In the case of non-commercial prompts, the most important goal is to be named as a source, including a link to your own website.

It would be logical for AI systems with direct access to search engines to refer to the best-ranking content when compiling an answer.

Here is an example of a prompt: “google core update march 2024”

The sources mentioned in Copilot are:


The rankings in the classic results (without videos and news) for the corresponding search query on Bing are as follows:


Some of the sources show overlaps with the search results, but not all.
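The degree of overlap between cited sources and top rankings can be quantified with simple set operations. The domain lists below are placeholders, not the actual Copilot citations or Bing rankings from the screenshots.

```python
# Hypothetical domain lists for illustration only.
cited_sources = {"searchengineland.com", "seroundtable.com", "example-blog.com"}
top_rankings = {"searchengineland.com", "seroundtable.com", "bing.com/blog"}

# Domains that appear in both the AI's citations and the top rankings.
overlap = cited_sources & top_rankings

# Jaccard index: overlap size relative to the combined set (0 = disjoint, 1 = identical).
jaccard = len(overlap) / len(cited_sources | top_rankings)
print(overlap, round(jaccard, 2))
```

Tracking this overlap over time would show how closely an AI system's source selection follows classic rankings.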

ChatGPT shows the following sources for the same prompt:

Google’s Gemini mentions the following sources:

In addition to relevance criteria, other quality criteria also appear to be used in the selection of sources, presumably similar to Google’s E-E-A-T concept.

Some studies on Google’s SGE also show a strong connection with well-known brands, such as a study by Peak Ace in the tourism segment and another study by Authoritas.

Peak Ace examined which domains in the travel segment are frequently linked from the SGE:

Authoritas has investigated which domains are generally linked from the SGE:

A connection between brand strength and the selection of sources for the SGE can be surmised.

Digital brand and product positioning for commercially driven prompts

With purchase-oriented prompts, the most important goal is to be directly recommended by the AI as a brand or product, whether in shopping grids or in the text output.

But how can this be achieved?

As always, a sensible approach is to start with the user and their prompts: understanding the user and their needs is the basis.

Prompts can provide more context than the few terms of a standard search query:

Companies should therefore aim to position their own brands and products in specific user contexts.

Frequently requested attribute classes on the market and in prompts (e.g. condition, usage, number of users …) can be an initial point of reference for finding out in which contexts brands and products should be positioned.
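Positioning a brand in specific contexts ultimately means creating brand/attribute co-occurrences in relevant texts. A minimal sketch of how such associations could be measured across a corpus follows; all brand names, attributes and snippets are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical media snippets; brands and attributes are placeholders.
snippets = [
    "the acme ev is a reliable family car with seven seats",
    "acme ev wins award for family friendly design",
    "the zorro roadster is a fast two-seater",
]
brands = {"acme", "zorro"}
attributes = {"family", "fast", "reliable"}

def brand_attribute_counts(snippets):
    """Count brand/attribute co-occurrences per snippet."""
    counts = defaultdict(int)
    for snippet in snippets:
        tokens = set(snippet.split())
        for brand in brands & tokens:
            for attribute in attributes & tokens:
                counts[(brand, attribute)] += 1
    return counts

counts = brand_attribute_counts(snippets)
print(counts[("acme", "family")])  # "acme" is repeatedly associated with "family"
```

The more often a brand co-occurs with an attribute class in qualified media, the more likely a language model is to pick up that association.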

But where should this positioning take place?

To do this, you need to know which training data an LLM uses. This in turn depends on the LLM in question:

  • If an LLM has access to a search engine, highly ranking content in this search engine could be a possible source.
  • Renowned (industry) directories, (product) databases or other thematically authoritative sources could be places for optimization.
  • Google’s E-E-A-T concept can also play an important role here, at least for Gemini or the SGE, in identifying trustworthy sources as well as trustworthy brands and products.


It remains to be seen whether LLMO or GAIO will really become a legitimate strategy for influencing LLMs with regard to a company's own goals. On the data science side there is skepticism; others believe in the approach.

If it does, the chances of success of LLM optimization are directly related to the size of the market: the more niche a market, the easier it is to position yourself as a brand in the respective thematic context.

This means that fewer co-occurrences in the qualified media are required in order to be associated with the relevant attributes and entities in the LLMs. The larger the market, the more difficult this is, as many market players have large PR and marketing resources and a long history.

GAIO or LLMO requires significantly more resources than classic SEO to influence public perception.

At this point, I would like to refer to my concept of Digital Authority Management. You can read more about this in the article Authority Management: A new discipline in the age of SGE and E-E-A-T.

Let’s assume that LLM optimization proves to be a sensible strategy. In this case, big brands will have significant advantages in search engine positioning and generative AI results in the future due to their PR and marketing resources.

Another perspective is that search engine optimization can continue as before, since well-ranked content is simultaneously used to train LLMs. In addition, you should pay attention to co-occurrences between brands/products and attributes or other entities and optimize for them.

Which of these approaches will be the future of SEO is unclear and will only become apparent once SGE (now AI Overviews) is fully rolled out.

About Olaf Kopp

Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO & Content at Aufgesang GmbH. He is an internationally recognized industry expert in semantic SEO, E-E-A-T, modern search engine technology, content marketing and customer journey management. As an author, Olaf Kopp writes for national and international magazines such as Search Engine Land, t3n, Website Boosting, Hubspot, Sistrix, Oncrawl, Searchmetrics, Upload and others. In 2022 he was a top contributor for Search Engine Land. His blog is one of the best-known online marketing blogs in Germany. In addition, Olaf Kopp speaks on SEO and content marketing at conferences such as SMX, CMCx, OMT, OMX and Campixx.

