Author: Olaf Kopp , 04.January 2023
Reading time: 7 Minutes

How Google Search (ranking) may work today


Google has disclosed information about its ranking systems. With this information, combined with my own thoughts and research (e.g. in Google patents), I want to put the pieces of the puzzle together in this article to form an overall picture.

I do not go into ranking factors in detail or their weighting, but rather into functionality.

Disclaimer: Some assumptions in this post are my own conclusions, developed from various sources.

Why should SEOs concern themselves with how search engines / Google work?

I don’t think it makes sense to deal only with ranking factors and possible optimization tasks without understanding how a modern search engine like Google works. There are many myths and speculations in the SEO industry that are blindly followed unless you have tested them against your own ranking experience. Dealing with the basic functioning of Google helps you assess such myths in advance. This post should help you with that.

Process steps for information retrieval, ranking and knowledge discovery at Google

According to the explanations in the excellent lecture “How Google works: A Google Ranking Engineer’s Story” by Paul Haahr, Google distinguishes between the following process steps:

  • Before a query:
    • Crawling
    • Analyzing crawled pages
      • Extract links
      • Render contents
      • Annotate semantics
    • Build an index
  • Query Processing
    • Query understanding
    • Retrieval and scoring
    • Post-retrieval adjustments

Indexing and Crawling

Indexing and crawling are the basic prerequisites for ranking, but otherwise have nothing to do with how content is ranked.

Google crawls the internet via bots around the clock. These bots are also called crawlers. The Google bots follow links to find new documents and content. But Google can also crawl URLs that do not appear in the HTML code, and possibly even URLs entered directly into the Chrome browser.

If the Googlebot finds new links, these are collected in a scheduler so that they can be processed later.

Domains are crawled with varying frequency and completeness; in other words, different crawl budgets are allocated to domains. PageRank used to be an indication of the crawl intensity attributed to a domain. In addition to external links, other factors can include publishing and update frequency as well as the type of website. News sites that appear in Google News are usually crawled more frequently. According to Google, there are no problems with crawl budgets up to around 10,000 URLs. In other words, most websites have no problem being fully crawled.
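The scheduling described above can be sketched roughly as follows. This is a toy illustration only: the class name, the priority scheme and the per-domain budget counter are my own hypothetical constructs, not Google's actual implementation.

```python
import heapq
from collections import defaultdict

class CrawlScheduler:
    """Toy crawl frontier: newly found URLs are queued and processed
    later, with a per-domain budget limiting how much of each
    domain gets crawled (all numbers hypothetical)."""

    def __init__(self, default_budget=10_000):
        self.queue = []  # min-heap of (priority, url); lower = sooner
        self.budgets = defaultdict(lambda: default_budget)
        self.seen = set()

    def add_url(self, url, priority=1.0):
        if url not in self.seen:          # avoid re-queuing known URLs
            self.seen.add(url)
            heapq.heappush(self.queue, (priority, url))

    def next_url(self):
        while self.queue:
            _, url = heapq.heappop(self.queue)
            domain = url.split("/")[2]
            if self.budgets[domain] > 0:  # skip domains out of budget
                self.budgets[domain] -= 1
                return url
        return None
```

A real frontier would also handle politeness delays and re-crawl intervals; the point here is only the queue-plus-budget idea.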

Indexing takes place in two stages.

  1. In the first step, the raw HTML code is processed with a parser so that it can be transferred to an index in a resource-saving way. In other words, the first indexed version of a piece of content is the pure, unrendered HTML. This saves Google time when crawling and thus also when indexing.
  2. In a second, later step, the indexed HTML version is rendered, i.e. displayed the way a user sees it in a browser.
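The two-stage idea can be illustrated with a minimal sketch: stage one parses the raw HTML for text and links without rendering anything, and the URL is deferred to a render queue for stage two. All names here are hypothetical illustrations.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Stage one: parse raw, unrendered HTML and keep only
    visible text and outgoing links."""
    def __init__(self):
        super().__init__()
        self.text, self.links = [], []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self.links += [v for k, v in attrs if k == "href"]

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())

render_queue = []  # stage two: URLs deferred for full rendering

def index_first_pass(url, html):
    parser = TextExtractor()
    parser.feed(html)
    render_queue.append(url)  # rendered later, like in a browser
    return {"url": url, "text": " ".join(parser.text),
            "links": parser.links}
```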

If Google has general problems with the indexing and crawling systems, you can see them in the official Google Search Status Dashboard.

Which Google indexes are there?

With Google, a basic distinction can be made between two types of index.

  1. The classic search index contains all content that Google can index. Depending on the type of content, Google also differentiates between so-called vertical indices such as the classic document index (text), image index, video index, flights, books, news, shopping and finance. The classic search index consists of thousands of shards containing millions of websites. Despite the size of the index, the top n documents per shard can be compiled very quickly because the individual shards are queried in parallel.
  2. The Knowledge Graph is Google’s semantic entity index. All information about entities and their relationships to each other is recorded in the Knowledge Graph. Google obtains information about the entities from various sources.
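The shard-based retrieval described in point 1 can be sketched as a scatter-gather pattern: each shard returns its local top n in parallel, and the partial lists are merged into a global top n. The shard contents and scores below are invented for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds a slice of the index as
# (document, score) pairs; real shards serve millions of pages.
SHARDS = [
    [("doc_a", 0.9), ("doc_b", 0.4)],
    [("doc_c", 0.7), ("doc_d", 0.2)],
    [("doc_e", 0.8)],
]

def top_n_from_shard(shard, n):
    """Each shard independently computes its local top n."""
    return heapq.nlargest(n, shard, key=lambda pair: pair[1])

def search(n=3):
    """Query all shards in parallel, then merge the partial
    top-n lists into one global top-n ranking."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: top_n_from_shard(s, n), SHARDS)
    merged = [pair for part in partials for pair in part]
    return heapq.nlargest(n, merged, key=lambda pair: pair[1])
```

Because each shard only ever returns n candidates, the merge step stays cheap no matter how large the overall index is.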

Using natural language processing, Google is increasingly able to extract unstructured information from search queries and online content in order to identify entities or assign data to entities. With MUM, Google can use not only text sources for this, but also images, videos and audio.

Example Entity Structure in a Knowledge Graph

For data mining, Google can use both a query processor and a kind of entity processor or semantic entity API between the classic search index and the Knowledge Graph (see also the Google patent “Search Result Ranking and Representation”).

Search Query Processing

The magic of interpreting search terms happens in search query processing. The following steps are important here:

  1. Identification of the thematic ontology to which the search query belongs. If the thematic context is clear, Google can select a content corpus of text documents, videos, images … as potentially suitable search results. This is particularly difficult with ambiguous search terms. More on that in my post KNOWLEDGE PANELS & SERPS FOR AMBIGUOUS SEARCH QUERIES.
  2. Identification of entities and their meaning in the search term (named entity recognition)
  3. Semantic annotation of the search query
  4. Refinement of the search term
  5. Understanding the semantic meaning of a search query.
  6. Identification of the search intention

I deliberately differentiated between points 5 and 6 here, since the search intent can vary depending on the user and can even change over time, while the lexical-semantic meaning remains the same.

For certain search queries such as obvious misspellings or synonyms, a query refinement takes place automatically in the background. As a user, you can also trigger the refinement of the search query manually if Google is not sure whether it is a typo. With query refinement, a search query is rewritten in the background so that its meaning can be interpreted better.

In addition to query refinement, query processing also involves query parsing, which enables the search engine to better understand the search query. Search queries are rewritten in such a way that search results can also be delivered that do not directly match the search query itself but rather related search queries. More on this here.
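A minimal sketch of the background rewriting described above: a spelling map corrects obvious typos and a synonym table expands terms, so documents matching related queries can also be retrieved. The two lookup tables are purely hypothetical stand-ins for Google's learned models.

```python
# Hypothetical lookup tables; in reality these would be
# learned models, not hand-written dictionaries.
SPELLING = {"recieve": "receive", "googel": "google"}
SYNONYMS = {"cheap": ["inexpensive", "affordable"]}

def refine_query(query):
    """Rewrite the query (spelling) and expand it (synonyms)."""
    tokens = [SPELLING.get(t, t) for t in query.lower().split()]
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))  # add related terms
    return " ".join(tokens), expanded
```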

Search query processing can be performed according to classical keyword-based term-to-document matching or according to an entity-based approach, depending on whether entities occur in the search query and are already recorded.

You can find a detailed description of Search Query Processing in the article How does Google understand search queries through Search Query Processing?

Google ranking systems

Google distinguishes between the following ranking systems:

  • AI ranking systems
    • Rankbrain
    • BERT
    • MUM
  • Crisis information systems
  • Deduplication systems
  • Exact match domain system
  • Freshness system
  • Helpful content system
  • Link analysis systems and PageRank
  • Local news systems
  • Neural matching
  • Original content systems
  • Removal-based demotion systems
  • Page experience system
  • Passage Ranking system
  • Product review system
  • Reliable information system
  • Site diversity system
  • Spam detection system
  • Retired Systems
    • Hummingbird (has been further developed)
    • Mobile friendly ranking system (now part of the Page experience system)
    • Page speed system (now part of the Page experience system)
    • Panda system (part of the core system since 2015)
    • Penguin System (part of the Core System since 2016)
    • Secure sites system (now part of the Page experience system)

These ranking systems are used in various process steps of the Google search.

How do the different ranking systems work together?

Finally, I try to bring the large amount of information from Google about the functionality of their search engines into an overall picture.

A Query Processor is responsible for the interpretation of search queries, the identification of search intent, query refinement, query parsing and search-term-to-document matching.

The Entity Processor or Semantic API forms the interface between the Knowledge Graph and the classic search index. It can be used for named entity recognition and for data mining for the Knowledge Graph or Knowledge Vault, e.g. via natural language processing. More on that in the post “Natural Language Processing to build a semantic database”.

A Scoring Engine, an Entity and Sitewide Qualifier, and a Ranking Engine are responsible for the Google ranking. When it comes to ranking factors, Google distinguishes between search-query-dependent (e.g. keywords, proximity, synonyms …) and search-query-independent (e.g. PageRank, language, page experience …) ranking factors. I would additionally differentiate between document-related ranking factors and domain- or entity-related ranking factors.

In the Scoring Engine, a relevance assessment takes place at the document level in relation to the search query. The Entity and Sitewide Qualifier evaluates the publisher and/or author as well as the quality of the content as a whole in relation to topics and the UX of the website (or website areas).

The Ranking Engine combines the score from the Scoring Engine with the score from the Entity and Sitewide Qualifier and ranks the search results.
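The interplay just described can be sketched as follows: a query-dependent document score is blended with a query-independent sitewide quality score. The function names, the toy term-overlap relevance and the 0.7/0.3 weights are my own illustrative assumptions, not anything Google has published.

```python
def scoring_engine(doc, query):
    """Query-dependent relevance: toy term-overlap score."""
    terms = set(query.lower().split())
    words = set(doc["text"].lower().split())
    return len(terms & words) / len(terms)

def sitewide_qualifier(doc, site_quality):
    """Query-independent quality signal for the publishing site
    (hypothetical scores per site, 0.5 if unknown)."""
    return site_quality.get(doc["site"], 0.5)

def ranking_engine(docs, query, site_quality):
    """Blend both signals (weights are invented) and rank URLs."""
    scored = [
        (0.7 * scoring_engine(d, query)
         + 0.3 * sitewide_qualifier(d, site_quality), d["url"])
        for d in docs
    ]
    return [url for _, url in sorted(scored, reverse=True)]
```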

A Cleaning Engine sorts out duplicate content and cleans search results from content that has received a penalty.
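Such a cleaning step could look like this in miniature: exact duplicates are dropped via a content fingerprint, and documents from a hypothetical penalty list are filtered out before ranking. Both the hashing choice and the penalty list are illustrative assumptions.

```python
import hashlib

# Hypothetical penalty list of sites whose content was demoted.
PENALIZED = {"spam.example"}

def clean_results(docs):
    """Drop duplicate content (by hash) and penalized sites."""
    seen, cleaned = set(), []
    for doc in docs:
        fingerprint = hashlib.sha256(doc["text"].encode()).hexdigest()
        if fingerprint in seen or doc["site"] in PENALIZED:
            continue  # duplicate or penalized: filter out
        seen.add(fingerprint)
        cleaned.append(doc)
    return cleaned
```

A real deduplication system would use near-duplicate detection (e.g. similarity hashing) rather than exact hashes, but the filtering role in the pipeline is the same.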

Finally, a Personalization Layer takes into account factors such as the search history or, in the case of regional search intentions, the location and other local ranking factors.

Does that sound logical? If so, I’m happy if you share the knowledge.

More posts on how Google works

Not enough? I have been working intensively with books, Google sources and Google patents on modern search engine technologies since 2014. Here is a selection of articles I have written about it:

About Olaf Kopp

Olaf Kopp is Co-Founder, Chief Business Development Officer (CBDO) and Head of SEO at Aufgesang GmbH. His work focuses on digital brand building, E-E-A-T, semantic SEO, content marketing and online marketing strategies along the customer journey. He is the author of the book "Content Marketing along the Customer Journey", co-organizer of the SEAcamp and host of the German podcasts Content-Kompass and OM Cafe. As an enthusiastic search engine and content marketer, he writes for various specialist magazines, including Search Engine Land, t3n, Website Boosting, Hubspot ... His blog is one of the best-known online marketing blogs in Germany. In addition, Olaf Kopp is a speaker on SEO and content marketing at Hanover University of Applied Sciences, SMX, CMCx, OMT, OMX, Campixx...
