Google’s journey to a semantic search engine
In this post, I discuss the steps and innovations that have brought Google closer to the goal of semantic understanding in search since 2010. The article is aimed at SEOs, educators, and journalists who want to learn about search engine technology and the evolution of search.
What is semantic search?
A semantic search engine takes the context of a search query into account to better understand what the search term means. In contrast to purely keyword-based systems, semantic search aims to interpret both the intent behind a search query and the meaning of documents. While keyword-based search engines work by matching keywords against text, semantic search engines also consider the relationships between entities when returning search results.
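The difference between the two approaches can be illustrated with a small toy example. The sketch below is not how Google works internally; the alias table, the relation table, the entity IDs and both sample documents are hypothetical and exist only to show why entity relationships can surface a document that shares no keyword with the query.

```python
# Toy comparison of keyword matching and entity-aware matching.
# All data below (aliases, relations, documents) is hypothetical.

QUERY = "apple founder"
DOC_A = "Steve Jobs and Steve Wozniak started the company in 1976"
DOC_B = "An apple pie recipe from the founder of our bakery"

# Surface form -> entity ID (assumed, hand-written lookup table).
ALIASES = {
    "apple": "apple_inc",
    "steve jobs": "steve_jobs",
    "steve wozniak": "steve_wozniak",
}

# Entity ID -> related entity IDs (assumed "founded by" relationships).
RELATED = {"apple_inc": {"steve_jobs", "steve_wozniak"}}


def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def keyword_score(query, doc):
    """Count the words that query and document have in common."""
    return len(tokens(query) & tokens(doc))


def entities(text):
    """Find known entities by naive substring lookup (no disambiguation)."""
    lowered = text.lower()
    return {eid for alias, eid in ALIASES.items() if alias in lowered}


def entity_score(query, doc):
    """Count document entities that match the query entities or their neighbours."""
    query_entities = entities(query)
    expanded = set(query_entities)
    for entity in query_entities:
        expanded |= RELATED.get(entity, set())
    return len(expanded & entities(doc))


if __name__ == "__main__":
    for name, doc in (("DOC_A", DOC_A), ("DOC_B", DOC_B)):
        print(name, keyword_score(QUERY, doc), entity_score(QUERY, doc))
```

With keyword matching alone, the recipe page wins because it contains both query words. With the entity lookup, the document about the founders scores higher even though it contains neither word. Note that the naive lookup still treats "apple" in the recipe as the company; a real system would need to disambiguate entities, which this sketch deliberately leaves out.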
Why is semantic search so desirable for Google?
A common thread runs through Google’s major search launches and reveals a clear goal: the complete understanding of all content on the web and the unambiguous interpretation of search queries. To understand the true meaning of search queries and content, Google must:
- Clearly identify search queries and their search intent
- Identify entities in search queries and content and organize the index around them
- Identify entities with high expertise, authority and trust (E-A-T)
To achieve these goals, Google needs powerful systems and algorithms that tap into and interpret the complete knowledge of the world.
Google’s steps and innovations towards a semantic search engine
Google has consistently pursued this goal since the purchase of the semantic database Freebase in 2010, the introduction of the Knowledge Graph as a semantic database in 2012, and the launch of Hummingbird as the new ranking algorithm base in 2013. Tapping into the world’s knowledge requires an index built as a graph. Before 2012, Google accessed an index that resembled a tabular database, in which information was stored much like in a directory. A graph index, by contrast, can capture and map relationships between information and entities.
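To make the idea of a graph index more concrete, here is a minimal sketch of entities stored as nodes with attributes and connected by typed edges. The class name `TinyGraph`, its methods and the predicate names are assumptions made for illustration; this is not the data model of the Knowledge Graph. The sample facts are taken from this article.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical minimal graph store: nodes, attributes and typed edges."""

    def __init__(self):
        self.attributes = defaultdict(dict)  # node -> {attribute: value}
        self.edges = defaultdict(set)        # (subject, predicate) -> {objects}

    def add_attribute(self, node, key, value):
        self.attributes[node][key] = value

    def add_edge(self, subject, predicate, obj):
        self.edges[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        """Return all nodes reachable from subject via one edge type."""
        return set(self.edges.get((subject, predicate), set()))

    def follow(self, start, *predicates):
        """Walk a chain of edge types, starting from one node."""
        frontier = {start}
        for predicate in predicates:
            frontier = {o for s in frontier for o in self.objects(s, predicate)}
        return frontier


def build_example():
    """Load a few facts mentioned in this article into the toy graph."""
    graph = TinyGraph()
    graph.add_attribute("Freebase", "type", "semantic database")
    graph.add_edge("Google", "acquired", "Metaweb")
    graph.add_edge("Metaweb", "created", "Freebase")
    graph.add_edge("Knowledge Graph", "fed_by", "Freebase")
    return graph


if __name__ == "__main__":
    graph = build_example()
    # What did the company that Google acquired create?
    print(graph.follow("Google", "acquired", "created"))
```

A question such as "what did the company Google acquired create?" becomes a walk along two edges. In a flat, directory-like table, the same question would require the relationships to be spelled out in advance or reconstructed through repeated lookups.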
Here is a list of the most significant innovations Google has introduced since 2010 on its way to becoming a semantic search engine:
- 2010: Google buys Freebase, a semantic database of structured, machine-readable entity data created by Metaweb. The first version of the Knowledge Graph was fed by data from Freebase. In 2014, Freebase was transferred to the Wikidata project. However, of the original 10 million or so records from Freebase, only a portion was transferred. My own record for the entity “olaf kopp”, which I created in Freebase in 2012, was not transferred to Wikidata, so I maintain it there manually. Nevertheless, my data from the former Freebase database is still displayed in the form of a knowledge panel and extended with further information.
- 2012: Google introduces the Knowledge Graph in the form of Knowledge Panels and Knowledge Cards in search. A knowledge graph is a knowledge database in which information is structured in such a way that knowledge is created from the information. In a knowledge graph, entities (nodes) are related to each other via edges, provided with attributes, and placed in a thematic context or ontology. More on this in the article “Google Knowledge Graph simply explained”.
- 2013: Google introduces the Hummingbird update as a new generation of ranking algorithms. The introduction of Hummingbird on Google’s 15th birthday in 2013 was the definitive launch of semantic search at Google. Google itself has called this algorithm update the most significant since the Caffeine update in 2010. It was said to have affected about 90% of all search queries at launch and, unlike Caffeine, was a true algorithm update. It is meant to help interpret more complex search queries, to better recognize the actual search intent or question behind a query, and to return matching documents. At the document level, too, the actual intent behind the content should be matched more closely with the search query.
- 2014: Google introduces the Knowledge Vault, a system for identifying and extracting tail entities to drive the expansion of the “long tail of knowledge”. Through the Knowledge Vault, Google is able to automate data mining from unstructured sources. The system could be the foundation for subsequent innovations in Natural Language Processing.
- 2014: Google introduces E-A-T for rating websites in the Quality Rater Guidelines. At first glance, the connection to semantic search is hard to see. Indirectly, however, the entity concept and graph structure of semantic databases provide an ideal basis for a topic-related qualitative evaluation of entities (publishers and authors) and their content in terms of expertise, authority and trust. An entity-based index makes it possible to look at entities such as authors, publishers, brands and domains holistically. This is not possible if you only look at individual URLs or images, as the classic Google indexes do.
- 2015: Google officially introduces machine learning into Google Search with RankBrain. Using vector space analysis, the search engine aims to better place search queries, and terms in general, in relation to one another by thematic proximity or context. Among other things, this allows search queries to be better interpreted in terms of search intent.
- 2018: Google introduces BERT as a new technology for better interpretation of search queries and text; it has been used in Google Search since 2019. BERT uses Natural Language Processing to gain a better semantic understanding of search queries, sentences, questions, text segments and content in general.
- 2021: Google introduces MUM as a new technology for better semantic understanding of search queries, questions and content in various forms (text, video, audio, image), and for tapping the “knowledge of the world”. With MUM, Google can add entity information to semantic databases like the Knowledge Graph even faster and more extensively. More in my Search Engine Land article “Google MUM update: What can SEOs expect in the future?”
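The vector space analysis mentioned in the 2015 RankBrain entry can be sketched in a few lines. The example below is an assumption-laden toy: the three-dimensional vectors are written by hand, whereas real systems learn vectors with many more dimensions from data. It only illustrates the general principle that terms with similar vectors are treated as thematically close; it does not reproduce RankBrain.

```python
import math

# Hypothetical, hand-written term vectors (real embeddings are learned).
VECTORS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook computer": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(term):
    """Return the other known term whose vector is most similar to the given one."""
    candidates = (t for t in VECTORS if t != term)
    return max(candidates, key=lambda t: cosine(VECTORS[term], VECTORS[t]))


if __name__ == "__main__":
    print(nearest("laptop"))
```

In this toy setup, "laptop" ends up closest to "notebook computer" although the two strings share no word, which is the kind of thematic proximity that pure keyword matching cannot capture.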
Infographic “Google’s way to a semantic search engine” for downloading
Below is an infographic that explains Google’s way to a semantic search engine. It is free to use, for example on social media.