Helpful content: What does Google really evaluate?
Since the first Helpful Content Update in 2022, the SEO world has been thinking about how to create or optimize helpful content. Hypotheses are put forward; analyses, checklists and audits are created. I don’t find most of these approaches useful because they are derived from the perspective of a human, not of a machine or algorithm.
Google is a machine, not a human!
My SEO mantra is: “Think like an engineer, act like a human.”
The focus here is often on the nature of the content. But does Google really evaluate content according to helpfulness?
With this article I would like to invite you to a discussion.
Contents
- 1 Helpful content, what is it anyway?
- 2 What is helpful?
- 3 How can you algorithmically measure helpfulness, pertinence and usefulness?
- 4 Identification of helpful document properties based on user signals
- 5 The interaction between initial ranking and reranking
- 6 Helpful content correlates with document properties, but is causally linked to user signals
Helpful content, what is it anyway?
Helpful Content is a term that Google introduced as part of the first Helpful Content Update in August 2022. Google initially announced that the Helpful Content System was a “sitewide classifier”. It was later announced that it would also be used to rate individual documents.
Our helpful content system is designed to better ensure people see original, helpful content written by people, for people, in search results, rather than content made primarily to gain search engine traffic.
Our central ranking systems are primarily designed for use at page level. Various signals and systems are used to determine the usefulness of individual pages. There are also some website-wide signals that are also taken into account.
I have already argued, in my commentary on the first Helpful Content Update, that this update was primarily a PR update, and not just because of its descriptive title. You can read my reasoning and criticism in detail here.
One of Google’s PR goals is to encourage website operators to make crawling, indexing and therefore ranking easier. At least that was the aim of the biggest updates, such as the Page Speed Update, the Page Experience Update and the Spam Updates. These updates have one thing in common: their concrete, descriptive titles imply a recommendation for action and thus help Google with information retrieval.
I would have preferred to call the Helpful Content System a “User Satisfaction System”. But more on that later.
What is helpful?
In order to answer this question, you should take a closer look at the information retrieval terms relevance, pertinence and usefulness. In my article “Relevance, pertinence and quality in search engines“, these terms are described as follows:
Something is relevant to a search engine if a document or piece of content is significant in relation to the search query. The search query describes the situation and the context. Google determines this relevance using text analysis methods such as BM25, TF-IDF or Word2Vec.
Pertinence describes the subjective importance of a document for the user. This means that in addition to the match with the search query, a subjective user level is added.
In addition to the conditions for relevance and pertinence, usefulness also adds novelty as a requirement.
For me, pertinence and usefulness are the two levels that stand for helpfulness.
How can you algorithmically measure helpfulness, pertinence and usefulness?
Pertinence and usefulness can be determined via user satisfaction with the content. The best way to determine user satisfaction is to measure and interpret user behavior. Beyond the relevance of the content to the search query, this provides a better indication of whether users really find the content helpful in the respective context. Analyzing document or content properties alone provides only limited information about how helpful a search result is, because the user is not taken into account.
There are various possible metrics for this, which emerge from the Google API leak:
- CTR (click-through rate)
  - ctrWeightedImpressions: records the weighted impressions used to calculate the CTR.
- Good clicks
  - goodClicks: tracks the number of good clicks.
  - lastGoodClickDateInDays: shows the date on which the document received its last good click.
  - Source: GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsClickSignals
- Bad clicks
  - badClicks: records the number of bad clicks.
  - Source: GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsClickSignals
- Long clicks
  - lastLongestClicks: tracks the clicks that were the last and longest in related user queries.
  - Source: GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsClickSignals
- Short clicks
  - There is no direct attribute called “short clicks”, but the absence of long clicks or a high number of bad clicks could indicate shorter interactions.
  - Source: GoogleApi.ContentWarehouse.V1.Model.QualityNavboostCrapsCrapsClickSignals
Source: Google API Leak Analyzer
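To make the idea tangible, here is a minimal Python sketch of how such click signals could be combined into a single user-satisfaction score per document. The attribute names are borrowed from the leak; the weights and the formula itself are purely illustrative assumptions on my part, not Google’s actual calculation.

```python
from dataclasses import dataclass

@dataclass
class ClickSignals:
    """Click signals loosely modeled on attribute names from the API leak.
    How Google actually stores and weights them is not public."""
    impressions: float          # ctrWeightedImpressions
    good_clicks: float          # goodClicks
    bad_clicks: float           # badClicks
    last_longest_clicks: float  # lastLongestClicks

def satisfaction_score(s: ClickSignals) -> float:
    """Toy aggregation: reward good and long clicks, penalize bad clicks,
    normalize by weighted impressions. The weights are illustrative only."""
    if s.impressions <= 0:
        return 0.0
    ctr = (s.good_clicks + s.bad_clicks) / s.impressions
    good_ratio = s.good_clicks / max(s.good_clicks + s.bad_clicks, 1.0)
    long_click_bonus = s.last_longest_clicks / s.impressions
    return 0.4 * ctr + 0.4 * good_ratio + 0.2 * long_click_bonus

# A document with mostly good clicks scores higher than one with mostly bad clicks
print(satisfaction_score(ClickSignals(1000, 80, 10, 25)))
print(satisfaction_score(ClickSignals(1000, 10, 80, 2)))
```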
Other factors that I have researched in Google patents are:
- Click-through rate (CTR):
  - Search result interaction: the percentage of users who click on a website link when it appears in the search results.
  - Ad performance: the CTR of the ads displayed on the website.
- Dwell time:
  - Average time spent on the website: the average time users spend on the site after clicking on a search result.
  - Bounce rate: the percentage of visitors who leave the website after viewing only one page.
- Good clicks and bad clicks:
  - User engagement metrics: metrics such as page interactions (likes, shares, comments), bounce rates and revisits.
  - View duration: longer views indicate higher relevance and good clicks, while shorter views indicate lower relevance and bad clicks.
- Long clicks and short clicks:
  - View duration: measures the time users spend viewing a document. Longer views (long clicks) are considered more relevant.
  - Weighting functions: continuous and discontinuous weighting functions adjust relevance scores based on viewing duration.

Patents:
- “Ranking factors or scoring criteria”
- “Increased importance of metrics for user engagement”
- “User engagement as a ranking factor”
Source: Database Research Assistant
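The patents describe view duration and good/bad clicks without publishing concrete thresholds. As a rough illustration, the following sketch classifies a single click from dwell time and a return-to-SERP flag; the 30-second and 120-second cut-offs are assumptions for the example only.

```python
def classify_click(dwell_time_seconds: float, returned_to_serp: bool) -> str:
    """Classify a click by dwell time and whether the user bounced back to the
    results page. The thresholds (30s / 120s) are illustrative assumptions;
    the patents only describe weighting by view duration, not concrete cut-offs."""
    if returned_to_serp and dwell_time_seconds < 30:
        return "short/bad click"   # quick return suggests dissatisfaction
    if dwell_time_seconds >= 120:
        return "long/good click"   # long engagement suggests the need was met
    return "neutral click"

print(classify_click(12, returned_to_serp=True))    # short/bad click
print(classify_click(240, returned_to_serp=False))  # long/good click
```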
Usefulness can also be determined by search engines using an information gain score.
The information gain score indicates how much additional information a document contains beyond the information in the documents the user has previously viewed. In other words, it helps determine how much new information a document offers the user compared to what they have already seen.
You can find out more about information gain in the article Information gain: How is it calculated? Which factors are crucial?
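As a simplified illustration of the concept, the following sketch computes an information-gain-style score over term sets: the share of a candidate document’s terms that the user has not seen in previously viewed documents. Google’s patent describes a model-based approach; this set-based version is only meant to show the intuition.

```python
def information_gain(candidate_terms: set[str], seen_docs: list[set[str]]) -> float:
    """Share of the candidate's terms that do not appear in any document the
    user has already seen. 1.0 = entirely new information, 0.0 = nothing new.
    A deliberately crude stand-in for Google's model-based scoring."""
    if not candidate_terms:
        return 0.0
    already_seen = set().union(*seen_docs) if seen_docs else set()
    novel_terms = candidate_terms - already_seen
    return len(novel_terms) / len(candidate_terms)

seen = [{"helpful", "content", "update", "google"},
        {"ranking", "signals", "google"}]
candidate = {"information", "gain", "score", "google", "ranking"}
print(information_gain(candidate, seen))  # 0.6 -> mostly new information for this user
```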
Identification of helpful document properties based on user signals
Another possibility is to use positive user signals, in statistically valid quantities, to identify document properties or document patterns that are presumably helpful for users.
The Google patent “Ranking Search Result Documents” describes a method that compares the properties of search queries with document properties based on past user interactions.
However, this method would require a lot of computing resources. In addition, such a methodology would always involve a time delay before the results become meaningful.
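A simplified sketch of that idea: aggregate user signals per document and check how strongly simple document properties correlate with the good-click rate. The properties, the toy data and the use of a plain Pearson correlation are my assumptions for illustration, not the concrete method from the patent.

```python
# Requires Python 3.10+ for statistics.correlation
from statistics import correlation

# Invented toy data: (word_count, has_table_of_contents, good_click_rate)
documents = [
    (450,  0, 0.31),
    (1200, 1, 0.52),
    (800,  1, 0.47),
    (300,  0, 0.22),
    (2100, 1, 0.58),
]

word_counts = [d[0] for d in documents]
toc_flags = [float(d[1]) for d in documents]
good_click_rates = [d[2] for d in documents]

# Correlate each document property with the observed good-click rate
print("word count vs. good clicks:", round(correlation(word_counts, good_click_rates), 2))
print("table of contents vs. good clicks:", round(correlation(toc_flags, good_click_rates), 2))
```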
The interaction between initial ranking and reranking
In order to understand at which point in the ranking process helpful content is determined, a brief digression into parts of the information retrieval process is required.
There are three steps in the ranking process:
- Document scoring
- Quality classification
- Re-ranking
- Document scoring is responsible for the initial ranking of the top n documents. The Ascorer is used here to calculate IR scores. How large n is can only be guessed; for performance reasons, I assume a maximum of a few hundred documents.
- Signals relating to E-E-A-T play a particularly important role in quality classification. Here, the quality of individual documents is not evaluated; instead, site-wide classifiers are used.
- Twiddlers are used for reranking.
Twiddlers are components within Google’s Superroot system that are used to re-evaluate search results from a single corpus. They work with ranked sequences rather than isolated results and make adjustments to the initial ranking created by the Ascorer. There are two types of twiddlers: Predoc and Lazy.
- Predoc Twiddlers:
  - Operation: they work with thin results (initial search results with minimal information).
  - Functions: changing IR scores, reordering results and making remote procedure calls (RPCs).
  - Use case: suitable for broad, initial adjustments and for promoting results based on preliminary data.
- Lazy Twiddlers:
  - Operation: they work with full results (detailed document information).
  - Functions: reordering and filtering results based on detailed content analysis.
  - Use case: ideal for fine-tuning and filtering based on specific content attributes.
More detailed information can be found in the “Twiddler Quick Start Guide”, which you can download here.
Source: Database Research Assistant
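To illustrate the division of labor between initial scoring and re-ranking, here is a strongly simplified Python sketch: documents enter with an IR score from the initial scoring stage, and twiddler-style functions then demote or boost them before the final sort. The twiddler names, signals and adjustment factors are invented for the example; the real twiddlers are internal components of Superroot.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    url: str
    ir_score: float                  # score from initial document scoring
    signals: dict = field(default_factory=dict)

def spam_twiddler(results: list[Result]) -> None:
    """Demote results flagged by a (hypothetical) spam classifier."""
    for r in results:
        if r.signals.get("spam_probability", 0.0) > 0.8:
            r.ir_score *= 0.5

def satisfaction_twiddler(results: list[Result]) -> None:
    """Boost results with strong (hypothetical) user-satisfaction signals."""
    for r in results:
        r.ir_score *= 1.0 + r.signals.get("satisfaction", 0.0)

def rerank(results: list[Result]) -> list[Result]:
    # Apply each twiddler-style adjustment, then re-sort by adjusted score
    for twiddler in (spam_twiddler, satisfaction_twiddler):
        twiddler(results)
    return sorted(results, key=lambda r: r.ir_score, reverse=True)

results = [
    Result("example.com/a", 2.1, {"satisfaction": 0.4}),
    Result("example.com/b", 2.5, {"spam_probability": 0.9}),
]
print([r.url for r in rerank(results)])  # /a overtakes /b after re-ranking
```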
According to the API leak, these Twiddlers can also be used for evaluation at domain level in addition to the document level.
Twiddlers are used in Google’s ranking and indexing processes to adjust the relevance and ranking of documents. They are essentially factors or signals that can be “twiddled”, or adjusted, to fine-tune search results. Here are some key points about twiddlers based on the leaked documents:
- Classification of domains:
  - Twiddlers can be used to classify the domain of a document, which helps to understand the context and relevance of the content.
  - Source: “qualityTwiddlerDomainClassification” – Google-Leak_API-Module_summarized
- Spam detection:
  - Twiddlers play a role in spam detection and mitigation. They can adjust the ranking of documents that are flagged by spam detection algorithms.
  - Source: “spamBrainSpamBrainData” – Google-Leak_API-Module_summarized
- Content quality:
  - Twiddlers can influence the perceived quality of content by adjusting scores based on various quality signals.
  - Source: “commonsenseScoredCompoundReferenceAnnotation” – Google-Leak_API-Module_summarized
- Shopping and ads:
  - For e-commerce and shopping-related search queries, twiddlers can adjust the relevance of shopping annotations and ads.
  - Source: “adsShoppingWebpxRawShoppingAnnotation” – Google-Leak_API-Module_summarized
Source: Google API Leak Analyzer
The twiddlers are part of Google’s Superroot and are responsible for a downstream quality assessment, among other things in terms of helpfulness, at document and domain level.
Objective ranking factors, with the exception of information gain, make no sense for the evaluation of helpful content, as they do not focus on the user. These factors are primarily taken into account in the initial ranking via the Ascorer.
It makes sense that Google evaluates helpful content primarily on the basis of the various possible user signals and an information gain score, which can also be evaluated individually for strongly personalized users.
Helpful content correlates with document properties, but is causally linked to user signals
As mentioned at the beginning, I am skeptical about many analyses and checklists regarding helpful content because I think that Google evaluates helpfulness primarily on the basis of user signals and not on the basis of document properties. In other words, I think that analyzing individual content in terms of helpfulness without having insight into user data is only of limited value.
Of course you want to improve user signals by optimizing content, but in the end it is the user who decides whether they find a piece of content helpful, not the SEO who optimizes certain properties of a document according to a checklist.
In addition, whether a user finds a piece of content helpful depends on the topic and context. In other words, the recommendations for optimization also always depend on these.
There may be correlations between document properties and helpful content, but ultimately the causality lies with the user signals.
In other words: if you optimize a piece of content and the user signals do not improve, it will not be rated as more helpful. Google must first learn from user signals what is helpful.
This thesis is underpinned by findings from the antitrust proceedings against Google, according to which the understanding and quality of content can only be derived from the document itself to a limited extent.
The desire for a blueprint, preferably in the form of checklists, is great in the SEO industry. That is why such checklists always attract a lot of attention and are popular. However, they lag behind reality, as the need for, and therefore the helpfulness of, content can be very dynamic for each search query.
There is also a great desire for clarity, e.g. regarding Google updates and possible reasons for a penalty. This is why analyses of Google updates are also very popular.
But if content is king, user signals are queen, and they ultimately determine how helpful Google rates a piece of content. Since most analyses of core updates and helpful content are based on the characteristics of documents and domains, they represent correlations at most, not causalities.
A theory such as Google devaluing websites because of affiliate links, or because they do not mention the right entities or keywords, does not make sense. Google devalues websites because the user signals are not appropriate and the pages do not offer any information gain; they do not meet user needs and are therefore not helpful for many users. Google does not devalue pages in the re-ranking because of certain document properties.
For me, the Helpful Content System is more of a framework that brings together all the user signals used and the rating systems based on them. That is why I would call it a “User Satisfaction System”.
What is your opinion? Let’s discuss!
Simon
22.07.2024, 02:14
Olaf, thank you for another informative article. So just to be clear, is your view that AI writers that analyze entities contained in the top end results and seek to add these to an article are just a waste of time?
Another question: is there a place for a tool that measures user interaction on the page and comes up with some sort of helpfulness metric to guide owners as to the helpfulness of content?
Olaf Kopp
22.07.2024, 08:06
Hi Simon, no. You have to differentiate between the different steps of ranking and ranking systems. The Helpful Content System is one of them and part of re-ranking. In my opinion helpful content is a quality classifier that is activated in the re-ranking process. The initial ranking happens in the Ascorer or scoring process, and here content-based relevance signals are important.
Lee Stuart
23.08.2024, 05:10
Olaf, thanks for the interesting and reasoned view. I was wondering what your view is on this now that in the latest core update it appears that some sites previously heavily impacted by HCU have recovered. The point of contention is that those user signals have been next to zero for a long time for some of these. So do you think that G is using historical data beyond its normal look-back period, or is there another re-ranking component added, or perhaps some kind of manual intervention? Interested to hear your thoughts.
Olaf Kopp
23.08.2024, 08:22
Hi Lee, good question. The Helpful Content System is only one part of the Ranking Core. Other systems and concepts are e.g. E-E-A-T. Adjustments to the search intents can also have an influence. I think you can find at least as many examples of websites that have not recovered. This is the problem with the analysis of core updates. You will never get a complete overview, but only focus on examples that support your theses. You are subject to confirmation bias.