The most interesting Google patents and scientific papers on E-E-A-T
E-E-A-T has become one of the most important ranking influences in Google search results since the 2018 core updates, and it will gain additional importance with the introduction of SGE. In this post, I’d like to give a list of and introduction to the most interesting Google patents and papers regarding E-E-A-T. Please share the knowledge!
- 1 Producing a ranking for pages using distances in a web-link graph
- 2 Combating Web Spam with Trust Rank
- 3 Search result ranking based on trust
- 4 Credibility of an author of online content
- 5 Sentiment detection as a ranking signal for reviewable entities
- 6 Systems and Methods for Re-Ranking ranked Search Results
- 7 Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources
- 8 Website Representation Vector
- 9 Generating author vectors
- 10 How Google fights Disinformation
- 11 Search Quality Evaluator Guidelines
- 12 Google documentation on Panda
- 13 Overview: Possible factors influencing E-E-A-T
The latest version of this Google patent was signed in 2017, and its status is active. The patent describes how a ranking score for linked documents can be produced based on their proximity to manually selected seed sites. In the process, the seed sites themselves are individually weighted.
In a variation on this embodiment, a seed page s_i in the set of seed pages is associated with a predetermined weight w_i, wherein 0 < w_i ≤ 1. Furthermore, the seed page s_i is associated with an initial distance d_i, wherein d_i = −log(w_i).
The seed pages themselves are of high quality, or the sources have high credibility. The patent says the following about these pages:
In one embodiment of the present invention, seeds 102 are specially selected high-quality pages which provide good web connectivity to other non-seed pages. More specifically, to ensure that other high-quality pages are easily reachable from seeds 102, seeds in seeds 102 need to be reliable, diverse to cover a wide range of fields of public interests, as well as well-connected with other pages (i.e., having a large number of outgoing links). For example, Google Directory and The New York Times are both good seeds which possess such properties. It is typically assumed that these seeds are also “closer” to other high-quality pages on the web. In addition, seeds with large number of useful outgoing links facilitate identifying other useful and high-quality pages, thereby acting as “hubs” on the web.
According to the patent, these seed pages must be selected manually, and their number should be limited to prevent manipulation. The length of a link between a seed page and the document to be ranked can be determined, for example, by the following criteria:
- the position of the link
- the font of the link
- degree of thematic deviation of the source page
- number of outgoing links of the source page
It is interesting to note that pages that do not have a direct or indirect link to at least one seed page are not even included in the scoring.
This also allows conclusions to be drawn as to why some links are included by Google for ranking and some are not.
Note that however, not all the pages in the set of pages receive ranking scores through this process. For example, a page that cannot be reached by any of the seed pages will not be ranked.
This concept can be applied to the document itself, but also to the publisher, domain or author in general. A publisher or author that is often directly referenced by seed sites gets a higher authority for the topic and semantically related keywords from which it is linked. These seed sites can be a set of sites per topic that are either manually determined or reach a threshold of authority and trust signals.
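Putting the pieces above together, a minimal sketch of the idea is a shortest-path computation from weighted seeds, where link lengths could encode the listed criteria (link position, font, topical deviation, outgoing-link count). All names and the graph representation below are my own illustration, not Google's implementation:

```python
import heapq
import math

def seed_distance_scores(graph, seeds):
    """Shortest-path distances from weighted seed pages.

    graph: {page: {linked_page: link_length}} -- link lengths stand in for
           criteria like link position, font, and topical deviation.
    seeds: {seed_page: weight w_i with 0 < w_i <= 1}; the initial distance
           is d_i = -log(w_i), so a weight of 1.0 gives distance 0.
    """
    dist = {s: -math.log(w) for s, w in seeds.items()}
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for nxt, length in graph.get(page, {}).items():
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    # Pages unreachable from any seed never appear in the result,
    # mirroring the patent's note that such pages are not ranked.
    return dist
```

A page with a short total distance to a strong seed would rank higher; a page with no path from any seed simply receives no score.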
The scientific paper “Combating Web Spam with Trust Rank” describes how further trustworthy sites can be identified automatically, starting from a manually selected seed set of at most 200 sites.
Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
A human expert then examines the seed pages, and tells the algorithm if they are spam (bad pages) or not (good pages). Finally, the algorithm identifies other pages that are likely to be good based on their connectivity with the good seed pages.
The algorithmic determination of further trusted sites follows the assumption that trusted sites do not link to spam sites on their own, but to other trusted sources.
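The propagation idea can be sketched in a few lines: trust mass starts on the manually labeled good seeds and flows along outgoing links, attenuated at each hop. This is a toy version of the TrustRank iteration from the paper (damping factor and iteration count are illustrative choices):

```python
def trust_rank(graph, good_seeds, beta=0.85, iters=50):
    """Toy TrustRank: trust starts on good seeds and is split evenly
    over each page's outlinks every iteration.

    graph: {page: [outlinked pages]}
    good_seeds: set of pages a human expert labeled as good.
    """
    pages = set(graph) | {q for out in graph.values() for q in out}
    seed_mass = 1.0 / len(good_seeds)
    trust = {p: (seed_mass if p in good_seeds else 0.0) for p in pages}
    for _ in range(iters):
        # Each round, a share (1 - beta) of trust is re-injected at the
        # seeds and beta flows along the outgoing links.
        nxt = {p: (1 - beta) * (seed_mass if p in good_seeds else 0.0)
               for p in pages}
        for p, out in graph.items():
            if out:
                share = beta * trust[p] / len(out)
                for q in out:
                    nxt[q] += share
        trust = nxt
    return trust
```

Pages never reached from a good seed keep a trust of zero, which matches the paper's assumption that trusted sites link to other trusted sources rather than to spam.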
In the Google patent Search result ranking based on trust there are references to the use of anchor texts as a trust score.
The patent describes how the ranking scoring of documents is supplemented based on a trust label. This information can be from the document itself or from referring third-party documents in the form of link text or other information related to the document or entity. These labels are associated with the URL and recorded in an annotation database.
Google itself has also confirmed that the anchor texts of links not only increase the relevance of the target page itself, but can have a positive effect on the entire domain.
For me, this Google patent is the most interesting one regarding E-E-A-T. In Credibility of an author of online content, reference is made to several factors that can be used to algorithmically determine the credibility of an author. This Google patent has the status “Application status is active”.
It describes how a search engine can rank documents under the influence of a credibility factor and reputation score of the author.
- An author can have several reputation scores, depending on how many different topics they publish content on.
- The reputation score of an author is independent of the publisher.
- The reputation score can be downgraded if duplicates of content or excerpts are published multiple times.
In this patent there is again a reference to links as a possible factor for an E-E-A-T rating: the reputation score of an author can be influenced by the number of links to the published content.
The following possible signals for a reputation score are mentioned:
- How long the author has had a proven track record of producing content in a topic area
- How well known the author is
- Ratings of the published content by users
- Whether content by the author is published by another publisher with above-average ratings
- The number of pieces of content published by the author
- How long it has been since the author’s last publication
- The ratings of the author’s previous publications on similar topics
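The patent leaves the exact scoring function open. Purely as an illustration (all weights, ranges, and signal names below are my own assumptions, not from the patent), such signals could be blended into a single reputation score like this:

```python
def reputation_score(signals):
    """Hypothetical weighted blend of reputation signals in [0, 1].

    signals: dict with illustrative keys such as 'years_active',
    'name_recognition' (0..1), 'avg_user_rating' (1..5 stars),
    'num_publications', and 'months_since_last_post'.
    """
    score = 0.0
    # Track record: saturates after 10 years in the topic area.
    score += min(signals.get("years_active", 0) / 10.0, 1.0) * 0.25
    # Name recognition, already normalized to 0..1.
    score += signals.get("name_recognition", 0.0) * 0.20
    # Average user rating of published content (1..5 stars).
    score += signals.get("avg_user_rating", 0.0) / 5.0 * 0.25
    # Publication volume: saturates at 50 pieces of content.
    score += min(signals.get("num_publications", 0) / 50.0, 1.0) * 0.20
    # Recency: decays linearly to zero after 24 months of silence.
    score += max(0.0,
                 1.0 - signals.get("months_since_last_post", 0) / 24.0) * 0.10
    return round(score, 3)
```

The point is not the specific weights, but that each signal in the list above is independently measurable and can feed one per-topic score.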
Furthermore, the patent discusses a credibility factor for authors. For this, verified information about the profession or the role of the author in a company is relevant. The relevance of the profession to the topics of the published content is also decisive for the credibility of the author. The level of education and training of the author can also have a bearing here.
The verified information about the author can include the number of other publications of the author that are relevant to the author’s online content item. The verified information about the author can include the number of citations to the author’s online content item that are made in other publications of one or more different authors. The verified information about the author can include information about awards and recognition of the author in one or more fields. The credibility factor can be further based on the relevancy of the one or more fields to the author’s online content item. The verified information about the author can include feedback received about the author or the author’s online content item from one or more organizations. The credibility factor can be further based on the relevancy of the one or more organizations to the author’s online content item and the feedback received. The verified information about the author can include revenue information about the author’s online content item.
Other factors mentioned are:
- Experience of the author due to time: The longer an author has already published on a topic, the more credible he is. Google can algorithmically determine the experience of the author/publisher via the date of the first publication in a topic field.
- Number of content items published on a topic: If an author publishes many articles on a topic, it can be assumed that they are an expert and have a certain credibility. If the author is known to Google as an entity, it is possible to record all content published by them in an entity index such as the Knowledge Graph or Knowledge Vault and assign it to a topic field. This can be used to determine the number of content items per topic field.
- Time elapsed until last publication: The longer it has been since an author’s last publication on a topic field, the more a possible reputation score for this topic field decreases. The more recent the content, the higher the score.
- Mentions of the author / publisher in award and best-of lists: If the author has received awards or other forms of public recognition in the topic area of the online content item or for the online content item itself, the author’s credibility factor can be positively influenced.
If the author’s online content item is published by a publisher that regularly publishes works by authors who have received awards or other public recognition (which increases the credibility of the publisher itself), the author’s credibility score can also be influenced.
Furthermore, mentions in best-seller lists can have an influence on the credibility measurement.
The level of success of the author, either in relation to a particular online content item or generally, can be measured to some degree by the success of the author’s published works, for example, whether one or more have reached best-seller lists, or by the revenue generated from one or more publications. If this information is available and indicates relative success of the author in a particular field, this can positively influence the author’s credibility factor.
- Name recognition of the author/publisher: The higher the level of awareness of an author/publisher, the more credible he is and the higher his authority in a subject area. Google can algorithmically measure the level of awareness via the number of mentions and the search volume for the name. In addition to the patent already mentioned, there are further statements from Google on the degree of awareness as a possible ranking factor.
The Google patent Sentiment detection as a ranking signal for reviewable entities describes how sentiment analysis can be used to identify sentiments around reviewable entities in documents. The results can then be used for ranking entities and related documents.
Reviewable entities include people, places, or things about which sentiment can be expressed, such as restaurants, hotels, consumer products such as electronics, movies, books, and live performances.
Both structured and unstructured data can be used as a source. Structured reviews are collected from popular review websites such as Google Maps, TripAdvisor, Citysearch, or Yelp.
The entities stored in the Sentiment database are represented by tuples in the form of the entity ID, entity type and one or more reviews. The reviews are assigned different scores, which are calculated in the Ranking Analysis Engine.
In the Ranking Analysis Engine, sentiment scores concerning the respective reviews including additional information such as the author are determined.
This patent also discusses the use of interaction signals as a complement to sentiment as a ranking factor:
- User Interaction Score
- Consensus Sentiment Score
To determine a user interaction score, user signals such as SERP CTR and dwell time are used.
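As a rough illustration of such a user interaction score, CTR and dwell time could be normalized and blended; the weights, the dwell-time cap, and the parameter names below are my own assumptions, not from the patent:

```python
def user_interaction_score(impressions, clicks, avg_dwell_seconds,
                           dwell_cap=180.0):
    """Illustrative blend of SERP CTR and dwell time into a score in [0, 1].

    dwell_cap: seconds after which longer visits add no further signal.
    """
    ctr = clicks / impressions if impressions else 0.0
    dwell = min(avg_dwell_seconds / dwell_cap, 1.0)
    # Equal weighting of the two signals, purely as an example.
    return 0.5 * ctr + 0.5 * dwell
```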
Google’s patent Systems and Methods for Re-Ranking ranked Search Results describes how search engines can take into account not only the references to the author’s content, but also the portion he or she has contributed to a thematic document corpus in an author score.
“In some embodiments, determining the original author score for the respective entity includes: identifying a plurality of portions of content in the index of known content identified as being associated with the respective entity, each portion in the plurality of portions representing a predetermined amount of data in the index of known content; and calculating a percentage of the plurality of the portions that are first instances of the portions of content in the index of known content.”
This Google patent was filed in August 2018. It describes the re-ranking of search results according to an author score that includes a citation score. The citation score is based on the number of references to an author’s documents. Another criterion for the author score is the proportion of content that an author has contributed to a corpus of documents.
wherein determining the author score for a respective entity includes: determining a citation score for the respective entity, wherein the citation score corresponds to a frequency at which content associated with the respective entity is cited; determining an original author score for the respective entity, wherein the original author score corresponds to a percentage of content associated with the respective entity that is a first instance of the content in an index of known content; and combining the citation score and the original author score using a predetermined function to produce the author score;
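The quoted passage can be sketched as a small function: a citation score and an original-author score are combined by a “predetermined function”. Since the patent does not specify that function, the linear combination and its weights below are my own assumption:

```python
def author_score(citation_count, corpus_size,
                 first_instance_portions, total_portions,
                 w_citation=0.5, w_original=0.5):
    """Sketch of the patent's author score.

    citation_count / corpus_size: how frequently the entity's content
        is cited relative to the indexed corpus.
    first_instance_portions / total_portions: share of the entity's
        content portions that are first instances in the index of
        known content (i.e., original rather than copied).
    """
    citation_score = citation_count / corpus_size if corpus_size else 0.0
    original_score = (first_instance_portions / total_portions
                      if total_portions else 0.0)
    # The patent only says a "predetermined function" combines the two;
    # a weighted sum is used here purely for illustration.
    return w_citation * citation_score + w_original * original_score
```

An author who is cited often and whose content is mostly original would score highest; a scraper republishing others’ content would have a low original-author score even with many citations.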
The scientific paper Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources from Google deals with the algorithmic determination of the trustworthiness of websites. In addition to the analysis of links, a new method is presented that is based on verifying published information for accuracy.
We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.
For this, methods of data mining are used, which I have already discussed in detail in the articles How can Google identify and interpret entities from unstructured content? and Natural language processing to build a semantic database.
We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources.
The previous method of assessing the trustworthiness of sources based on links and browser data about website usage behavior has weaknesses: less popular sources are given a lower score and are unfairly shortchanged, even though they provide very good information.
Using this approach, sources can be rated with a “trustworthiness score” without including the popularity factor. Websites that frequently provide incorrect information are devalued. Websites that publish information in line with the general consensus are rewarded. This also reduces the likelihood that websites that attract attention through fake news will gain visibility on Google.
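At its simplest, the core idea reduces to the fraction of a source’s extracted facts that agree with a reference knowledge base. The real paper jointly models extraction errors and source trust; the sketch below skips that step and is only meant to make the principle concrete:

```python
def kbt_score(extracted_facts, knowledge_base):
    """Toy Knowledge-Based Trust estimate.

    extracted_facts: list of (subject, predicate, object) triples
        extracted from one web source.
    knowledge_base: set of triples taken as ground truth (e.g. from
        an entity index like the Knowledge Vault).

    Returns the fraction of the source's facts confirmed by the
    knowledge base, or None if no facts were extracted.
    """
    if not extracted_facts:
        return None  # no evidence either way
    correct = sum(1 for triple in extracted_facts
                  if triple in knowledge_base)
    return correct / len(extracted_facts)
```

A source with few false facts scores close to 1.0 regardless of its popularity, which is exactly the advantage over purely link-based trust.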
The Google patent “Website Representation Vectors” describes how websites can be classified based on expertise and authority.
Here’s a summary of the key points from the patent:
- The patent application was filed in August 2018, and it covers a range of industries, including health and artificial intelligence sites as examples.
- The patent application uses Neural Networks to understand patterns and features behind websites to classify those sites. This classification is based on different levels of expertise, with examples given such as doctors (experts), medical students (apprentices), and laypeople (nonexperts) in the health domain.
- The patent application does not specifically define a “quality score”, but it mentions classifying websites based on whether they meet thresholds related to these scores. These scores could potentially relate to a range of quality measures of sites relative to other sites.
- The patent also discusses how search queries from specific knowledge domains (covering specific topics) might return results using classified sites from the same knowledge domain. It aims to limit possible results pages based on classifications involving industry and expertise that meet sufficient quality thresholds.
Google could use these Website Representation Vectors to classify sites based on features found on those sites. The classifications can be more diverse than representing categories of websites within knowledge domains, breaking the categories down much further.
The Google patent Generating author vectors deals with the generation of author vectors using neural networks. Specifically, the patent describes obtaining a set of sequences of words, where these sequences are classified as being authored by a specific author. These sequences include a plurality of first sequences of words and, for each first sequence, a respective second sequence of words that follows the first sequence. The neural network system is trained on these first and second sequences of words to determine an author vector that characterizes the author.
Once an author vector has been computed, it can be used to determine the cluster to which a user’s author vector belongs. The response generated by an automatic response system can then be conditioned on the representative author vector for that cluster.
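The clustering step can be illustrated with plain cosine similarity: given an author vector, find the most similar cluster centroid and condition downstream behavior on that cluster’s representative vector. The cluster names and two-dimensional vectors below are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_author_cluster(author_vector, cluster_centroids):
    """Assign an author vector to the most similar cluster centroid.

    cluster_centroids: {cluster_name: centroid_vector}
    """
    return max(cluster_centroids,
               key=lambda c: cosine(author_vector, cluster_centroids[c]))
```

In a real system the vectors would be high-dimensional embeddings learned by the neural network, but the nearest-centroid assignment works the same way.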
The last two patents mentioned could be used by Google to classify and evaluate source entities such as authors, companies, and domains in topical areas. This could be done via vector space analyses and embeddings. More on this topic in my post How can Google identify and rank relevant documents via entities, NLP & vector space analysis?
How Google fights Disinformation
In this whitepaper, which Google presented in 2019 at the Munich Security Conference, there are many interesting references to the E-E-A-T concept. I summarized the whitepaper in my post “Insights from the Whitepaper ‘How Google fights misinformation’ on E-A-T and Ranking”.
The Search Quality Evaluator Guidelines are published for the thousands of quality raters worldwide who rate the quality of search results. Their feedback has an impact on the development of the ranking algorithms. In the guidelines, Google introduced E-A-T for the very first time, and you can find detailed information on how quality raters should rate in terms of E-E-A-T. It is therefore the most important resource for gaining knowledge about E-E-A-T.
There is no officially communicated relation between Panda (now Coati) and E-E-A-T, but the relationship is obvious. You can find several hints in Google statements that Coati (ex-Panda) is also part of E-E-A-T. So you should check Google’s information about content quality and Coati.
Overview: Possible factors influencing E-E-A-T
Based on the aforementioned sources, our own experience and further statements from Google, we have created the following overview of possible measurable factors for an E-E-A-T rating. Please share the knowledge!