Predicting site quality
Topics: Document Classification, E-E-A-T, Navneet Panda, Phrase based Indexing, Probably in use, Ranking
The patent describes methods for predicting site quality scores for new websites using phrase models. It involves generating a phrase model from previously scored sites, where the model maps phrase frequency measures to average site quality scores.
For new sites, relative frequencies of phrases are determined and used with the model to calculate an aggregate site quality score. This predicted score can then be used by search engines to help rank resources from that site in search results.
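The two steps above (build a phrase model from scored sites, then score a new site) can be illustrated with a minimal Python sketch. It is not the patent's implementation. The whitespace tokenizer, the ten-way frequency bucketing, the plain averaging used as the aggregate, and all function names are assumptions made for illustration. The default n-gram lengths of 2 to 5 words follow the description quoted in the comments.

```python
from collections import Counter, defaultdict


def ngrams(text, n_min=2, n_max=5):
    """Yield word n-grams of length n_min..n_max (assumed whitespace tokenizer)."""
    tokens = text.lower().split()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def relative_frequencies(text):
    """Map each phrase to its share of all phrase occurrences on the site."""
    counts = Counter(ngrams(text))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def bucket(freq, n_buckets=10):
    """Assumed frequency measure: bucket the relative frequency into n_buckets bins."""
    return min(int(freq * n_buckets), n_buckets - 1)


def build_phrase_model(scored_sites):
    """Map (phrase, frequency bucket) to the average score of sites in that bucket.

    scored_sites is an iterable of (site_text, known_quality_score) pairs.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [score total, site count]
    for text, score in scored_sites:
        for phrase, freq in relative_frequencies(text).items():
            entry = sums[(phrase, bucket(freq))]
            entry[0] += score
            entry[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def predict_site_quality(model, text):
    """Aggregate (here: plain mean, an assumption) the model scores of known phrases."""
    scores = []
    for phrase, freq in relative_frequencies(text).items():
        key = (phrase, bucket(freq))
        if key in model:
            scores.append(model[key])
    return sum(scores) / len(scores) if scores else None


# Toy data, invented purely for illustration.
model = build_phrase_model([
    ("buy cheap pills", 0.2),
    ("peer reviewed study", 0.9),
])
print(predict_site_quality(model, "peer reviewed study"))
print(predict_site_quality(model, "completely unseen words"))
```

A site whose phrases never appear in the model gets no prediction (`None`) in this sketch; how such cases are handled in practice is not stated in the summary above.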
Behzad Hussain
10.12.2024, 18:06
Thanks for this nice explanation.
The phrase model generation and weighting are very complex in the patent, but you have explained these factors very nicely and comprehensively.
Olaf Kopp
10.12.2024, 18:37
THX Behzad!
Behzad Hussain
19.01.2025, 12:58
You mentioned this: “The process starts by identifying n-grams (phrases consisting of 2 to 5 words) within the text of the sites being analyzed.”
The patent also mentions that one-grams are considered: “In other implementations, n-grams of only one length are used.” (This appears in the explanation of Fig. 2, under the first step, 200.)
I think “only one length” here refers to a unigram, or 1-gram.