Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
Topics: AI (Deep Learning), Document Classification, E-E-A-T
The paper “Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study” by Google Research explores how classifiers trained to distinguish human-written from machine-generated text can serve as unsupervised predictors of webpage quality. Applying such detectors to 500 million webpages, the study shows that they can identify low-quality pages without being trained on explicit quality labels. It further finds that many of the low-quality pages consist of machine-translated content, essay farms, SEO manipulation, or NSFW content.
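The core idea can be illustrated with a minimal Python sketch: take a detector's probability that a page is machine-written and use it as a proxy for low quality. This is not the paper's implementation. The names `toy_detector` and `rank_by_quality`, the example URLs, and the page texts are all hypothetical, and the toy detector (share of repeated words) is only a deterministic stand-in for a trained human-vs-machine classifier.

```python
from typing import Callable, Dict, List, Tuple

# A detector maps page text to an estimated P(machine-written) in [0, 1].
Detector = Callable[[str], float]


def toy_detector(text: str) -> float:
    """Hypothetical stand-in for a trained human-vs-machine classifier.

    It returns the share of repeated words. This is NOT the paper's model;
    it only gives the sketch something deterministic to run on.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return 1.0 - len(set(words)) / len(words)


def rank_by_quality(
    pages: Dict[str, str], detector: Detector
) -> List[Tuple[str, float]]:
    """Score every page and sort from most to least suspicious.

    A higher detector score is treated as a signal of lower quality,
    so no quality labels are needed at any point.
    """
    scored = [(url, detector(text)) for url, text in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example pages, for illustration only.
pages = {
    "a.example": "cheap essays cheap essays buy cheap essays cheap essays now",
    "b.example": "The study applies a detector to half a billion webpages.",
}

ranking = rank_by_quality(pages, toy_detector)
for url, score in ranking:
    print(f"{url}: {score:.2f}")
```

In practice the `detector` argument would be a real classifier trained on human-written versus machine-generated text; the ranking step stays the same whichever detector is plugged in.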