How Does Generative Retrieval Scale to Millions of Passages?
Topics: AI (Deep Learning), Indexing, LLMO / GEO, Passage based retrieval, Query Fan Out, Ranking, Retrieval Augmented Generation (RAG)
This Google paper investigates how generative retrieval techniques perform when scaled to millions of passages. Unlike traditional retrieval systems that rely on an external index, generative retrieval reframes retrieval as a sequence-to-sequence problem, mapping queries directly to document identifiers. The study empirically evaluates generative retrieval methods across corpus sizes up to the full MS MARCO passage set (8.8M passages). Key findings highlight the importance of synthetic query generation (query fan-out), the limitations of naively scaling model size, and the ineffectiveness of existing architecture modifications once compute cost is taken into account. The results suggest that while generative retrieval is competitive with dual encoders on small corpora, scaling it to millions of passages remains an open challenge.
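To make the seq2seq framing concrete, here is a minimal toy sketch of the decoding step: a prefix trie over valid document-identifier token sequences constrains generation so the model can only emit identifiers that actually exist in the corpus. The scorer below is a stub standing in for a trained seq2seq model (the paper uses T5-based models); the docids, function names, and scoring rule are all hypothetical illustrations, not the paper's implementation.

```python
def build_trie(docids):
    """Build a prefix trie over docid token sequences.

    Each docid is a tuple of tokens; "<eos>" marks a complete docid.
    """
    root = {}
    for tokens in docids:
        node = root
        for t in tokens:
            node = node.setdefault(t, {})
        node["<eos>"] = {}
    return root


def constrained_greedy_decode(score_fn, trie):
    """Greedily emit tokens, restricted at each step to valid docid prefixes."""
    node, out = trie, []
    while True:
        allowed = list(node.keys())
        if not allowed or allowed == ["<eos>"]:
            return out
        # Pick the allowed continuation the (stub) model scores highest.
        best = max(allowed, key=lambda t: score_fn(out, t))
        if best == "<eos>":
            return out
        out.append(best)
        node = node[best]


# Toy corpus: docids as short token sequences (hypothetical identifiers).
docids = [("doc", "1"), ("doc", "2"), ("passage", "7")]
trie = build_trie(docids)


def make_scorer(query_tokens):
    """Stub scorer: prefers tokens that appear in the query (real systems
    would use the decoder's next-token log-probabilities here)."""
    def score(prefix, token):
        return (1.0 if token in query_tokens else 0.0, token)
    return score


print(constrained_greedy_decode(make_scorer({"passage", "7"}), trie))
```

The trie is the key piece: without it, a generative model could hallucinate identifiers that correspond to no passage, which is why real generative retrieval systems decode under exactly this kind of constraint.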