On the Theoretical Limitations of Embedding-Based Retrieval
Topics: Ranking, Scoring, Search Intent
This Google paper examines the theoretical limitations of embedding-based retrieval in information retrieval (IR). Vector embeddings are widely used for increasingly complex retrieval tasks, but the authors argue that critical constraints on their representational capacity can limit their effectiveness, even on seemingly simple queries. To validate these limitations empirically, they introduce a new dataset, LIMIT, and show that state-of-the-art models often fall short on it.
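
To make the capacity constraint concrete, here is a minimal toy sketch. It is not the authors' proof and does not use the LIMIT dataset. It assumes dot-product scoring, three hypothetical documents embedded in a single dimension, and a brute-force sweep over one-dimensional queries; the helper name `achievable_top_k` and all values are illustrative. The sketch records which top-2 document sets any query can retrieve.

```python
from itertools import combinations

import numpy as np


def achievable_top_k(doc_embeddings, query_embeddings, k):
    """Return every top-k document index set that at least one query retrieves."""
    achieved = set()
    for q in query_embeddings:
        scores = doc_embeddings @ q        # dot-product relevance scores
        top_k = np.argsort(-scores)[:k]    # indices of the k highest scores
        achieved.add(frozenset(top_k.tolist()))
    return achieved


# Three hypothetical documents embedded in a single dimension (d = 1).
docs = np.array([[-1.0], [0.2], [1.5]])

# Dense sweep of 1-D queries; skip 0, where all scores tie.
queries = [np.array([q]) for q in np.linspace(-3.0, 3.0, 600) if q != 0.0]

achieved = achievable_top_k(docs, queries, k=2)
all_pairs = {frozenset(c) for c in combinations(range(len(docs)), 2)}
missing = all_pairs - achieved

print("achievable top-2 sets:", sorted(sorted(s) for s in achieved))
print("unreachable top-2 sets:", sorted(sorted(s) for s in missing))
```

In this toy setting a positive query ranks the documents by their embedding value and a negative query reverses that order, so the sweep never returns documents 0 and 2 together. Adding dimensions makes more combinations reachable, but the example shows how a fixed embedding size can rule out some result sets regardless of how the query is chosen.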
