Scalable In-context Ranking with Generative Models
Topics: Chunk Relevance, LLM Readability, LLMO / GEO, Passage based retrieval, Ranking, Scoring
This Google paper introduces BlockRank, a method for making Large Language Models (LLMs) more efficient at ranking documents. The standard approach, known as In-context Ranking (ICR), places the query and all candidate documents in a single prompt; it is powerful, but it becomes slow and expensive as the number of documents grows, because full self-attention compares every token in the context to every other token. BlockRank addresses this in two ways: it modifies the LLM's attention mechanism so that documents are processed as parallel, independent blocks, and it adds a special training objective that teaches the model to focus its attention on the most relevant documents. The result is a ranking system that is both faster and highly accurate.
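To make the "parallel blocks" idea concrete, here is a minimal sketch of a block-structured attention mask in NumPy. The exact segment layout, mask semantics, and function name are illustrative assumptions, not the paper's precise specification: the sequence is laid out as [instruction | doc 1 | doc 2 | … | query], each document block attends only to itself and the shared instruction, and the query attends to the whole prefix.

```python
import numpy as np

def blockrank_style_mask(instr_len, doc_lens, query_len):
    """Return a boolean attention mask (True = may attend) of shape (T, T).

    Illustrative sketch of block-structured attention: document blocks are
    mutually invisible, so their cost scales linearly with the number of
    documents instead of quadratically.
    """
    T = instr_len + sum(doc_lens) + query_len
    mask = np.zeros((T, T), dtype=bool)

    # Instruction tokens attend causally within the instruction prefix.
    mask[:instr_len, :instr_len] = np.tril(np.ones((instr_len, instr_len), bool))

    # Each document block: full view of the instruction, causal within itself,
    # and no view of any other document block.
    start = instr_len
    for length in doc_lens:
        mask[start:start + length, :instr_len] = True
        mask[start:start + length, start:start + length] = np.tril(
            np.ones((length, length), bool)
        )
        start += length

    # Query tokens attend causally to the entire prefix, so the query can
    # still compare all documents when producing the ranking.
    q0 = start
    mask[q0:T, :T] = np.tril(np.ones((T, T), bool))[q0:T, :T]
    return mask

mask = blockrank_style_mask(instr_len=2, doc_lens=[3, 3], query_len=2)
# Doc 2's first token (index 5) cannot see doc 1 (indices 2..4):
assert not mask[5, 2:5].any()
# The query's last token (index 9) sees everything before it:
assert mask[9, :10].all()
```

In a real model this mask would be passed to the attention computation; because document blocks never attend to each other, their representations can be computed independently (and even cached), which is the source of the speedup described above.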
