Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation
Topics: Krisztian Balog, LLMO / GEO, Prompt Engineering, Retrieval Augmented Generation (RAG)
This Google DeepMind research paper examines the interplay between different uses of Large Language Models in information retrieval systems, focusing on their roles as rankers, judges, and content creation assistants. The authors present experimental evidence that LLM judges are biased towards LLM-based rankers and struggle to discern subtle performance differences between systems. The study offers important insights into the challenges and risks of using LLMs for automated evaluation in information retrieval.
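To make the "LLM-as-judge" role concrete, below is a minimal sketch of how such a judge is typically prompted to grade query-document relevance on a graded scale. The function names, prompt wording, and the stubbed model call are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Sketch of an LLM-as-judge relevance rater (illustrative, not the paper's setup).
JUDGE_PROMPT = """You are a search-quality rater.
Query: {query}
Document: {document}
Rate the document's relevance to the query on a 0-3 scale
(0 = irrelevant, 3 = perfectly relevant). Answer with the digit only."""

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call; returns a canned rating here.
    return "2"

def judge_relevance(query: str, document: str) -> int:
    """Ask the judge model for a graded relevance label and parse it."""
    raw = call_llm(JUDGE_PROMPT.format(query=query, document=document))
    rating = int(raw.strip())
    if not 0 <= rating <= 3:
        raise ValueError(f"rating out of range: {rating}")
    return rating

print(judge_relevance("best espresso grinder", "A review of burr grinders."))
# → 2 (from the stubbed call above)
```

Per-document ratings like these are aggregated into metrics such as nDCG to compare rankers, which is exactly where the judge's bias towards LLM-generated rankings becomes a measurement risk.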
