Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Author: Xin Luna Dong
This paper introduces "Head-to-Tail," a benchmark designed to measure how well Large Language Models (LLMs) retain and accurately recall factual knowledge. The researchers built 18,000 question-answer pairs spanning multiple domains and entity types, bucketed by entity popularity into "head" (most popular), "torso" (moderately popular), and "tail" (least popular) knowledge. Their findings show that even advanced LLMs such as GPT-4 answer only about 31% of the factual questions correctly overall, with accuracy dropping sharply as entities become less popular, i.e., from head to torso to tail.
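To make the evaluation setup concrete, the following is a minimal sketch (not the authors' code) of how factual accuracy could be computed per popularity bucket over such question-answer pairs. The `ask_llm` function and the sample questions are placeholders, and the exact-match scoring is a simplification of the paper's evaluation, which also distinguishes missing ("unsure") answers from wrong ones.

```python
from collections import defaultdict


def ask_llm(question: str) -> str:
    """Hypothetical model call; replace with a real LLM API client."""
    return "unsure"  # placeholder answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for lenient string comparison."""
    return " ".join(text.lower().strip().split())


def evaluate(qa_pairs):
    """Compute per-bucket accuracy.

    qa_pairs: iterable of dicts with keys 'question', 'answer', and
    'bucket', where bucket is one of 'head', 'torso', 'tail'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qa in qa_pairs:
        prediction = ask_llm(qa["question"])
        total[qa["bucket"]] += 1
        # Simplified exact-match scoring.
        if normalize(prediction) == normalize(qa["answer"]):
            correct[qa["bucket"]] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


if __name__ == "__main__":
    sample = [
        {"question": "What is the capital of France?",
         "answer": "Paris", "bucket": "head"},
        {"question": "Who directed the 1962 film 'Sundays and Cybele'?",
         "answer": "Serge Bourguignon", "bucket": "tail"},
    ]
    print(evaluate(sample))  # e.g. {'head': 0.0, 'tail': 0.0} with the stub model
```

Reporting accuracy separately per bucket, as above, is what surfaces the paper's central finding: a model can look knowledgeable on head entities while failing badly on torso and tail ones.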