s3: You Don’t Need That Much Data to Train a Search Agent via RL
Topics: Data Mining, Deepseek, LLMO / GEO, Retrieval Augmented Generation (RAG)
This paper introduces s3, a lightweight, model-agnostic framework that trains a dedicated search agent with reinforcement learning to improve retrieval for downstream question answering. s3 decouples the search component from the language model generator and rewards the searcher with a novel Gain Beyond RAG signal, defined as the improvement in answer accuracy over naive retrieval. With only 2.4k training examples, it achieves state-of-the-art performance on multiple general and medical QA benchmarks.
- Paper ID: arXiv:2505.14146v1
- Inventors / Authors: Pengcheng Jiang; Xueqiang Xu; Jiacheng Lin; Jinfeng Xiao; Zifeng Wang; Jimeng Sun; Jiawei Han
- Last publishing date: 20 May 2025
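
To make the Gain Beyond RAG reward from the summary concrete, here is a minimal Python sketch. It assumes the reward is the answer accuracy obtained with the search agent's documents minus the accuracy obtained with naive retrieval, using the same generator in both cases. The names `gain_beyond_rag`, `exact_match`, and `toy_generate` are hypothetical, and exact match is only a stand-in for whatever accuracy metric the paper actually uses.

```python
from typing import Callable, Sequence

# Hypothetical type: a generator maps (question, documents) to an answer string.
Generator = Callable[[str, Sequence[str]], str]


def exact_match(prediction: str, gold: str) -> float:
    """Stand-in accuracy metric (an assumption): 1.0 on a case-insensitive match."""
    return float(prediction.strip().lower() == gold.strip().lower())


def gain_beyond_rag(
    question: str,
    gold_answer: str,
    searcher_docs: Sequence[str],
    naive_docs: Sequence[str],
    generate: Generator,
    accuracy: Callable[[str, str], float] = exact_match,
) -> float:
    """Accuracy with the searcher's documents minus accuracy with naive retrieval."""
    with_searcher = accuracy(generate(question, searcher_docs), gold_answer)
    with_naive = accuracy(generate(question, naive_docs), gold_answer)
    return with_searcher - with_naive


def toy_generate(question: str, docs: Sequence[str]) -> str:
    """Toy generator for illustration: answers 'Paris' only if a document mentions it."""
    for doc in docs:
        if "paris" in doc.lower():
            return "Paris"
    return "unknown"


reward = gain_beyond_rag(
    question="What is the capital of France?",
    gold_answer="Paris",
    searcher_docs=["Paris is the capital of France."],
    naive_docs=["France is a country in Europe."],
    generate=toy_generate,
)
print(reward)
```

Under this reading the reward is positive only when the searcher's documents help the generator more than naive retrieval does, zero when both lead to the same outcome, and negative when the searcher does worse. Because the generator is passed in as a plain callable, the sketch does not depend on any particular model.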