Dense retrieval employing progressive distillation training
Topics: AI (Deep Learning), Microsoft, Ranking, Reranking, Search Query Processing, User Signals
The Microsoft patent describes a dense retrieval system for efficiently retrieving relevant search results from a large pool of potential results. It uses machine learning models trained on an order metric that defines a ranked ordering of search results. The system has two main components:
- A dense retriever that computes embeddings of the query and potential results to select candidate results.
- A ranker that uses a cross-encoder to score and rank the candidate results.
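The two components above can be sketched as a retrieve-then-rerank pipeline. This is a minimal illustration, not the patent's implementation. The corpus, the bag-of-words `embed` function, and the overlap-based `cross_score` function are hypothetical stand-ins for the learned encoder and cross-encoder, chosen so the example runs with the standard library only.

```python
import math

# Hypothetical toy corpus; the patent summary gives no example data.
DOCS = [
    "dense retrieval with embeddings",
    "cooking pasta at home",
    "retrieval dense index",
    "training a ranker with a cross encoder",
]

def build_vocab(texts):
    """Assign one embedding dimension per distinct token."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text, vocab):
    """Toy stand-in for a learned encoder: L2-normalized token counts."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, doc_vecs, vocab, k):
    """Stage 1: score every document by dot product, keep the top k indices."""
    q = embed(query, vocab)
    scores = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def cross_score(query, doc):
    """Toy stand-in for a cross-encoder: looks at query and document together."""
    q_toks = query.lower().split()
    d_toks = set(doc.lower().split())
    overlap = sum(1 for t in q_toks if t in d_toks) / len(q_toks)
    phrase_bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return overlap + phrase_bonus

def search(query, docs, doc_vecs, vocab, k):
    """Stage 2: rerank only the k retrieved candidates with the joint scorer."""
    candidates = retrieve(query, doc_vecs, vocab, k)
    return sorted(candidates, key=lambda i: cross_score(query, docs[i]), reverse=True)

VOCAB = build_vocab(DOCS)
DOC_VECS = [embed(d, VOCAB) for d in DOCS]  # document embeddings computed once, ahead of queries

print(retrieve("dense retrieval", DOC_VECS, VOCAB, k=2))      # [2, 0]
print(search("dense retrieval", DOCS, DOC_VECS, VOCAB, k=2))  # [0, 2]
```

Because the retriever embeds documents independently of the query, document vectors can be computed ahead of time, which keeps the first stage cheap over a large pool. The joint scorer is more expensive per pair, so it runs only on the few candidates that survive retrieval, and in this toy run it reorders them.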
The key innovation is progressively distilling knowledge from the order metric to the ranker, and then from the ranker to the retriever. This allows the retriever to quickly identify relevant results while incorporating relevance information from the order metric.
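The two-step transfer can be illustrated with a small sketch. The summary does not say which loss the patent uses, so this example assumes a listwise KL divergence between softmax-normalized teacher and student scores, a common choice for distillation. The order-metric scores are made up, and each "student" is just a free score vector per candidate rather than a neural network, which keeps the example runnable with the standard library only.

```python
import math

def softmax(scores):
    """Turn raw scores over a candidate list into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) over one candidate list."""
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

def distill(teacher_scores, student_scores, lr=0.5, steps=200):
    """Gradient descent on KL(softmax(teacher) || softmax(student)).

    Assumption: the student is a free score vector, so the gradient with
    respect to each score is simply (student_prob - teacher_prob).
    """
    target = softmax(teacher_scores)
    student = list(student_scores)
    for _ in range(steps):
        probs = softmax(student)
        student = [s - lr * (q - p) for s, q, p in zip(student, probs, target)]
    return student

def ranking(scores):
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical order-metric scores for four candidates of one query.
order_metric = [3.0, 1.0, 2.0, 0.0]

# Step 1: the ranker learns from the order metric.
ranker = distill(order_metric, [0.0] * 4)
# Step 2: the retriever learns from the trained ranker, not from the metric directly.
retriever = distill(ranker, [0.0] * 4)

print(ranking(order_metric))  # [0, 2, 1, 3]
print(ranking(retriever))     # [0, 2, 1, 3]
```

In this toy run the retriever ends up with the same ordering as the order metric even though it only ever saw the ranker's scores, which is the sense in which the knowledge is passed along progressively.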
The system aims to improve search relevance over conventional methods while maintaining efficiency for large-scale retrieval tasks.