Anchor tag indexing in a web crawler system
Topics: Backlinks, Freshness, Indexing
The Google patent describes a method and system for indexing documents in a collection of linked documents, such as web pages on the internet. It focuses on creating and using anchor maps to index information from other documents that link to a target document, not just the content of the target document itself. Key aspects include:
- Creating a link log of source documents and their outbound links
- Generating a sorted anchor map with target documents and lists of source documents linking to them
- Including annotations like anchor text from the source documents in the anchor map
- Using the anchor map information when indexing documents to improve search relevance
- Merging and updating anchor maps over time as new information is crawled
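The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. The record layout, the function names `build_anchor_map` and `merge_anchor_maps`, the example URLs, and the merge rule (a newer crawl of a source replaces that source's older entries for a target) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical link log: one record per source document, listing its
# outbound links as (target_url, anchor_text) pairs.
link_log = [
    ("http://a.example/", [("http://t.example/", "great tutorial"),
                           ("http://u.example/", "reference")]),
    ("http://b.example/", [("http://t.example/", "python guide")]),
]


def build_anchor_map(link_log):
    """Invert a source-keyed link log into a sorted, target-keyed anchor map."""
    by_target = defaultdict(list)
    for source_url, links in link_log:
        for target_url, anchor_text in links:
            # The anchor text is kept as an annotation next to its source.
            by_target[target_url].append((source_url, anchor_text))
    # Sort by target URL, and sort each target's list of sources.
    return [(target, sorted(entries))
            for target, entries in sorted(by_target.items())]


def merge_anchor_maps(old_map, new_map):
    """Merge two anchor maps (assumed rule: for a given target, a source's
    entries in new_map replace that source's entries in old_map)."""
    merged = defaultdict(dict)  # target -> {source: [anchor_text, ...]}
    for anchor_map in (old_map, new_map):
        for target, entries in anchor_map:
            per_source = defaultdict(list)
            for source, text in entries:
                per_source[source].append(text)
            merged[target].update(per_source)
    return [
        (target, sorted((s, t) for s, texts in sources.items() for t in texts))
        for target, sources in sorted(merged.items())
    ]


anchor_map = build_anchor_map(link_log)

# A later crawl of b.example finds different anchor text for the same link.
newer_map = build_anchor_map(
    [("http://b.example/", [("http://t.example/", "updated guide")])]
)
merged_map = merge_anchor_maps(anchor_map, newer_map)
```

Keeping the map sorted by target means that every annotation for one document sits in one contiguous entry, so an indexer can read it in a single pass.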
This approach lets the indexer record useful information about a document from the documents that link to it, even if the target document itself is unavailable or has little text content. It can improve search results by adding context about each document, such as the anchor text that other pages use to describe it.
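To make this concrete, here is a small sketch of a toy inverted index that combines each page's own text with the anchor text of its inbound links. The function name `index_documents`, the data layout, and the whitespace tokenizer are assumptions chosen for illustration and are not taken from the patent.

```python
from collections import defaultdict


def index_documents(contents, anchor_map):
    """Build a toy inverted index from page text plus inbound anchor text.

    contents:   {url: page_text, or None if the page could not be fetched}
    anchor_map: list of (target_url, [(source_url, anchor_text), ...]) pairs
    """
    anchors_by_target = dict(anchor_map)
    index = defaultdict(set)  # term -> set of URLs
    for url in set(contents) | set(anchors_by_target):
        own_text = contents.get(url) or ""
        anchor_text = " ".join(
            text for _source, text in anchors_by_target.get(url, [])
        )
        # Naive tokenizer: lowercase and split on whitespace.
        for term in (own_text + " " + anchor_text).lower().split():
            index[term].add(url)
    return index


# t.example could not be fetched, but a.example links to it.
contents = {
    "http://t.example/": None,
    "http://a.example/": "see this great tutorial",
}
anchor_map = [
    ("http://t.example/", [("http://a.example/", "great tutorial")]),
]
index = index_documents(contents, anchor_map)
```

In this example a search for "tutorial" can still return `http://t.example/`, even though none of that page's own content was indexed.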