The standard retrieval approach involves the following steps:
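A minimal sketch of this standard pipeline: split documents into chunks, embed each chunk, then rank chunks by similarity to the embedded query. The `embed` and `cosine` helpers here are illustrative assumptions (a toy bag-of-words vector stands in for a real embedding model), not any particular library's API.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a term-count vector keyed by word.
    # Assumption for illustration; real systems use a trained embedding model.
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Split documents into chunks and embed each chunk into an index.
chunks = [
    "The company's revenue grew by 3% over the previous quarter.",
    "Our new office opened in Austin last spring.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Embed the query and rank stored chunks by similarity.
query = embed("revenue growth last quarter")
ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
print(ranked[0][0])
```

A production system would persist the embeddings in a vector store rather than a Python list, but the retrieve-by-similarity step is the same.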
While embedding models excel at capturing semantic relationships, they can miss exact keyword matches. BM25 (Best Matching 25) is a ranking function that finds precise word or phrase matches in documents. It builds on TF-IDF (Term Frequency-Inverse Document Frequency), which measures how important a word is to a document while down-weighting words that are common across the corpus.
Retrieval with BM25:
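A self-contained sketch of BM25 retrieval, written from the standard BM25 scoring formula for illustration (production systems typically use a tuned library or search-engine implementation; the `bm25_scores` helper and the whitespace tokenization are assumptions of this sketch):

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: rare terms contribute more than common ones.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with length normalization.
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "revenue grew by 3% over the previous quarter",
    "the company announced a new product line",
    "quarterly revenue and profit figures were released",
]
print(bm25_scores("revenue growth quarter", docs))
```

Note that BM25 only matches exact tokens: "quarterly" and "grew" do not match "quarter" and "growth" here, which is precisely the gap embedding-based retrieval fills, and why the two are often combined.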

Because chunk size and overlap are limited, traditional RAG solutions lose the context that connects individual chunks when encoding information, which often leaves the system unable to retrieve the relevant information from the knowledge base.
For example, imagine you had a collection of financial information (say, U.S. SEC filings) embedded in your knowledge base, and you received the following question: "What was the revenue growth for ACME Corp in Q2 2023?"
A relevant chunk might contain the text: "The company's revenue grew by 3% over the previous quarter." However, this chunk on its own doesn't specify which company it's referring to or the relevant time period, making it difficult to retrieve the right information or use it effectively.
Contextual Retrieval solves this problem by prepending chunk-specific explanatory context (typically 50-100 tokens) to each chunk before encoding it.
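A sketch of that preprocessing step. Here `call_llm` is a stub standing in for a real LLM API call, and the prompt wording and the `contextualize` helper are illustrative assumptions, not a fixed specification:

```python
# Prompt template asking an LLM to situate a chunk within its source document.
CONTEXT_PROMPT = """\
<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short, succinct context to situate this chunk within the overall
document, to improve search retrieval of the chunk.
Answer only with the context."""

def call_llm(prompt):
    # Stub: a real implementation would call an LLM API here.
    # The returned string below is a hard-coded example response.
    return "This chunk is from ACME Corp's SEC filing for Q2 2023."

def contextualize(document, chunk):
    """Prepend LLM-generated, chunk-specific context before embedding/indexing."""
    context = call_llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
    return f"{context}\n\n{chunk}"

chunk = "The company's revenue grew by 3% over the previous quarter."
print(contextualize("...full filing text...", chunk))
```

The contextualized chunk, not the raw one, is what gets embedded (and BM25-indexed), so a query like "ACME Corp revenue Q2 2023" can now match it.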
