RAG-Reranking
Reranking in Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a powerful approach that combines retrieval and generation to produce high-quality responses. However, the quality of the final response depends heavily on the effectiveness of the retrieval step.
Reranking improves the final response by reordering the retrieved documents according to their relevance to the query.
In this blog, we will discuss how to integrate reranking into a RAG pipeline, what benefits it brings, and how to choose a suitable reranker model.
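Before turning to LangChain, the retrieve-then-rerank idea can be sketched in plain Python. The word-overlap scorers below are hypothetical stand-ins for a real embedding retriever and cross-encoder; the point is only to show the two-stage shape of the pipeline: a cheap first pass over the whole corpus, then a more careful second pass over the few candidates.

```python
def retrieve(query, docs, k=3):
    """First stage: cheap keyword-overlap score over the whole corpus."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def rerank(query, candidates, top_n=2):
    """Second stage: a finer-grained score, applied only to the candidates."""
    q = set(query.lower().split())
    def score(d):
        words = d.lower().split()
        # Overlap normalized by document length, so short on-topic
        # passages beat long loosely-related ones.
        return len(q & set(words)) / len(words)
    return sorted(candidates, key=score, reverse=True)[:top_n]

corpus = [
    "the economy plan focuses on jobs",
    "a plan for healthcare reform",
    "today the weather is sunny",
    "economy and jobs plan for the future",
]
candidates = retrieve("plan for the economy", corpus)
best = rerank("plan for the economy", candidates)
print(best)
```

In a real system the retriever would be a vector store and the reranker a cross-encoder model, but the control flow is the same: retrieve broadly, then reorder narrowly.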
LangChain implementation
```python
from langchain_community.document_loaders import TextLoader
from langchain_postgres import PGVector
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )


# Load and chunk the source document
documents = TextLoader("state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
connection = "postgresql+psycopg://"  # Uses psycopg3!
collection_name = "reranking_test"

# Index the chunks in pgvector
vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
vector_store.add_documents(texts, ids=[i for i, _ in enumerate(texts, start=1)])

# Regular (first-stage) retrieval
retriever = vector_store.as_retriever(search_kwargs={"k": 20})

query = "What is the plan for the economy?"
docs = retriever.invoke(query)
print("\nRetrieved Documents:\n")
pretty_print_docs(docs)

# Reranking with a cross-encoder
model = HuggingFaceCrossEncoder(model_name="/Users/binzhang/models/BAAI/bge-reranker-v2-m3")
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,  # reorders the documents after retrieval
)
compressed_docs = compression_retriever.invoke(query)
print("\nReranked Documents:\n")
pretty_print_docs(compressed_docs)
```

Calculate Score of reranking pairs
```python
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Initialize the cross-encoder
cross_encoder = HuggingFaceCrossEncoder(
    model_name="BAAI/bge-reranker-v2-m3",
    model_kwargs={"device": "cpu"},
)

# Create text pairs to score
text_pairs = [
    ("How do I bake bread?", "This is a recipe for sourdough bread"),
    ("How do I bake bread?", "The weather is nice today"),
]

# Get a relevance score for each (query, passage) pair
scores = cross_encoder.score(text_pairs)
print(scores)
```
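Note that the scores returned by BGE-style rerankers are typically raw relevance logits, not probabilities, so the relevant pair gets a large positive value and the irrelevant one a large negative value. If you want values in a fixed (0, 1) range, e.g. for thresholding, you can map them through a sigmoid. The logit values below are made-up examples, not actual model output:

```python
import math

def sigmoid(x):
    """Map a raw relevance logit to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits like those a cross-encoder might return:
# strongly positive for a relevant pair, strongly negative otherwise.
raw_scores = [5.2, -8.1]
probs = [sigmoid(s) for s in raw_scores]
print(probs)  # the relevant pair maps near 1.0, the irrelevant near 0.0
```

This makes it easy to drop candidates below a fixed cutoff (say 0.5) instead of always keeping a fixed `top_n`.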