Agentic RAG Pipeline — Siddhesh More

Most RAG tutorials wire LangChain into a linear chain: retrieve, then generate. The moment a step needs to loop back (rerank, critique, retry), a linear chain has no way to express it. LangGraph instead models the pipeline as an explicit state machine in which conditional edges decide which node runs next. Understanding that control-flow model is the main goal of this project.

The architecture is a StateGraph whose main path has three nodes: Retriever pulls candidates from a FAISS IVF-PQ index, Reranker scores and filters those candidates with a cross-encoder, and Explainer generates the final answer. Two further nodes close the loop: a Critic that can conditionally route back to a Planner when confidence is low, forming a genuine ReAct-style loop instead of a one-shot chain.
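The routing described above can be sketched without any library. This is a plain-Python illustration of nodes, fixed edges, and one conditional edge. It is not the LangGraph API and not the project's code. The node bodies are stubs, and the 0.7 confidence threshold, the retry budget of 3, and the state fields are assumptions made for the example.

```python
from typing import Callable, Dict, List, TypedDict

END = "END"

class State(TypedDict):
    query: str
    candidates: List[str]
    answer: str
    confidence: float
    attempts: int

def planner(s: State) -> State:
    # Counts each pass; a real planner might rewrite the query here.
    return {**s, "attempts": s["attempts"] + 1}

def retriever(s: State) -> State:
    # Stub: a real node would query the vector index.
    return {**s, "candidates": [f"doc-{i}" for i in range(4)]}

def reranker(s: State) -> State:
    # Stub: keep the top two candidates.
    return {**s, "candidates": s["candidates"][:2]}

def explainer(s: State) -> State:
    return {**s, "answer": f"answer from {s['candidates']}"}

def critic(s: State) -> State:
    # Toy score: low on the first pass, high afterwards, to exercise the loop.
    return {**s, "confidence": 0.4 if s["attempts"] < 2 else 0.9}

def route_after_critic(s: State) -> str:
    # Conditional edge: stop when confident or out of retries, else loop back.
    if s["confidence"] >= 0.7 or s["attempts"] >= 3:
        return END
    return "planner"

NODES: Dict[str, Callable[[State], State]] = {
    "planner": planner, "retriever": retriever, "reranker": reranker,
    "explainer": explainer, "critic": critic,
}
# Fixed edges; the critic's outgoing edge is decided at run time instead.
EDGES = {"planner": "retriever", "retriever": "reranker",
         "reranker": "explainer", "explainer": "critic"}

def run(state: State, entry: str = "planner") -> State:
    current = entry
    while current != END:
        state = NODES[current](state)
        current = route_after_critic(state) if current == "critic" else EDGES[current]
    return state

final = run({"query": "what is IVF-PQ?", "candidates": [], "answer": "",
             "confidence": 0.0, "attempts": 0})
```

The retry budget matters: without it, a critic that never reaches the threshold would loop forever.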

The retrieval layer uses FAISS IVF-PQ, which combines two techniques. The Inverted File index (IVF) partitions the vector space into Voronoi cells, so a search scans only the cells nearest the query rather than the whole corpus. Product Quantization (PQ) compresses each vector from 768 floats into a handful of codebook indices. Together they make billion-scale approximate nearest-neighbor search feasible in memory.
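A rough back-of-the-envelope calculation shows why the compression matters. Only the 768-dimensional vectors come from the text above. The PQ settings (96 sub-vectors, 8 bits each) and the IVF settings (4096 cells, 16 probed per query) are illustrative assumptions, not the project's configuration. The estimate ignores vector IDs, codebooks, and centroid storage.

```python
def raw_bytes(dim: int, bytes_per_float: int = 4) -> int:
    """Size of one uncompressed float32 vector."""
    return dim * bytes_per_float

def pq_bytes(m: int, nbits: int = 8) -> int:
    """Size of one PQ code: m sub-vectors, each stored as an nbits-wide index."""
    return (m * nbits + 7) // 8

DIM = 768            # from the text
M, NBITS = 96, 8     # assumed PQ settings
NLIST, NPROBE = 4096, 16   # assumed IVF settings
N = 1_000_000_000    # one billion vectors

assert DIM % M == 0  # PQ splits the vector into M equal sub-vectors

raw = raw_bytes(DIM)             # 3072 bytes per vector
code = pq_bytes(M, NBITS)        # 96 bytes per vector
ratio = raw / code               # 32.0
raw_total_gb = raw * N / 1e9     # 3072.0 GB uncompressed
pq_total_gb = code * N / 1e9     # 96.0 GB of PQ codes

# Fraction of the corpus scanned per query, assuming evenly filled cells.
scanned_fraction = NPROBE / NLIST

print(f"{raw} B -> {code} B per vector ({ratio:.0f}x smaller)")
print(f"1B vectors: {raw_total_gb:.0f} GB raw vs {pq_total_gb:.0f} GB as PQ codes")
print(f"scanned per query: {scanned_fraction:.2%}")
```

Under these assumed settings the codes for a billion vectors fit in the RAM of a single large server, which the raw float32 vectors would not.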

The index is exposed through an MCP tool server: a function declared as @mcp.tool() def retrieve(query, k) wraps the FAISS index as a callable tool that any LangGraph node or external agent can invoke. Each node in the graph is wrapped with Langfuse's @observe decorator, so every retrieval call, rerank step, and generation produces a trace in the dashboard with latency, token count, and inputs/outputs.
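To make the idea of a traced node concrete, here is a standard-library stand-in for a tracing decorator. It is not Langfuse's @observe and not the MCP SDK. The names observe_sketch and TRACES, the toy corpus, and the stub retrieve body are all hypothetical. It records only name, inputs, output, and latency, and does not capture token counts.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory trace store; a real tracing backend would ship these elsewhere.
TRACES: List[Dict[str, Any]] = []

def observe_sketch(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record name, inputs, output, and latency for each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper

@observe_sketch
def retrieve(query: str, k: int = 3) -> List[str]:
    # Stub standing in for the index-backed tool; returns the first k documents.
    corpus = ["ivf overview", "pq overview", "reranking notes", "ragas notes"]
    return corpus[:k]

hits = retrieve("what is product quantization?", k=2)
```

Because the decorator wraps the function rather than changing it, the same retrieve can be called directly in tests and still produce a trace entry.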

Evaluation uses RAGAS: context recall, faithfulness, answer relevancy, and context precision, measured automatically against a ground-truth QA set.
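Two of these metrics reduce to simple arithmetic once relevance judgments exist. The sketch below is not the RAGAS library. It assumes the rank-weighted form of context precision (mean of precision@k over the ranks that hold a relevant chunk) and the claim-coverage form of context recall. The boolean labels are supplied by hand here, whereas RAGAS derives such judgments with an LLM.

```python
from typing import Sequence

def context_precision(relevant: Sequence[bool]) -> float:
    """Rank-weighted precision over retrieved chunks, in ranked order.

    relevant[i] says whether the chunk at rank i+1 is relevant.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank   # precision@rank, counted only at relevant ranks
    return total / hits if hits else 0.0

def context_recall(supported: Sequence[bool]) -> float:
    """Share of ground-truth claims that the retrieved context supports."""
    return sum(supported) / len(supported) if supported else 0.0

# Same two relevant chunks, ranked well versus ranked poorly.
good_ranking = context_precision([True, False, True])   # (1/1 + 2/3) / 2
poor_ranking = context_precision([False, True, True])   # (1/2 + 2/3) / 2
recall = context_recall([True, True, False])            # 2 of 3 claims covered
```

The two rankings contain the same chunks but score differently, which is the point of the metric: it rewards placing relevant context near the top.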