Modern RAG: From Simple Pipelines to Agentic Knowledge Systems

Pedro Laboy

May 11, 2025 · 4 min read

𝗧𝗟;𝗗𝗥: In 2025, Retrieval-Augmented Generation (RAG) has evolved beyond “retrieve a few chunks, stuff the context.” We now build agentic, hybrid, and multimodal systems with temporal awareness, graph reasoning, and instruction-following rerankers. With production-grade evaluation loops and standards like the Model Context Protocol (MCP), plugging in enterprise data and tools is cleaner than ever.

𝗪𝗵𝗮𝘁 “𝗠𝗼𝗱𝗲𝗿𝗻 𝗥𝗔𝗚” 𝗠𝗲𝗮𝗻𝘀 𝗶𝗻 𝟮𝟬𝟮𝟱

🔹𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗔𝗚, not static chains. Instead of fixed, single-pass pipelines, orchestrators now plan, retrieve, reflect, and refine. They dynamically switch strategies—expanding queries, routing by intent, performing multi-hop retrieval, or calling tools. Think of it as a cooperative team of a “planner,” “retriever(s),” “critic,” and “generator.”

🔹𝗛𝘆𝗯𝗿𝗶𝗱 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗶𝘀 𝘁𝗵𝗲 𝗱𝗲𝗳𝗮𝘂𝗹𝘁. Production systems now combine lexical (BM25/sparse) and dense vector search to get better recall and precision than either method achieves alone, often in a two-stage process: candidate recall followed by an LLM-based rerank. Both open-source and cloud engines document this as the baseline pattern.

🔹𝗦𝗺𝗮𝗿𝘁𝗲𝗿 𝗿𝗲𝗿𝗮𝗻𝗸𝗲𝗿𝘀. Teams now routinely use cross-encoder and instruction-following rerankers to align results with task-specific criteria like compliance, freshness, or strict definitions. Major platforms, such as Vertex AI RAG Engine, include reranking as a first-class feature.

🔹𝗚𝗿𝗮𝗽𝗵 𝗮𝗻𝗱 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲-𝗮𝘄𝗮𝗿𝗲 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹. GraphRAG builds knowledge graphs and community summaries during ingestion. At query time, it traverses relationships to supply higher-order context like entities, events, and communities. Guides published in 2025 recommend it mainly for multi-hop, narrative, or relationship-heavy questions.

🔹𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗥𝗔𝗚 𝗶𝘀 𝗺𝗮𝗶𝗻𝘀𝘁𝗿𝗲𝗮𝗺. Retrieval across text, tables, images, and video is now common. Use cases include support, field operations, and financial analysis that rely on figures, diagrams, and screenshots.

🔹𝗧𝗲𝗺𝗽𝗼𝗿𝗮𝗹 𝗮𝗻𝗱 “𝗳𝗿𝗲𝘀𝗵𝗻𝗲𝘀𝘀-𝗮𝘄𝗮𝗿𝗲” 𝗥𝗔𝗚. Modern frameworks can parse temporal expressions, assemble time-consistent evidence, and query time-stamped data stores. This is crucial for finance, policy changes, and news-related tasks.

🔹 𝗔𝗴𝗲𝗻𝘁 𝘀𝘁𝗮𝗻𝗱𝗮𝗿𝗱𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝗮𝗻𝗱 𝘁𝗼𝗼𝗹 𝗮𝗰𝗰𝗲𝘀𝘀. The Model Context Protocol (MCP) has emerged as the “USB-C for AI tools,” simplifying how agents access CRMs, data warehouses, and search, which in turn drives multi-vendor agent collaboration.
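
To make the hybrid retrieval idea above concrete, here is a minimal sketch of one common way to merge lexical and dense results: reciprocal rank fusion (RRF). The article does not prescribe a fusion method, so treat RRF as an assumption; the document IDs and the k=60 constant are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a lexical (BM25) and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents found by both retrievers rise to the top, and the fused list would then go to the reranking stage.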

𝗔 𝟮𝟬𝟮𝟱 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻-𝗥𝗲𝗮𝗱𝘆 𝗥𝗲𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲

𝗔) 𝗦𝗼𝘂𝗿𝗰𝗲𝘀 & 𝗔𝗰𝗰𝗲𝘀𝘀

▫️𝘊𝘰𝘯𝘯𝘦𝘤𝘵𝘰𝘳𝘴/𝘗𝘳𝘰𝘵𝘰𝘤𝘰𝘭𝘴: MCP servers (databases, CRM, code); identity and permissions are propagated to the agent.

▫️𝘌𝘹𝘵𝘦𝘳𝘯𝘢𝘭 𝘋𝘢𝘵𝘢: Public web and vertical APIs are used when internal knowledge gaps are detected.
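
As a rough illustration of what "permissions are propagated to the agent" can mean in practice, the sketch below filters candidate documents by the caller's groups before anything reaches the model. The `Doc` and `Caller` types and the group names are hypothetical; a real deployment would take them from its identity provider and connector metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    allowed_groups: frozenset  # groups permitted to read this document


@dataclass(frozen=True)
class Caller:
    user_id: str
    groups: frozenset  # groups the end user belongs to


def visible_docs(caller, docs):
    """Keep only documents that share at least one group with the caller."""
    return [d for d in docs if d.allowed_groups & caller.groups]


docs = [
    Doc("crm-001", frozenset({"sales"})),
    Doc("hr-007", frozenset({"hr"})),
    Doc("handbook", frozenset({"sales", "hr", "eng"})),
]
caller = Caller("u42", frozenset({"sales"}))

visible = [d.doc_id for d in visible_docs(caller, docs)]
print(visible)
```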

𝗕) 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻 & 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲

▫️𝘗𝘳𝘰𝘤𝘦𝘴𝘴𝘪𝘯𝘨: Parsing and normalization for documents, PDFs, HTML, tables, images, and video.

▫️𝘚𝘦𝘤𝘶𝘳𝘪𝘵𝘺: PII/secret scrubbing and policy tagging (e.g., for retention and visibility).

▫️𝘊𝘩𝘶𝘯𝘬𝘪𝘯𝘨: The 2025 trend is hierarchical, structure-aware chunking (section → paragraph → sentence). Studies caution that expensive semantic chunking is not always a win—measure its impact before adopting.

▫️𝘌𝘮𝘣𝘦𝘥𝘥𝘪𝘯𝘨𝘴 & 𝘐𝘯𝘥𝘦𝘹𝘦𝘴: Dense vectors (per modality) plus lexical/sparse indexes, with versions and timestamps stored for temporal routing.
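
The section → paragraph → sentence idea can be sketched as follows. This is a simplified illustration, assuming headings are lines starting with "# " and paragraphs are separated by blank lines; real parsers need format-specific rules, and the `Chunk` fields are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str    # heading the text belongs to
    paragraph: int  # paragraph index within the section
    sentence: int   # sentence index within the paragraph
    text: str


def hierarchical_chunks(doc: str) -> list[Chunk]:
    """Split a document section -> paragraph -> sentence, keeping the path."""
    chunks = []
    section = "(no heading)"
    para_idx = 0
    for block in re.split(r"\n\s*\n", doc.strip()):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            # A new heading starts a new section and resets paragraph numbering.
            section, para_idx = block[2:].strip(), 0
            continue
        # Naive sentence split: ., ! or ? followed by whitespace.
        for s_idx, sentence in enumerate(re.split(r"(?<=[.!?])\s+", block)):
            chunks.append(Chunk(section, para_idx, s_idx, sentence))
        para_idx += 1
    return chunks


sample = """# Refund policy

Refunds are issued within 30 days. Contact support to start one.

Gift cards are not refundable.

# Shipping

Orders ship in two business days."""

chunks = hierarchical_chunks(sample)
for c in chunks:
    print(c.section, c.paragraph, c.sentence, c.text)
```

Keeping the section and paragraph indexes on every chunk lets retrieval match at sentence level while still expanding to the parent paragraph or section when packing context.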

𝗖) 𝗦𝘁𝗼𝗿𝗮𝗴𝗲

▫️𝘝𝘦𝘤𝘵𝘰𝘳 𝘋𝘉: For dense vector search.

▫️𝘓𝘦𝘹𝘪𝘤𝘢𝘭 𝘐𝘯𝘥𝘦𝘹: For identifiers and keywords.

▫️𝘖𝘣𝘫𝘦𝘤𝘵 𝘚𝘵𝘰𝘳𝘦: For raw artifacts like PDFs and images.

▫️𝘒𝘯𝘰𝘸𝘭𝘦𝘥𝘨𝘦 𝘎𝘳𝘢𝘱𝘩: For entities, events, and relationships if your domain requires multi-hop reasoning.
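
For the knowledge graph case, multi-hop retrieval boils down to a bounded traversal from the entities found in the query. The toy graph below is invented for illustration, and the breadth-first walk is a minimal sketch, not how any particular GraphRAG implementation works.

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relation, target entity) edges.
GRAPH = {
    "Acme Corp": [("acquired", "Widget Inc")],
    "Widget Inc": [("supplies", "Gadget LLC"), ("founded_by", "J. Doe")],
    "Gadget LLC": [],
    "J. Doe": [],
}


def facts_within(graph, start, max_hops):
    """Breadth-first walk returning (hops, source, relation, target) tuples
    for every edge reachable within max_hops edges of start."""
    facts = []
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in graph.get(node, []):
            facts.append((hops + 1, node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return facts


facts = facts_within(GRAPH, "Acme Corp", max_hops=2)
for fact in facts:
    print(fact)
```

With `max_hops=2`, a question about Acme Corp also surfaces who founded the company it acquired, which a single vector lookup on "Acme Corp" would likely miss.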

𝗗) 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿

▫️𝘘𝘶𝘦𝘳𝘺 𝘈𝘯𝘢𝘭𝘺𝘴𝘪𝘴: Intent classification, decomposition into sub-questions, and temporal parsing.

▫️𝘏𝘺𝘣𝘳𝘪𝘥 𝘙𝘦𝘵𝘳𝘪𝘦𝘷𝘢𝘭: Candidate recall using lexical and dense search, with optional graph traversal.

▫️𝘙𝘦𝘳𝘢𝘯𝘬𝘪𝘯𝘨: Cross-encoder or instruction-following rerankers using domain-specific criteria (e.g., “prefer 2024–2025 docs”).

▫️𝘍𝘶𝘴𝘪𝘰𝘯: Deduplication of sources and diversification of evidence.

▫️𝘉𝘶𝘥𝘨𝘦𝘵 𝘊𝘰𝘯𝘵𝘳𝘰𝘭: Adaptive context packing based on the query’s uncertainty or answer type.
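
The fusion and budget-control steps can be sketched as one greedy pass over the reranked candidates. The candidate schema (`source`, `text`, `score`) is hypothetical, and token counts are approximated by whitespace word counts; a real system would use the model's tokenizer and might set `token_budget` from the query's uncertainty or answer type.

```python
def pack_context(candidates, token_budget, max_per_source=1):
    """Greedily pack the best-scoring passages into a fixed budget.

    Returns (packed passages, approximate tokens used).
    """
    packed, used, per_source = [], 0, {}
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if per_source.get(cand["source"], 0) >= max_per_source:
            continue  # diversify: cap passages taken from one source
        cost = len(cand["text"].split())  # rough token estimate
        if used + cost > token_budget:
            continue  # does not fit; a shorter passage later may still fit
        packed.append(cand)
        used += cost
        per_source[cand["source"]] = per_source.get(cand["source"], 0) + 1
    return packed, used


candidates = [
    {"source": "policy.pdf", "text": "Refunds are issued within 30 days of purchase.", "score": 0.92},
    {"source": "policy.pdf", "text": "Refunds require a receipt.", "score": 0.88},
    {"source": "faq.html", "text": "Gift cards cannot be refunded.", "score": 0.75},
    {"source": "blog.html", "text": "Our long history of customer service began decades ago in a small shop.", "score": 0.40},
]

packed, used = pack_context(candidates, token_budget=15)
print([c["source"] for c in packed], used)
```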

𝗘) 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 & 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻

▫️𝘓𝘰𝘰𝘱𝘴: Planner/reflector loops identify gaps and trigger follow-up retrieval or switch strategies.

▫️𝘈𝘵𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯: Citations include document IDs, page anchors, and timestamps.

▫️𝘎𝘶𝘢𝘳𝘥𝘳𝘢𝘪𝘭𝘴: Safeguards for confidentiality and safety are built in.
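
A planner/reflector loop can be reduced to a few lines once the model calls are stubbed out. In the sketch below, the corpus, the `retrieve` stub, and the rewrite rules are all hypothetical stand-ins for a real retriever and an LLM-based query rewriter; only the control flow is the point.

```python
# Toy corpus standing in for a real retriever (hypothetical data).
CORPUS = {
    "refund window": "Refunds are issued within 30 days.",
    "gift cards": "Gift cards are not refundable.",
}

# Hypothetical rewrite rules standing in for LLM-based query expansion.
REWRITES = {"money back": "refund window"}


def retrieve(query):
    """Stub retriever: return passages whose key phrase appears in the query."""
    q = query.lower()
    return [text for key, text in CORPUS.items() if key in q]


def rewrite(query):
    """Stub strategy switch: apply the rewrite rules to the query."""
    q = query.lower()
    for old, new in REWRITES.items():
        q = q.replace(old, new)
    return q


def plan_retrieve_reflect(sub_questions, max_rounds=3):
    """Retrieve per sub-question, detect gaps, rewrite, and retry."""
    queries = {sq: sq for sq in sub_questions}  # original -> current query text
    evidence, rounds = {}, 0
    while queries and rounds < max_rounds:
        rounds += 1
        for original, query in list(queries.items()):
            hits = retrieve(query)
            if hits:
                evidence[original] = hits
                del queries[original]               # gap closed
            else:
                queries[original] = rewrite(query)  # try a new strategy next round
    # Whatever is left has no evidence and should trigger a fallback answer.
    return evidence, sorted(queries), rounds


evidence, unresolved, rounds = plan_retrieve_reflect([
    "What is the money back period?",
    "Are gift cards refundable?",
    "Do you ship to Mars?",
])
print(evidence, unresolved, rounds)
```

The unresolved list matters as much as the evidence: it lets the generator say "no evidence found" for a sub-question instead of guessing.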

𝗙) 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 & 𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆

▫️𝘊𝘰𝘯𝘵𝘪𝘯𝘶𝘰𝘶𝘴 𝘌𝘷𝘢𝘭𝘶𝘢𝘵𝘪𝘰𝘯: Go beyond one-off benchmarks. Combine offline test sets, including modern RAG benchmarks like GaRAGe, with runtime checks such as faithfulness scoring and hallucination detectors.

▫️𝘐𝘯𝘴𝘵𝘳𝘶𝘮𝘦𝘯𝘵𝘢𝘵𝘪𝘰𝘯: Trace which chunks fueled each answer and track hit rates, reranker uplift, latency, and “no-evidence” fallbacks.
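
Two of the metrics named above, hit rate and reranker uplift, are straightforward to compute from logged traces. The sketch below assumes each trace is a pair of (ranked doc IDs, set of relevant doc IDs) and measures uplift as the change in mean reciprocal rank (MRR); both the trace format and the choice of MRR are assumptions, and the data is invented.

```python
def hit_rate_at_k(runs, k):
    """Share of queries with at least one relevant doc in the top k."""
    hits = sum(1 for ranked, relevant in runs if relevant & set(ranked[:k]))
    return hits / len(runs)


def mrr(runs):
    """Mean reciprocal rank of the first relevant doc (0 if none is found)."""
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Hypothetical logged traces: (ranked doc IDs, set of relevant doc IDs).
before_rerank = [(["d3", "d1", "d2"], {"d1"}), (["d5", "d6", "d4"], {"d4"})]
after_rerank = [(["d1", "d3", "d2"], {"d1"}), (["d4", "d5", "d6"], {"d4"})]

uplift = mrr(after_rerank) - mrr(before_rerank)
print(hit_rate_at_k(before_rerank, 1), hit_rate_at_k(after_rerank, 1), round(uplift, 3))
```

Tracking the uplift over time shows whether the reranker still earns its latency cost as the corpus and query mix change.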

𝗚) 𝗗𝗲𝗹𝗶𝘃𝗲𝗿𝘆

▫️𝘍𝘳𝘢𝘮𝘦𝘸𝘰𝘳𝘬𝘴: Use modern SDKs with agent loops and streaming UIs (e.g., AI SDK 5).