Stage 09 — Retrieval-Augmented Generation (RAG)
LLMs have a finite context window and can’t be retrained for every new piece of information. RAG is the engineering pattern that lets them work with arbitrary up-to-date knowledge: retrieve relevant passages, stuff them in the prompt, generate.
It sounds simple. Production RAG is one of the most subtle systems in modern AI.
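The loop itself fits in a page. Below is a minimal sketch of chunk → embed → store → retrieve → generate; the "embedding" here is a toy bag-of-words vector so the example runs with no dependencies — a real system would use a trained embedding model (Stage 05) and an LLM call for the generate step.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a term-frequency bag of words.
    Stands in for a real trained embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "store": embed every chunk up front (illustrative corpus)
chunks = [
    "The context window of the model is 8k tokens.",
    "RAG retrieves relevant passages and adds them to the prompt.",
    "BM25 is a sparse lexical ranking function.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=2):
    """Rank chunks by similarity to the query, return the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda ce: cosine(q, ce[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query, passages):
    """'Stuff them in the prompt' — context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What does RAG do?", retrieve("What does RAG do?"))
print(prompt)
```

Every box in that pipeline hides the subtlety the rest of this stage is about: how you chunk, what you embed with, how you store, and how you rank.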
Prerequisites
- Stage 05 (embeddings)
- Stage 08 (prompting, structured outputs)
Learning ladder
- RAG fundamentals — the loop, when to use it, when not to
- Chunking strategies — fixed, semantic, structural, late chunking
- Embedding models for retrieval — choosing, evaluating, tuning
- Vector databases — pgvector, Qdrant, LanceDB, Pinecone — pick one
- Hybrid search & reranking — BM25 + dense, cross-encoders
- Advanced retrieval patterns — HyDE, FLARE, query decomposition, GraphRAG
- Evaluating RAG — retrieval@k, faithfulness, golden sets, Ragas
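To make the hybrid-search rung concrete: the standard trick for fusing a BM25 ranking with a dense ranking is Reciprocal Rank Fusion (RRF), which combines rank positions instead of raw scores, so you never have to put lexical and dense scores on the same scale. A minimal sketch (the doc ids and rankings are illustrative):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc ids.
    score(d) = sum over lists of 1 / (k + rank of d in that list).
    k=60 is the conventional default from the original RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d2"]   # lexical retriever's order (illustrative)
dense_ranking = ["d1", "d2", "d3"]  # dense retriever's order (illustrative)
fused = rrf([bm25_ranking, dense_ranking])
print(fused)
```

Note how d1, ranked well by both retrievers, beats d3, which only one retriever loved — that consensus effect is why hybrid beats either retriever alone on most corpora.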
MVU
You can:
- Build a RAG end-to-end (chunk → embed → store → retrieve → generate)
- List 5 ways your RAG can silently fail in production
- Pick chunking and retrieval parameters with a defensible reason
- Evaluate RAG quality with retrieval and end-to-end metrics
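The retrieval-side metrics in that last bullet are simple enough to implement yourself before reaching for a framework. A sketch of recall@k and MRR against a golden set (doc ids below are illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant hit; 0 if none retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d4", "d1", "d7", "d2"]  # what your retriever returned
relevant = {"d1", "d2"}               # golden labels for this query
print(recall_at_k(retrieved, relevant, 3))  # 0.5 — only d1 is in the top-3
print(mrr(retrieved, relevant))             # 0.5 — first hit at rank 2
```

Retrieval metrics are necessary but not sufficient: perfect recall@k with an unfaithful generation is still a broken system, which is why end-to-end faithfulness metrics sit alongside these.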
Exercise
Build a RAG system over 1,000 of your own notes or documents. Then break it:
- Ambiguous queries
- Multi-hop questions
- Questions whose answer spans multiple documents
- Questions where the right answer is “I don’t know”
For each failure mode, fix it and document what you changed.
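The last failure mode above is the one most builds skip: answering "I don't know." One common mitigation is a retrieval-confidence gate — if no passage clears a similarity threshold, refuse instead of generating. A sketch (the threshold value and function names are illustrative; in practice you tune the threshold on a golden set of answerable and unanswerable questions):

```python
def maybe_answer(scored_passages, threshold=0.35):
    """scored_passages: list of (similarity, text) pairs from the retriever.
    Gate generation on retrieval confidence: if nothing clears the
    threshold, refuse rather than let the model guess.
    threshold=0.35 is illustrative, not a recommendation."""
    kept = [text for score, text in scored_passages if score >= threshold]
    if not kept:
        return "I don't know."
    # In a real system this is where the LLM call happens,
    # with `kept` stuffed into the prompt.
    return "ANSWER USING: " + " | ".join(kept)

print(maybe_answer([(0.82, "relevant passage")]))
print(maybe_answer([(0.12, "noise")]))  # refuses: nothing clears 0.35
```

A fixed threshold is the crudest gate; production systems often combine it with prompt-side instructions to refuse and an eval that measures refusal precision separately from answer quality.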
Why this stage matters
RAG is the most common LLM application pattern in industry. Most AI startups have RAG inside them somewhere. Mediocre RAG is everywhere; great RAG is rare and worth a lot.
Hands-on companions
After the theory here, the concrete next stops:
Ship a production RAG stack:
- /ship/06 — chunking the production way — three chunkers, the trade-offs, when to pick which
- /ship/07 — embeddings + sqlite-vec — MiniLM-L6 + sqlite-vec, the cheapest production-tier vector store
- /ship/08 — BM25 + dense + rerank — the 3-stage hybrid that beats any single strategy
- /ship/13 — evaluating RAG in production — golden sets, drift detection, the feedback pipeline
See it as a real product:
- /case-studies/01 — docs assistant with citations — RAG as a shipped product. Citation-first prompting, three-bucket refusal eval, real numbers (96% cite-coverage, 91% refusal precision).
See also
- Stage 05 — Embeddings
- Stage 07 — Long context
- Stage 13 — Production
- Stage 14 — Text-to-SQL — RAG over schemas