RAG

Retrieve relevant passages, stuff them in the prompt, generate. Sounds simple. Mediocre RAG is everywhere; great RAG is rare and worth a lot — chunking, hybrid search, reranking, and evaluation are the levers.

7 articles · 34 min to read · 3 demos · 3 books

If you only do one thing

Most AI startups have RAG inside them somewhere. The reference implementation isn't subtle; the failure modes are. Build, then break, then iterate.

Articles in this stage

  • 01 Advanced Retrieval Patterns
  • 02 Chunking Strategies
  • 03 Embedding Models for Retrieval
  • 04 Evaluating RAG
  • 05 Hybrid Search & Reranking
  • 06 RAG Fundamentals
  • 07 Vector Databases

Stage 09 — Retrieval-Augmented Generation (RAG)

LLMs have a finite context window and can’t be retrained for every new piece of information. RAG is the engineering pattern that lets them work with arbitrary up-to-date knowledge: retrieve relevant passages, stuff them in the prompt, generate.

It sounds simple. Production RAG is one of the most subtle systems in modern AI.
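The loop itself fits in a page. A minimal sketch, using a toy bag-of-words embedding and cosine similarity in place of a real embedding model, and building the prompt a real system would hand to an LLM (the documents and query here are made up for illustration):

```python
# Minimal RAG loop: embed -> retrieve top-k -> build grounded prompt.
# The "embedding" is a toy bag-of-words Counter; a production system
# would use a trained embedding model and send the prompt to an LLM.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank all docs by similarity to the query, return the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Stuff the retrieved passages into the prompt as grounding context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "pgvector adds vector similarity search to Postgres.",
    "BM25 is a sparse lexical ranking function.",
    "Cross-encoders rerank candidate passages for relevance.",
]
prompt = build_prompt("What does pgvector do?",
                      retrieve("What does pgvector do?", docs))
print(prompt)
```

Every article in this stage is about one of these steps done properly: what to chunk and embed, how to retrieve, and how to know whether the retrieved context actually helped.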

Prerequisites

  • Stage 05 (embeddings)
  • Stage 08 (prompting, structured outputs)

Learning ladder

  1. RAG fundamentals — the loop, when to use it, when not to
  2. Chunking strategies — fixed, semantic, structural, late chunking
  3. Embedding models for retrieval — choosing, evaluating, tuning
  4. Vector databases — pgvector, Qdrant, LanceDB, Pinecone — pick one
  5. Hybrid search & reranking — BM25 + dense, cross-encoders
  6. Advanced retrieval patterns — HyDE, FLARE, query decomposition, GraphRAG
  7. Evaluating RAG — retrieval@k, faithfulness, golden sets, Ragas
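Of the chunking strategies in step 2, fixed-size with overlap is the baseline the others are measured against. A minimal sketch, splitting on words rather than tokens, with illustrative window and overlap sizes:

```python
# Fixed-size chunking with overlap. Sizes are illustrative; real systems
# tune them per corpus and usually count tokens, not whitespace words.
def chunk_fixed(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, overlapping by `overlap`."""
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already reached the end of the text
    return chunks
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk; the cost is index size and some duplicated context in the prompt.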

MVU

You can:

  • Build a RAG end-to-end (chunk → embed → store → retrieve → generate)
  • List 5 ways your RAG can silently fail in production
  • Pick chunking and retrieval parameters with a defensible reason
  • Evaluate RAG quality with retrieval and end-to-end metrics
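The retrieval half of that last bullet is the easiest to make concrete. A sketch of recall@k over a golden set, where each query is labeled with the passage id that should be retrieved (the queries and ids below are made-up examples):

```python
# Recall@k over a golden set: the fraction of queries whose gold passage id
# appears in the top-k retrieved ids. Golden-set entries here are invented.
def recall_at_k(results: dict[str, list[str]],
                gold: dict[str, str], k: int) -> float:
    """`results` maps query -> ranked retrieved ids; `gold` maps query -> gold id."""
    hits = sum(1 for q, gold_id in gold.items() if gold_id in results[q][:k])
    return hits / len(gold)


gold = {"q1": "doc_a", "q2": "doc_b"}
results = {"q1": ["doc_a", "doc_x"], "q2": ["doc_y", "doc_z"]}
print(recall_at_k(results, gold, k=2))  # q1 is a hit, q2 a miss -> 0.5
```

End-to-end metrics like faithfulness need an LLM judge or human labels on top; retrieval metrics like this one run on every commit.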

Exercise

Build a RAG over 1000 of your own notes/documents. Then break it:

  • Ambiguous queries
  • Multi-hop questions
  • Questions whose answer spans multiple documents
  • Questions where the right answer is “I don’t know”

For each failure mode, fix it and document what you changed.
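For the ambiguous-query failures, one common fix is hybrid retrieval, and the usual way to merge a lexical and a dense ranking is reciprocal rank fusion (RRF). A sketch, using c = 60 from the original RRF paper and two made-up rankings:

```python
# Reciprocal Rank Fusion: score each document as the sum of 1 / (c + rank)
# across the input rankings, then sort by fused score. c = 60 is the
# constant from the original RRF paper; the rankings below are invented.
def rrf(rankings: list[list[str]], c: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25 = ["doc_b", "doc_a", "doc_c"]    # lexical ranking
dense = ["doc_a", "doc_c", "doc_b"]   # embedding ranking
print(rrf([bm25, dense]))  # doc_a wins: near the top of both lists
```

RRF needs no score normalization across the two retrievers, which is why it is the default fusion method in most hybrid-search stacks.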

Why this stage matters

RAG is the most common LLM application pattern in industry. Most AI startups have RAG inside them somewhere. Mediocre RAG is everywhere; great RAG is rare and worth a lot.

Hands-on companions

After the theory here, three concrete next stops:

Ship a production RAG stack:

See it as a real product:

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions.

  1. Hands-On Large Language Models

    Jay Alammar, Maarten Grootendorst

    O'Reilly, 2024

    Visual, practical, including Alammar's classic Illustrated Transformer diagrams in book form.

  2. Building LLM-Powered Applications

    Valentina Alto

    Packt, 2024

    Application-level patterns with framework comparisons (LangChain, LlamaIndex, Haystack).