Fine-Tuning

Change the model itself when prompting and RAG aren't enough. SFT, LoRA, DPO, GRPO, distillation — pick the one that matches your data, budget, and goal. Most teams reach for fine-tuning too early.

9 articles
51 min to read
4 demos
4 books
if you only do one thing

Don't fine-tune until you've ruled out prompting and RAG. When you do, LoRA is the cheapest production-quality lever: slide the rank in the LoRA Lab and watch why low-rank actually works.
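
A tiny sketch of the intuition behind that rank slider (plain NumPy; the width and rank are arbitrary illustrations, not the lab's values): build a weight update as the product of two thin matrices and inspect its singular values.

```python
# A weight update built from two thin matrices has at most r nonzero
# singular values, so a rank-r adapter can represent it exactly.
# Width d and rank r are illustrative stand-ins.
import numpy as np

d, r = 512, 8                        # layer width, adapter rank
A = np.random.randn(d, r) / np.sqrt(r)
B = np.random.randn(r, d)
delta_W = A @ B                      # the full-size update LoRA never materializes

s = np.linalg.svd(delta_W, compute_uv=False)
print(f"singular values > 1e-8: {(s > 1e-8).sum()} of {len(s)}")  # expect: 8 of 512
```

Whatever rank you choose bounds the number of nonzero singular values, which is exactly the collapse the lab's spectrum plot shows.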

Articles in this stage

  1. 01 Data & Tooling
  2. 02 Distillation
  3. 03 Embedding Fine-Tuning
  4. 04 Field report: Llama 3 — frontier post-training, in 92 pages
  5. 05 LoRA & QLoRA
  6. 06 Field report: Phi-3 — synthetic data and distillation, in the open
  7. 07 RLHF, DPO, GRPO — Preference and Reward Training
  8. 08 Supervised Fine-Tuning (SFT)
  9. 09 When to Fine-Tune

Stage 10 — Fine-Tuning

When prompting and RAG aren’t enough, you change the model itself. Fine-tuning takes a pretrained model and adapts it to your task, your domain, and your style by continuing training on your own data.

In 2026, fine-tuning is needed far less often than it was in 2022: frontier prompting plus RAG now covers many cases that used to require it. But for the cases that remain, fine-tuning is irreplaceable.

Prerequisites

  • Stage 03 (NN training, optimizers)
  • Stage 06 (transformer architecture)
  • Stage 08 (prompting)

Learning ladder

  1. When to fine-tune — decision flow vs prompting and RAG
  2. Supervised fine-tuning (SFT) — the foundation
  3. LoRA & QLoRA — parameter-efficient training
  4. RLHF, DPO, GRPO — preference and reward-based training
  5. Distillation — copy a teacher’s behavior into a small student
  6. Embedding fine-tuning
  7. Data & tooling — TRL, Axolotl, Unsloth, dataset design

MVU

You can:

  • Decide when fine-tuning is the right tool (and when it isn’t)
  • Pick between SFT, LoRA, DPO, GRPO based on use case and resources
  • Estimate the data and compute needed for a given fine-tune (a back-of-envelope sketch follows this list)
  • Avoid the most common pitfall: fine-tuning on data your model already does fine on
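
A back-of-envelope version of that compute estimate, as a sketch (the shapes are the common Llama-style 7B config and the rank is an arbitrary example; check your model's actual config): LoRA on a d×k weight matrix trains r(d+k) parameters instead of d·k.

```python
# Trainable-parameter estimate for LoRA on a Llama-style 7B.
# hidden=4096 and layers=32 match the common 7B config; verify
# against your model before trusting the numbers.
hidden, layers, rank = 4096, 32, 16

# LoRA on the attention q/v projections only (a common default):
# an adapter on a (d x k) matrix adds rank * (d + k) weights.
per_matrix = rank * (hidden + hidden)           # 131,072
trainable = per_matrix * 2 * layers             # q_proj + v_proj, every layer
print(f"trainable LoRA params: {trainable:,}")  # 8,388,608 -> ~0.12% of 7B

# Rough optimizer memory: AdamW keeps two fp32 states per trainable param.
print(f"optimizer state: ~{trainable * 2 * 4 / 2**20:.0f} MiB")  # ~64 MiB
```

The same arithmetic is why QLoRA fits on a single GPU: the frozen base can sit in 4-bit while only these adapters and their optimizer state live in higher precision.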

Exercise

LoRA-fine-tune a 7B model on a 1k-example instruction dataset of your design. Evaluate against the base model on a held-out test set. Aim for measurable improvement on your target task without regression on general capabilities. A starter script is sketched under “Build it in code” below.

Field reports — real-world case studies

Observational write-ups based on published papers, with strict citation and explicit “what’s still confidential” sections. Each maps a curriculum article to a real frontier-lab artifact.

Hands-on companions

Watch it interactively:

  • LoRA Lab — drag the rank slider; watch the singular-value spectrum reveal why low-rank matters. Real Jacobi SVD on a synthetic ΔW.
  • RLHF Lab — your A/B picks fit a Bradley-Terry reward model in real time. Real gradient descent runs on the labels you click; the weights update and the loss curve plots live (sketched in code after this list).
  • Distillation Lab — real GPT-2 teacher logits, a learnable student, and the KL gradient running live. Slide T and α; watch the student’s distribution converge (the loss is sketched after this list).
  • Quantization Lab — slide bits from 16 → 2; watch RMSE rise and memory drop. Foundational for QLoRA’s 4-bit base (sketched after this list).
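
What the RLHF Lab fits, sketched in code (random feature vectors stand in for whatever the lab extracts from the responses you compare): a Bradley-Terry model scores each response and is trained so that P(chosen beats rejected) = σ(r_chosen − r_rejected).

```python
# Bradley-Terry reward fitting: each A/B pick says "chosen beats
# rejected", and we fit a linear reward r(x) = w.x by minimizing
# -log sigmoid(r(chosen) - r(rejected)). Features are random stand-ins.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
chosen = torch.randn(64, 16)       # features of preferred responses
rejected = torch.randn(64, 16)     # features of rejected responses

w = torch.zeros(16, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(200):
    margin = (chosen - rejected) @ w           # r(chosen) - r(rejected)
    loss = -F.logsigmoid(margin).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")        # falls from log 2 as w separates pairs
```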
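
The objective the Distillation Lab animates, under the same stand-in caveat (random logits instead of real GPT-2 outputs; T and α play the roles of the lab's sliders):

```python
# Distillation loss: soften teacher and student with temperature T,
# then blend KL-to-teacher with ordinary cross-entropy using alpha.
# Logits here are random stand-ins for real GPT-2 outputs.
import torch
import torch.nn.functional as F

T, alpha = 2.0, 0.7
teacher_logits = torch.randn(4, 50257)              # GPT-2 vocab size
student_logits = torch.randn(4, 50257, requires_grad=True)
labels = torch.randint(0, 50257, (4,))

soft_teacher = F.softmax(teacher_logits / T, dim=-1)
log_student = F.log_softmax(student_logits / T, dim=-1)

kl = F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T
ce = F.cross_entropy(student_logits, labels)
loss = alpha * kl + (1 - alpha) * ce
loss.backward()                                     # gradients flow into the student
print(f"kl={kl.item():.3f}  ce={ce.item():.3f}")
```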
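
And the bits slider in a few lines (uniform symmetric quantization over Gaussian weights, an assumption for illustration; the lab may implement a different scheme):

```python
# Uniform symmetric quantization at decreasing bit widths: RMSE
# grows as bits shrink while memory falls linearly relative to fp16.
import torch

torch.manual_seed(0)
w = torch.randn(1_000_000)

for bits in (16, 8, 4, 2):
    levels = 2 ** (bits - 1) - 1                 # symmetric signed grid
    scale = w.abs().max() / levels
    q = torch.clamp(torch.round(w / scale), -levels, levels) * scale
    rmse = (w - q).pow(2).mean().sqrt()
    print(f"{bits:2d} bits: rmse={rmse:.4f}, memory={bits / 16:.1%} of fp16")
```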

Build it in code:
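
A starter for the stage exercise, as a minimal sketch using Hugging Face transformers and peft (the model name, dataset path, and hyperparameters are placeholders to adapt; trl's SFTTrainer argument names shift between versions, so check its current docs for the training step):

```python
# LoRA starter for the stage exercise. Everything named here is a
# placeholder: swap in the base model and instruction dataset you chose.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"            # any 7B causal LM you can run
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # common default for Llama-style models
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # expect well under 1% trainable

dataset = load_dataset("json", data_files="my_instructions.jsonl")["train"]
# From here, train with trl's SFTTrainer (or a plain transformers Trainer
# on tokenized text), then evaluate adapter vs. base model on your
# held-out set, as the exercise describes.
```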

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. Free badges mark books with author-authorized full text online.

  1. AI Engineering: Building Applications with Foundation Models

    Chip Huyen

    O'Reilly, 2024

    The most current production-AI book. The LLM-era successor to Designing Machine Learning Systems.

  2. Build a Large Language Model From Scratch

    Sebastian Raschka

    Manning, 2024

    Implements GPT-2 end-to-end in PyTorch, layer by layer.

  3. Reinforcement Learning: An Introduction (free)

    Richard S. Sutton, Andrew G. Barto

    MIT Press, 2nd ed., 2018

    The canonical RL textbook. Free PDF authorized by the authors.

  4. Deep Reinforcement Learning Hands-On

    Maxim Lapan

    Packt, 3rd ed., 2024

    PyTorch implementations of policy-gradient methods, including PPO.