Stage 10 — Fine-Tuning

When prompting and RAG aren’t enough, you change the model itself. Fine-tuning takes a pretrained model and adapts it to your task, your domain, or your style — by continuing training on data you choose.

In 2026, fine-tuning is less universal than it was in 2022: frontier prompting plus RAG covers many cases that used to require it. But for the cases that remain, fine-tuning is irreplaceable.

Prerequisites

  • Stage 03 (NN training, optimizers)
  • Stage 06 (transformer architecture)
  • Stage 08 (prompting)

Learning ladder

  1. When to fine-tune — decision flow vs prompting and RAG
  2. Supervised fine-tuning (SFT) — the foundation
  3. LoRA & QLoRA — parameter-efficient training
  4. RLHF, DPO, GRPO — preference and reward-based training
  5. Distillation — copy a teacher’s behavior into a small student
  6. Embedding fine-tuning
  7. Data & tooling — TRL, Axolotl, Unsloth, dataset design
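The core idea behind item 3 fits in a few lines of NumPy: LoRA freezes the pretrained weight W and learns a low-rank update scaled by alpha/r, with B zero-initialized so training starts exactly at the base model. A minimal sketch — the layer sizes, rank, and alpha below are illustrative, not prescriptive:

```python
import numpy as np

d, k, r, alpha = 512, 512, 8, 16          # hypothetical layer size, rank, scale
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))               # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))   # trainable, small random init
B = np.zeros((d, r))                      # trainable, zero init: W' == W at step 0

def lora_forward(x):
    # base path plus scaled low-rank path; only A and B would receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full = d * k                              # parameters in the dense layer
lora = r * (d + k)                        # parameters LoRA actually trains
print(f"full: {full}, LoRA: {lora} ({100 * lora / full:.1f}% of full)")
```

With r = 8 on a 512×512 layer, the trainable parameter count drops to a few percent of the dense layer — that ratio is why low-rank matters.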

MVU

You can:

  • Decide when fine-tuning is the right tool (and when it isn’t)
  • Pick between SFT, LoRA, DPO, GRPO based on use case and resources
  • Estimate the data and compute needed for a given fine-tune
  • Avoid the most common pitfall: fine-tuning on data your model already does fine on
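To make the compute estimate concrete, here is a deliberately rough memory calculator. It assumes fp16 weights and gradients with fp32 Adam moments, and ignores activations and KV cache entirely, so treat the numbers as order-of-magnitude only:

```python
def full_ft_gib(n_params):
    # per parameter: weights (2 B) + grads (2 B) + Adam m and v (4 B each)
    return n_params * (2 + 2 + 4 + 4) / 2**30

def lora_gib(n_params, trainable_frac=0.01):
    # frozen weights in fp16, plus grads and optimizer state only for the adapter
    return (n_params * 2 + n_params * trainable_frac * (2 + 2 + 4 + 4)) / 2**30

print(f"7B full fine-tune: ~{full_ft_gib(7e9):.0f} GiB")
print(f"7B LoRA:           ~{lora_gib(7e9):.0f} GiB")
```

The gap — tens of GiB versus low double digits for a 7B model — is the practical argument for LoRA, and QLoRA shrinks the frozen-weight term further by holding the base in 4-bit.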

Exercise

LoRA-fine-tune a 7B model on a 1k-example instruction dataset of your design. Evaluate against the base model on a held-out test set. Aim for measurable improvement on your target task without regression on general capabilities.
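The evaluation half of this exercise can be structured as a small harness. A sketch, assuming a model is any prompt → answer callable (swap in your real inference call); the function names and the regression tolerance are illustrative:

```python
def accuracy(model, dataset):
    # dataset: list of (prompt, expected_answer) pairs
    return sum(model(x) == y for x, y in dataset) / len(dataset)

def compare(base, tuned, target_set, general_set, tol=0.02):
    # improvement on the target task, regression check on general capability
    report = {
        "target_base":   accuracy(base, target_set),
        "target_tuned":  accuracy(tuned, target_set),
        "general_base":  accuracy(base, general_set),
        "general_tuned": accuracy(tuned, general_set),
    }
    report["improved"] = report["target_tuned"] > report["target_base"]
    report["regressed"] = report["general_tuned"] < report["general_base"] - tol
    return report
```

Success on the exercise is `improved` True and `regressed` False — measurable gain on your task without giving back general capability.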

Field reports — real-world case studies

Observational write-ups based on published papers, with strict citation and explicit “what’s still confidential” sections. Each maps a curriculum article to a real frontier-lab artifact.

Hands-on companions

Watch it interactively:

  • LoRA Lab — drag the rank slider; watch the singular-value spectrum reveal why low-rank matters. Real Jacobi SVD on a synthetic ΔW.
  • RLHF Lab — your A/B picks fit a Bradley-Terry reward model in real time. Real gradient descent runs on the labels you click; weights update; loss curve plotted.
  • Distillation Lab — real GPT-2 teacher logits, learnable student, KL gradient running live. Slide T and α; watch the student’s distribution converge.
  • Quantization Lab — slide bits from 16 → 2; watch RMSE rise and memory drop. Foundational for QLoRA’s 4-bit base.
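The Quantization Lab’s bits-versus-RMSE trade-off can be reproduced offline in a few lines of NumPy: symmetric uniform quantization of a random weight vector at shrinking bit widths. This rounding scheme is a simplification of what production quantizers (e.g. QLoRA’s NF4) actually do:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)               # stand-in for a weight tensor

def quantize(x, bits):
    levels = 2 ** (bits - 1) - 1          # symmetric signed range
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale    # snap to the nearest level

for bits in (16, 8, 4, 2):
    rmse = np.sqrt(np.mean((w - quantize(w, bits)) ** 2))
    print(f"{bits:>2} bits: RMSE {rmse:.4f}, memory {bits / 16:.0%} of fp16")
```

Error climbs as bits drop while memory falls linearly — the same curve the lab animates, and the reason 4-bit is the sweet spot QLoRA builds on.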

Build it in code:

See also