demo
A small student learning from a frontier teacher
Real GPT-2 logits as the teacher. A randomly initialized student. Slide the temperature, mix the hard and soft losses, hit play, and watch real gradient descent on the KL divergence pull the student's distribution toward the teacher's. The dark-knowledge transfer that makes a 1B model behave like a frontier one — made visible.
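The objective the demo animates can be sketched in a few lines. This is a minimal NumPy-only illustration, not the demo's code: it assumes a single token position, and the function names (`softmax`, `distillation_loss`) and default `T`/`alpha` values are illustrative choices.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax. Higher T flattens the distribution."""
    z = logits / T
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Mix hard cross-entropy with temperature-softened KL to the teacher.

    alpha weights the soft (teacher) term, (1 - alpha) the hard-label term.
    The T**2 factor keeps soft-loss gradient magnitudes comparable across
    temperatures, as in Hinton et al.'s original distillation formulation.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) over the softened distributions
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    # Standard cross-entropy against the one-hot hard label (T = 1)
    hard = -np.log(softmax(student_logits)[hard_label])
    return alpha * (T**2) * kl + (1 - alpha) * hard

# A student far from the teacher yields a large loss; gradient descent on
# this quantity is what pulls the student's distribution toward the teacher's.
teacher = np.array([4.0, 1.0, 0.5])
student = np.array([0.1, 0.2, 0.3])
loss = distillation_loss(student, teacher, hard_label=0)
```

Sliding the temperature in the demo corresponds to changing `T` here: larger values expose more of the teacher's "dark knowledge" in the low-probability logits, while `alpha` moves the mix between imitating the teacher and fitting the hard labels.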
Anchored to 10-fine-tuning/distillation and the /ship/17 — synthetic data + distillation walkthrough.