Same logits, different answers
Temperature, top-p, top-k — the three knobs every chat API hands you. Drag them on a real next-token distribution from GPT-2 and see exactly what each one does. The model's prediction never changes; only the funnel does.
What each knob actually does
- Temperature divides every logit by `T` before softmax. `T < 1` sharpens the distribution (the top tokens steal mass from the tail); `T > 1` flattens it (everyone gets a chance); `T → 0` collapses to greedy. Watch the entropy stat: it tracks how decisive the distribution is.
- Top-p (nucleus) keeps the smallest set of candidates whose probabilities cumulatively reach `p`. Adaptive: a sharp distribution might keep only 1-2 tokens at `p = 0.9`; a flat one might keep 30. This is usually the better diversity knob.
- Top-k just keeps the top `k` tokens by raw probability. Predictable, easy to reason about, but blind to the shape of the distribution. All three filters are sketched in code after this list.
- The model never moves. Same logits every time. All "creativity" you tune in a chat API is post-processing on the same prediction.
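A minimal NumPy sketch of all three filters, assuming a raw logits vector as input; function names like `apply_temperature` and `top_p_filter` are illustrative, not taken from the demo's source:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def apply_temperature(logits, T):
    # Divide every logit by T before softmax: T < 1 sharpens, T > 1 flattens,
    # T -> 0 approaches greedy (all mass on the argmax).
    return softmax(logits / T)

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p,
    # then renormalize. The kept-set size adapts to the distribution's shape.
    order = np.argsort(probs)[::-1]        # token ids, most probable first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, p) + 1]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_k_filter(probs, k):
    # Keep the k most probable tokens, blind to the distribution's shape.
    keep = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def entropy(probs):
    # Shannon entropy in bits: the "how decisive is this?" stat.
    nz = probs[probs > 0]
    return -(nz * np.log2(nz)).sum()
```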
Things to try
- Pick the `capital france` prompt, step 1. Crank temperature to 2.0. Watch the answer go feral while the model's actual best guess (·Paris) is still up there; it just isn't certain anymore.
- Pick the `once upon` prompt. Notice the entropy is already higher (~3 bits): open-ended generation is uncertain by nature. Tune temperature down to 0.5 and even creative writing becomes one obvious answer.
- Set top-p to 0.5 on any prompt. See how many tokens "fall out" on a sharp distribution (most of them) vs. a flat one (only the long tail). That's why top-p generally beats top-k; a toy numeric version of this comparison appears below.
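Using the sketch above, here is that comparison with made-up numbers; the logits below are invented stand-ins for a sharp ("capital of France") and a flat ("once upon a") distribution, not the actual GPT-2 values from the demo:

```python
# Invented logits, chosen to mimic the two prompt shapes in the demo.
sharp = np.array([8.0, 4.0, 3.5, 3.0, 2.0])   # one clear winner
flat  = np.array([2.0, 1.9, 1.8, 1.7, 1.6])   # many plausible continuations

for name, logits in [("sharp", sharp), ("flat", flat)]:
    p1 = apply_temperature(logits, 1.0)
    p2 = apply_temperature(logits, 2.0)
    kept = int((top_p_filter(p1, 0.5) > 0).sum())
    print(f"{name}: entropy {entropy(p1):.2f} bits | "
          f"top prob {p1.max():.2f} -> {p2.max():.2f} at T=2.0 | "
          f"top-p=0.5 keeps {kept} of 5 tokens")
```

The sharp distribution keeps a single token at top-p 0.5 while the flat one keeps three: the same `p` adapts to the shape, which a fixed `k` cannot do.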
Anchored to 08-prompting/sampling-and-decoding from the learning path. Reuses the pre-computed logits from Inference Pipeline.