
Same logits, different answers

Temperature, top-p, top-k — the three knobs every chat API hands you. Drag them on a real next-token distribution from GPT-2 and see exactly what each one does. The model's prediction never changes; only the funnel does.

What each knob actually does

  • Temperature divides every logit by T before softmax. T < 1 sharpens the distribution (the top tokens steal mass from the tail); T > 1 flattens it (everyone gets a chance); T → 0 collapses to greedy. Watch the entropy stat — it tracks how decisive the distribution is.
  • Top-p (nucleus) keeps the smallest set of candidates whose probabilities cumulatively reach p. Adaptive: a sharp distribution might keep only 1-2 tokens at p=0.9; a flat one might keep 30. This is usually the better diversity knob.
  • Top-k just keeps the top k by raw probability. Predictable, easy to reason about, but blind to the shape of the distribution. (All three filters are sketched in code after this list.)
  • The model never moves. Same logits every time. All "creativity" you tune in a chat API is post-processing on the same prediction.
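
A minimal sketch of the three filters in plain NumPy, using a made-up five-token logit vector rather than the demo's real GPT-2 distribution; the function names and numbers here are illustrative, not the demo's code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def apply_temperature(logits, t):
    # T < 1 sharpens, T > 1 flattens, T -> 0 approaches greedy decoding
    return softmax(logits / t)

def top_k_filter(probs, k):
    # keep only the k most probable tokens, then renormalize
    keep = np.argsort(probs)[-k:]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_p_filter(probs, p):
    # keep the smallest prefix of tokens (by descending probability)
    # whose cumulative mass reaches p, then renormalize
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1   # size of the nucleus
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

# toy logits standing in for one next-token distribution (illustrative only)
logits = np.array([8.0, 5.5, 5.0, 2.0, 1.0])

for t in (0.5, 1.0, 2.0):
    print(f"T={t}:", np.round(apply_temperature(logits, t), 3))

probs = softmax(logits)
print("top-k (k=2):  ", np.round(top_k_filter(probs, 2), 3))
print("top-p (p=0.9):", np.round(top_p_filter(probs, 0.9), 3))
```

Note that temperature reshapes the probabilities while top-k and top-p only zero out candidates and renormalize: the ranking of tokens never changes, which is exactly the "model never moves" point above.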

Things to try

  1. Pick the capital france prompt, step 1. Crank temperature to 2.0. Watch the answer go feral while the model's actual best guess (·Paris) is still up there: it remains the top token, it just no longer dominates the distribution.
  2. Pick the once upon prompt. Notice the entropy is already higher (~3 bits) — open-ended generation is uncertain by nature. Tune temperature down to 0.5: even creative writing becomes one obvious answer.
  3. Set top-p to 0.5 on any prompt. See how many tokens "fall out" on a sharp distribution (most of them) vs a flat one (only the long tail). That's why top-p generally beats top-k; the sketch below reproduces the effect with synthetic numbers.
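
To see the sharp-vs-flat contrast numerically, here is a small sketch with synthetic logits standing in for the two prompt types. The entropy_bits and nucleus_size helpers are ad-hoc, and the values printed will not match the demo's pre-computed GPT-2 numbers.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_bits(probs):
    # Shannon entropy in bits: how "decisive" the distribution is
    p = probs[probs > 0]
    return float(-(p * np.log2(p)).sum())

def nucleus_size(probs, p):
    # how many tokens survive a top-p cut at threshold p
    cum = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cum, p) + 1)

rng = np.random.default_rng(0)
sharp = softmax(np.array([9.0, 4.0, 3.5, 2.0] + [0.0] * 46))  # one dominant answer
flat  = softmax(rng.normal(0.0, 1.5, size=50))                 # open-ended continuation

for name, probs in [("sharp", sharp), ("flat", flat)]:
    print(f"{name}: entropy={entropy_bits(probs):.2f} bits, "
          f"nucleus at p=0.5 -> {nucleus_size(probs, 0.5)} tokens, "
          f"at p=0.9 -> {nucleus_size(probs, 0.9)} tokens")
```

The sharp distribution keeps a one-token nucleus at almost any p, while the flat one keeps dozens, which is the adaptivity that a fixed top-k cannot provide.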

Anchored to 08-prompting/sampling-and-decoding from the learning path. Reuses the pre-computed logits from Inference Pipeline.