
Twelve heads, twelve specialists

Each layer of GPT-2 has 12 attention heads running in parallel. They aren't redundant. Each one learned to look at a different thing — diagonals, broadcasters, previous-token trackers, semantic roles. Pick a sentence, slide a layer, and see all twelve at once.
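If you want to poke at the same maps outside the demo, here is a minimal sketch of pulling per-head attention out of GPT-2 with Hugging Face transformers. The "gpt2" checkpoint name and the example sentence are assumptions, not necessarily the demo's exact setup:

```python
# Minimal sketch: extract per-head attention maps from GPT-2.
# Assumes the small "gpt2" checkpoint (12 layers x 12 heads); the demo's
# own pipeline may differ.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

sentence = "The trophy doesn't fit in the suitcase because it is too big."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of 12 tensors, one per layer,
# each shaped [batch, heads, seq, seq] = [1, 12, seq, seq].
layer = 11
attn = outputs.attentions[layer][0]  # [12, seq, seq] for one layer
print(attn.shape)
```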

What you'll see

  • Diagonal-leaning heads. A bright diagonal line means the head pays attention to the current token only — an identity pass.
  • Previous-token heads. A bright diagonal shifted one column left of the main diagonal: the head's job is "what was the last word?", useful for syntactic chaining.
  • Broadcasters. A bright vertical stripe in the first column means every query is dumping mass on the very first token. Often a "null" or rest position; the interpretability literature calls this an attention sink.
  • Semantic specialists. Bright cells far from the diagonal that line up with meaning, not position. Try the coreference example at higher layers and watch "it" reach across the sentence toward "trophy" and "suitcase"; the plotting sketch after this list lets you reproduce these maps yourself.
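To reproduce the twelve-at-once grid view, here is a short matplotlib sketch. It reuses the `attn` tensor from the extraction snippet above; the layout and colormap are arbitrary choices:

```python
# Sketch of the grid view: all 12 heads of one layer, side by side.
# Reuses `attn` ([12, seq, seq]) from the extraction snippet above.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for head, ax in enumerate(axes.flat):
    ax.imshow(attn[head], cmap="viridis")  # rows = queries, cols = keys
    ax.set_title(f"head {head}", fontsize=9)
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()
```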

Try this — predict before you click

  1. Pick the coreference sentence ("The trophy doesn't fit…"). Slide layer from 0 to 11. Predict: at layer 0, all 12 heads look diagonal-ish (positional). By layer 11, you should see at least 3 distinct shapes — a broadcaster (column 0 dominates), a previous-token head (off-by-one diagonal), and a long-range semantic head.
  2. At layer 11, find the head that lights up cells far from the diagonal. Click it for the focused view. Predict: the bright cells line up with meaningful word pairs ("it" → "trophy" or "suitcase"), not nearby words.
  3. Compare layer 0 head 0 with layer 11 head 7 on the same sentence. Predict: layer 0 looks almost trivial (positional); layer 11 looks intricate. Specialization tends to deepen with depth: early layers do mechanics, late layers do semantics.
  4. Across all 12 heads in any layer, count how many are pure broadcasters (column 0 takes most of the mass). Predict: 1–3 per layer. These are "attention sinks": heads that have nothing useful to do for some tokens, so they dump probability on position 0. The snippet after this list automates the count.
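For step 4, here is a sketch of the count, reusing `outputs.attentions` from the first snippet. The 0.5 cutoff for "most of the mass" is an assumption, not the demo's rule:

```python
# Sketch: per layer, count heads whose average mass on key position 0
# exceeds a cutoff. The 0.5 threshold is an assumption.
import torch

first_col_share = torch.stack(
    [a[0, :, :, 0].mean(dim=-1) for a in outputs.attentions]
)  # [layers, heads]: mean over queries of mass placed on token 0
broadcasters = (first_col_share > 0.5).sum(dim=-1)
for layer, count in enumerate(broadcasters.tolist()):
    print(f"layer {layer:2d}: {count} broadcaster head(s)")
```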

The pattern hint

The small label on each thumbnail (broadcaster, diagonal-leaning, etc.) is a one-line heuristic classification based on where mass concentrates. Treat it as a starting point rather than a verdict: most heads are mixtures of several patterns.
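For a flavor of what such a heuristic might look like, here is one plausible version. The thresholds, scoring, and label names are illustrative guesses, not the demo's actual classifier:

```python
# One plausible thumbnail-label heuristic: score where a head's mass
# concentrates and pick the dominant pattern. Thresholds and labels are
# guesses, not the demo's actual rules.
import torch

def classify_head(A: torch.Tensor, thresh: float = 0.4) -> str:
    """A: [seq, seq] attention map, rows = queries, cols = keys."""
    idx = torch.arange(A.shape[0])
    scores = {
        "diagonal-leaning": A[idx, idx].mean(),         # current token
        "previous-token": A[idx[1:], idx[:-1]].mean(),  # token before
        "broadcaster": A[:, 0].mean(),                  # first column
    }
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score > thresh else "mixed"

for head in range(attn.shape[0]):  # `attn` from the extraction snippet
    print(f"head {head:2d}: {classify_head(attn[head])}")
```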

Why this matters

Understanding head specialization is the entry point to mechanistic interpretability. If you can name what a head does, you can sometimes steer the model by intervening on it. Anthropic's circuits research, Neel Nanda's TransformerLens, and the entire field of interpretability start here: looking at a head and asking "what shape did this learn?"
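As a taste of that loop, here is a hedged TransformerLens sketch that zeroes out one head's output and checks whether the next-token prediction moves. The layer/head pair (11, 7) echoes the comparison above and is purely illustrative, not a known circuit:

```python
# Sketch: ablate one attention head with TransformerLens and compare the
# model's next-token guess with and without it.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens(
    "The trophy doesn't fit in the suitcase because it is too"
)

LAYER, HEAD = 11, 7  # illustrative choice

def ablate_head(z, hook):
    # z: [batch, pos, head_index, d_head]; zero the chosen head's output
    z[:, :, HEAD, :] = 0.0
    return z

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens, fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", ablate_head)]
)

for name, logits in [("clean", clean_logits), ("ablated", ablated_logits)]:
    top = logits[0, -1].argmax().item()
    print(f"{name:8s} -> {model.tokenizer.decode([top])!r}")
```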

Anchored to 06-transformers/multi-head-attention from the learning path. Reuses the data that powers the Attention Inspector.