demo · 07-modern-llms

Code vs prose — per-token prediction confidence

Every token in four examples (Python, SQL, prose fiction, news) is colored by how confident a language model is in predicting it. Green means near-certain; red means high uncertainty (high entropy). Compare the strips and the average-confidence bars: the grammar of code rules out continuations that prose leaves wide open.

Why it matters

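The formal grammar of a programming language acts as a hard constraint on which next tokens are legal, and a trained model's next-token distribution concentrates accordingly. After for i in, only the start of an expression can follow, and in practice that is usually range(...) or an iterable already in scope. After def foo(, either a parameter list or a closing parenthesis must follow. English has no such constraints: a very large number of continuations are grammatically valid. Lower per-token entropy means fewer chances per line for a sampled token to be wrong, which is a structural reason LLMs tend to write code more reliably than prose at the same model scale.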
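To make the effect of a grammar on entropy concrete, here is a minimal sketch. It assumes a toy, uniform distribution over eight hypothetical candidate tokens after for i in, and a hypothetical set of two tokens that the grammar allows. Real model distributions are not uniform, so the numbers only illustrate the direction of the effect.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform_over(tokens):
    """Assign equal probability to every token (a toy assumption)."""
    return {tok: 1 / len(tokens) for tok in tokens}

# Hypothetical candidates for the token after `for i in` (illustration only).
candidates = ["range", "items", "=", ")", ":", "def", "return", ","]

# Hypothetical grammar mask: only tokens that can start an expression survive.
legal = {"range", "items"}

before = uniform_over(candidates)
after = uniform_over([tok for tok in candidates if tok in legal])

h_before = entropy_bits(before.values())  # 3.0 bits: 8 equally likely options
h_after = entropy_bits(after.values())    # 1.0 bit: 2 equally likely options

print(f"unconstrained: {h_before} bits, grammar-constrained: {h_after} bits")
```

Removing six of the eight options cuts the entropy from 3 bits to 1 bit, which is the sense in which a grammar narrows the distribution.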

The math

H(token) = −Σ_w p(w) log₂ p(w)  # per-token entropy (bits), summed over vocabulary tokens w

confidence = p(predicted token)   # top-1 probability

avg H(code) ≪ avg H(prose)        # grammar collapses options
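The three quantities above can be sketched in a few lines of Python. The per-position distributions below are made-up numbers chosen for illustration, not measurements from any model, and the names code_steps and prose_steps are hypothetical.

```python
import math

def entropy_bits(probs):
    """H = -sum p(w) log2 p(w), in bits; zero-probability entries contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def confidence(probs):
    """Top-1 probability: the mass on the single most likely token."""
    return max(probs)

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical next-token distributions, one list per token position.
code_steps = [[0.97, 0.02, 0.01], [0.90, 0.05, 0.05]]
prose_steps = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]

avg_h_code = average(entropy_bits(p) for p in code_steps)
avg_h_prose = average(entropy_bits(p) for p in prose_steps)
avg_conf_code = average(confidence(p) for p in code_steps)
avg_conf_prose = average(confidence(p) for p in prose_steps)

print(f"code:  avg H = {avg_h_code:.2f} bits, avg confidence = {avg_conf_code:.2f}")
print(f"prose: avg H = {avg_h_prose:.2f} bits, avg confidence = {avg_conf_prose:.2f}")
```

With a real model, each inner list would be the softmax output over the full vocabulary at one position; the averaging step stays the same.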

Anchored to 07-modern-llms/why-llms-excel-at-code.