Code vs Prose Entropy · ai-explained

Why it matters

The formal grammar of a programming language acts as a hard constraint on the next-token distribution. After for i in, an iterable expression must follow (a range(...) call, a list, a generator). After def foo(, a parameter list or a closing parenthesis must follow. English grammar is far looser: at most positions a very large number of continuations are grammatically valid. Lower per-token entropy means the model chooses among fewer plausible options at each step, so there are fewer chances for a random error per line of output. This is a structural reason LLMs tend to write better code than prose at the same model scale.
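The effect of a grammar constraint can be sketched by treating it as an explicit mask over a toy next-token distribution. This is an illustration only: a trained model learns the grammar statistically rather than applying a mask, and the vocabulary, the probabilities, and the helper names apply_grammar_mask and entropy_bits are all hypothetical.

```python
import math


def apply_grammar_mask(probs, legal_tokens):
    """Drop tokens the grammar forbids, then renormalize the rest to sum to 1."""
    kept = {tok: p for tok, p in probs.items() if tok in legal_tokens}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("grammar mask removed all probability mass")
    return {tok: p / total for tok, p in kept.items()}


def entropy_bits(probs):
    """Shannon entropy in bits of a token -> probability mapping."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical 8-token vocabulary with a uniform (maximally uncertain) distribution.
vocab = ["range", "items", "the", "a", "of", "my", "and", "is"]
unconstrained = {tok: 1 / len(vocab) for tok in vocab}

# Assumption for the sketch: only two of these tokens may legally follow "for i in".
legal_after_for_in = {"range", "items"}
constrained = apply_grammar_mask(unconstrained, legal_after_for_in)

print(entropy_bits(unconstrained))  # 3.0 bits: 8 equally likely options
print(entropy_bits(constrained))    # 1.0 bit: 2 equally likely options
```

Removing six of eight equally likely options cuts the entropy from 3 bits to 1 bit and raises the probability of each surviving token from 0.125 to 0.5.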

The math

H(token) = −Σ_w p(w) log₂ p(w)  # per-token entropy (bits), summed over vocabulary tokens w

confidence = p(predicted token)   # top-1 probability

avg H(code) ≪ avg H(prose)        # grammar collapses options
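As a worked example of these three quantities, the sketch below averages entropy and top-1 confidence over a few positions of a sequence. The per-position distributions are made-up toy numbers chosen to show the arithmetic, not measurements from any model, and the function names are hypothetical.

```python
import math


def entropy_bits(dist):
    """H = -sum p(w) log2 p(w), in bits, for one next-token distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def confidence(dist):
    """Top-1 probability: the probability of the predicted (argmax) token."""
    return max(dist)


def mean(values):
    return sum(values) / len(values)


# Toy per-position distributions (assumed values, one list per token position).
# "Code-like": a few options dominate at every step.
code_like = [[0.5, 0.5], [0.5, 0.25, 0.25], [0.5, 0.5], [0.5, 0.25, 0.25]]
# "Prose-like": probability is spread over many options.
prose_like = [[0.25] * 4, [0.125] * 8, [0.25] * 4, [0.125] * 8]

avg_h_code = mean([entropy_bits(d) for d in code_like])
avg_h_prose = mean([entropy_bits(d) for d in prose_like])
avg_conf_code = mean([confidence(d) for d in code_like])
avg_conf_prose = mean([confidence(d) for d in prose_like])

print(avg_h_code, avg_h_prose)        # 1.25 2.5
print(avg_conf_code, avg_conf_prose)  # 0.5 0.1875
```

In this toy setup the lower average entropy goes with the higher average confidence, which is the relationship the inequality above describes.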

Anchored to 07-modern-llms/why-llms-excel-at-code.

[Figure: Code vs prose, per-token prediction confidence]