demo

Center, rescale, ship

Slide the input distribution. Watch a messy activation get centered, normalized to unit variance, then rescaled by learned γ and β. A big part of why every transformer block trains stably.
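The three strips map one-to-one onto the standard LayerNorm computation. Here is a minimal sketch in NumPy; the function name, the ε value, and the per-vector statistics are assumptions for illustration, not the demo's actual code:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # center: subtract the mean (raw strip -> zero mean)
    mu = x.mean()
    # normalize: divide by the std, with a small eps for numerical safety
    x_hat = (x - mu) / np.sqrt(x.var() + eps)
    # rescale: learned gamma stretches, learned beta shifts
    return gamma * x_hat + beta
```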

Try this — predict before you click

  1. Drag scale from 1 to 5 with γ = 1, β = 0. Predict: the raw strip's std bar grows to ~5, the normalized strip's std stays at exactly 1.0 (that's the whole point), and the rescaled strip matches the normalized one because γ = 1.
  2. Same setup but drag γ to 0. Predict: the rescaled output collapses to a flat line at β. This is the failure mode — γ encodes "how much variance to put back in," and γ = 0 zeroes the layer. Real models initialize γ to 1 and rarely let it drift far from there.
  3. Drag drift to 3 with γ = 1, β = 0. Predict: raw mean is ~3, normalized mean is exactly 0, rescaled mean equals β. LayerNorm doesn't care what the input mean is — it always centers.
  4. Crank β to −2 with γ = 1. Predict: the rescaled distribution shifts down by 2 regardless of input drift. β is a learnable bias that the gradient can move; you've just simulated "this layer wants its outputs centered at −2." All four predictions are replayed numerically in the sketch after this list.
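If you'd rather check the predictions off the page, here is a hedged sketch: the synthetic "activation" and the knob values stand in for the sliders, and the layer_norm helper mirrors the sketch above rather than the demo's internals.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
base = rng.normal(size=2048)          # stand-in for a messy activation

# 1. scale = 5, gamma = 1, beta = 0: raw std grows, normalized std stays ~1
x = 5 * base
print(x.std(), layer_norm(x, 1.0, 0.0).std())      # ~5.0 and ~1.0

# 2. gamma = 0: the whole output collapses to a flat line at beta = 0
print(layer_norm(x, 0.0, 0.0).std())               # 0.0

# 3. drift = 3, gamma = 1, beta = 0: raw mean ~3, normalized mean ~0
x = base + 3
print(x.mean(), layer_norm(x, 1.0, 0.0).mean())    # ~3.0 and ~0.0

# 4. beta = -2: output mean sits at beta regardless of the input drift
print(layer_norm(x, 1.0, -2.0).mean())             # ~-2.0
```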

Anchored to 03-neural-networks/regularization-techniques and 06-transformers/transformer-block.