demo
The number every LM paper opens with.
Perplexity is the average branching factor: the number of next-token choices a language model is effectively hesitating between at each step. Type any text; score it against three n-gram models trained on different corpora; see why "lower is better" only holds within a single tokenizer family.
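A minimal sketch of that number, assuming per-token probabilities are already in hand (the values below are invented for illustration): perplexity is two raised to the mean per-token surprisal, i.e., the geometric mean of 1/p over the tokens.

```python
# A minimal sketch of the number itself, assuming we already have per-token
# probabilities from some model. The probability values are made up for
# illustration; they are not output from the demo's three n-gram models.
import math

def perplexity(token_probs):
    """2 ** (mean surprisal in bits) = the geometric-mean branching factor."""
    bits = [-math.log2(p) for p in token_probs]      # surprisal of each token
    return 2 ** (sum(bits) / len(bits))

print(perplexity([0.5, 0.5, 0.5, 0.5]))    # 2.0  -> hesitating between ~2 choices
print(perplexity([0.1, 0.1, 0.1, 0.1]))    # 10.0 -> ~10-way uncertainty per token
print(perplexity([0.5, 0.5, 0.5, 0.001]))  # ~9.5 -> one surprising token dominates
```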
Try this — predict before you click
- Score the same text against all three corpora at n=2. Predict: the corpus-matched model wins by 10–100× on perplexity. Wikipedia text under the Shakespeare model jumps to triple-digit perplexity because it's full of OOV bigrams.
- Pick the OOD sample (e.g., the JSON code) and score it under Shakespeare. Look at the per-token surprisal chips. Predict: the token-level highlights cluster on punctuation and code syntax ({}[]:) because Shakespeare's bigrams never saw them. The high bits-per-token tail dominates the average.
- Same text, slide n from 1 → 3. Predict: at n=1 (unigram), perplexity is high but stable across corpora — just word frequencies. At n=3, the corpus-matched model gets dramatically sharper while the off-corpus models actually get worse (more OOV trigrams). Higher n isn't free.
- Edit a single word in the text to a rare-but-real word (e.g., replace "the" with "perspicacious"). Predict: the OOV badge fires for that token, the chip turns red, and the running average jumps. Even one out-of-vocab token can blow up an n-gram model's perplexity (the sketch after this list walks through a toy version of this scoring).
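All four steps run on the same scoring loop. Below is a minimal sketch, not the demo's implementation: an order-n model with add-one (Laplace) smoothing over whitespace tokens, trained on two tiny stand-in corpora, reporting perplexity plus per-token surprisal in bits and an OOV flag. With corpora this small the gaps are muted and the exact numbers won't match the demo, but the mechanics are the same.

```python
# A toy version of the demo's scoring loop. The two corpora, whitespace
# tokenization, and add-one smoothing are all assumptions for illustration.
import math
from collections import Counter

def ngram_model(tokens, n):
    """Count order-n n-grams and their (n-1)-gram contexts over one corpus."""
    padded = ["<s>"] * (n - 1) + tokens
    ngrams = Counter(tuple(padded[i:i + n]) for i in range(len(tokens)))
    contexts = Counter(tuple(padded[i:i + n - 1]) for i in range(len(tokens)))
    return contexts, ngrams, set(tokens)

def score(text, model, n):
    """Per-token (token, surprisal in bits, is_OOV) rows under the model."""
    contexts, ngrams, vocab = model
    tokens = text.split()
    padded = ["<s>"] * (n - 1) + tokens
    v = len(vocab) + 1                                   # +1 for unseen words
    rows = []
    for i, tok in enumerate(tokens):
        ctx = tuple(padded[i:i + n - 1])
        # Add-one smoothing keeps unseen n-grams from getting probability zero.
        p = (ngrams[ctx + (tok,)] + 1) / (contexts[ctx] + v)
        rows.append((tok, -math.log2(p), tok not in vocab))
    return rows

def perplexity(rows):
    return 2 ** (sum(bits for _, bits, _ in rows) / len(rows))

shakespeare = "to be or not to be that is the question".split()
wikipedia = "the cat is a small domesticated carnivorous mammal".split()
text = "to be or not to be"

# Same text scored under both corpora at n = 1, 2, 3 (list items 1 and 3).
for n in (1, 2, 3):
    for name, corpus in (("shakespeare", shakespeare), ("wikipedia", wikipedia)):
        ppl = perplexity(score(text, ngram_model(corpus, n), n))
        print(f"n={n}  {name:12s} perplexity = {ppl:5.1f}")

# Per-token "chips" (list items 2 and 4): an off-corpus word shows up as an
# OOV spike that drags the running average up.
for tok, bits, oov in score("to be or not to mammal", ngram_model(shakespeare, 2), 2):
    print(f"{tok:8s} {bits:5.2f} bits{'  OOV' if oov else ''}")
```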
Anchored to 04-language-modeling/n-gram-models.