Where your prompt actually goes
Type something. Watch BPE break it apart. Switch encoders to see how the same text becomes a different number of tokens — and a different bill — depending on which model reads it.
What you can see in this demo
- Whitespace is part of the token. Try GPT, GPT4, and GPT-4 with and without a leading space — the leading space changes everything. Tokens with a leading space render with a · marker so you can see the boundary.
- Vocabulary size changes the bill. Switch from GPT-2's 50k vocab to GPT-4o's 200k. The same text often costs 20-40% fewer tokens at GPT-4o because longer common phrases become single tokens.
- Foreign scripts pay a tax. Try the Vietnamese example. The tokenizer's training distribution decides what gets a single token vs. broken into byte-level fragments. English averages 4-5 chars/token; Vietnamese can drop below 2.
- Hover any token to see its raw bytes, codepoints, and id (the sketch after this list pulls the same data out in code). Some "tokens" are invisible whitespace; some are multi-byte UTF-8 fragments that don't even render as a complete character on their own.
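Here's a minimal sketch of that inspection done in code, assuming gpt-tokenizer exposes per-encoding imports like gpt-tokenizer/encoding/o200k_base (the exact entry points may vary by package version):

```ts
// Sketch: list each token's id, text, raw UTF-8 bytes, and codepoints.
// The import path is an assumption; check gpt-tokenizer's README for your version.
import { encode, decode } from 'gpt-tokenizer/encoding/o200k_base'

const text = ' GPT-4 is here'
const utf8 = new TextEncoder()

for (const id of encode(text)) {
  const piece = decode([id]) // the token's text, leading space included
  const bytes = [...utf8.encode(piece)] // raw UTF-8 bytes of this piece
  const codepoints = [...piece].map((c) => c.codePointAt(0))
  // Make leading spaces visible, like the demo's · marker.
  // A lone multi-byte fragment decodes to the replacement character.
  console.log(id, piece.replace(/^ /, '·'), bytes, codepoints)
}
```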
Try this — predict before you click
- Type GPT-4 GPT4 GPT4o. Predict: the three terms tokenize into wildly different lengths even though they look similar. The hyphen vs no-hyphen and the trailing o decide which BPE merges fire.
- Switch encoder from r50k_base (GPT-2) to o200k_base (GPT-4o) on a paragraph of normal English. Predict: the o200k count drops 25–35%. The bigger vocab swallows more common phrases as single tokens.
- Try the Vietnamese example on r50k_base. Predict: many characters become 2–3 byte-level tokens because the GPT-2 vocab barely saw Vietnamese. Switch to o200k_base — the count drops dramatically because the larger vocab picked up more diacritic-heavy languages.
- Type aaaaaaaaa (9 a's). Predict: it tokenizes as one or two tokens, not nine. BPE merges repetition aggressively. Now try asdfgh. Predict: most or all of those characters become separate tokens — random sequences hit no merges. (The sketch after this list scripts these comparisons.)
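If you'd rather check those predictions in code than in the UI, a quick sketch (same assumed import paths as above):

```ts
// Sketch: compare token counts for the experiments above across two encoders.
// Import paths are assumptions; check gpt-tokenizer's docs for your version.
import { encode as r50k } from 'gpt-tokenizer/encoding/r50k_base'
import { encode as o200k } from 'gpt-tokenizer/encoding/o200k_base'

const samples = [
  'GPT-4 GPT4 GPT4o',  // hyphen vs trailing o
  'aaaaaaaaa',         // repetition: merges fire aggressively
  'asdfgh',            // random-ish: few or no merges
  'Xin chào thế giới', // Vietnamese: byte-level fragments on r50k
]

for (const s of samples) {
  console.log(`${JSON.stringify(s)}: r50k=${r50k(s).length}, o200k=${o200k(s).length}`)
}
```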
Why this matters
Token count is the unit of cost, the unit of latency, and the unit of context. A 50-page PDF translated into Vietnamese might not fit in a context window where the English version does. Repetitive text compresses better than you expect; URLs and code compress worse. If you've ever wondered why your prompt is "more expensive than it should be" — this is usually why.
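Because the count is knowable client-side, you can budget before you send anything. gpt-tokenizer's README documents an isWithinTokenLimit helper for this; treat the exact signature as version-dependent:

```ts
// Sketch: pre-flight a prompt against a context budget.
// isWithinTokenLimit returns the token count when the text fits, false otherwise
// (per gpt-tokenizer's README; verify against your installed version).
import { isWithinTokenLimit } from 'gpt-tokenizer/encoding/o200k_base'

const CONTEXT_BUDGET = 128_000 // illustrative window size, not any model's spec
const prompt = 'paste your 50-page document here'

const fit = isWithinTokenLimit(prompt, CONTEXT_BUDGET)
console.log(fit === false ? 'over budget: trim or chunk' : `fits: ${fit} tokens`)
```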
How it works
All three encoders run client-side via gpt-tokenizer — pure JavaScript, no WASM, no model load. Encoding happens synchronously on every keystroke; for typical input it's well under a frame.
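A sketch of that per-keystroke loop, with made-up element ids (the real demo's markup will differ):

```ts
// Sketch: re-encode synchronously on every input event.
// No debounce or web worker is needed at typical prompt lengths.
import { encode } from 'gpt-tokenizer/encoding/o200k_base'

const input = document.querySelector('#prompt') as HTMLTextAreaElement // hypothetical id
const counter = document.querySelector('#token-count') as HTMLElement  // hypothetical id

input.addEventListener('input', () => {
  counter.textContent = `${encode(input.value).length} tokens`
})
```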
The encoders are: r50k_base (GPT-2 / GPT-3), cl100k_base (GPT-3.5 / GPT-4), and o200k_base (GPT-4o). Claude and Llama use different tokenizers, but the pattern is identical: byte-level BPE, learned merges, longest-match greedy encoding.
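To make "learned merges, greedy encoding" concrete, here is a toy version of the merge loop. The four-rule merge table is invented for illustration (real vocabs have tens of thousands of ranked merges), and the loop is simplified to apply merges in rank order:

```ts
// Toy sketch of BPE: start from single characters, then repeatedly apply
// learned merges in priority order. Real byte-level BPE works on UTF-8 bytes
// and a much larger ranked merge table; this only shows the shape.
const merges: [string, string][] = [
  ['a', 'a'],   // 'aa': repetition merges early
  ['aa', 'aa'], // 'aaaa'
  ['G', 'P'],   // 'GP'
  ['GP', 'T'],  // 'GPT'
]

function bpe(text: string): string[] {
  let parts = [...text] // character-level start
  for (const [left, right] of merges) { // earlier merges = higher priority
    for (let i = 0; i < parts.length - 1; ) {
      if (parts[i] === left && parts[i + 1] === right) {
        parts.splice(i, 2, left + right) // merge the adjacent pair in place
      } else {
        i++
      }
    }
  }
  return parts
}

console.log(bpe('aaaaaaaaa')) // ['aaaa', 'aaaa', 'a']: repetition compresses
console.log(bpe('GPT-4'))     // ['GPT', '-', '4']
console.log(bpe('asdfgh'))    // single characters: no merges fire
```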
Anchored to 05-tokens-embeddings/tokenization from the learning path.