When meaning becomes math
The best-known result in NLP: subtract the "man" vector from "king", add the "woman" vector, and the nearest word in the vocabulary is "queen". Build any expression and watch the geometry.
Why this is wild
Nobody told the embedding model that "king" and "queen" share something that "man" and "woman" also share. The model just learned to predict words from their context — and from billions of co-occurrences a gender axis emerged on its own, pointing in roughly the same direction across many word pairs. Same for "capital of" axes, "young/adult" axes, "verb tense" axes.
This is the punchline of distributional semantics: meaning shows up as structure in vector space, learned implicitly from word context. It's the reason embeddings work at all.
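The arithmetic behind the demo is just vector addition followed by a nearest-neighbor search by cosine similarity. A minimal sketch with hand-made toy vectors (dimension 0 roughly "royalty", dimension 1 roughly "gender"; the vectors and tiny vocabulary are invented for illustration, not taken from any real model):

```python
from math import sqrt

# Toy 2-D embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# These vectors are invented for illustration only.
vocab = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# king - man + woman
query = [k - m + w for k, m, w in zip(vocab["king"], vocab["man"], vocab["woman"])]

# Nearest remaining word by cosine similarity; the input words are
# excluded from the candidates, as in the classic Word2Vec demos.
best = max(
    (w for w in vocab if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(query, vocab[w]),
)
print(best)  # -> queen
```

With these toy vectors the result lands exactly on "queen"; with real embeddings the sum only lands *near* it, which is why the nearest-neighbor step matters.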
Things to try
- France − Paris + Tokyo → expect Japan. That's the country-capital axis.
- running − run + walk → expect walking. That's the verb-tense axis.
- better − good + bad → expect worse. Comparative axis.
- puppy − dog + cat → expect kitten. Young/adult axis.
- Try one that shouldn't work, e.g. Python − code + cooking. The result is whatever's nearest in the training distribution; sometimes it's nonsensical. That's a useful failure mode to see.
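Every expression in the list is the same operation: a − b + c, then nearest neighbor. A sketch of a generic helper, again over invented toy vectors (axis 0 roughly "country-ness", the other axes just identify each pair) rather than a real embedding model:

```python
from math import sqrt

# Invented 3-D toy vectors: axis 0 ~ "country-ness"; axes 1-2 identify the pair.
vectors = {
    "Paris":  [0.0, 1.0, 0.0],
    "France": [1.0, 1.0, 0.0],
    "Tokyo":  [0.0, 0.0, 1.0],
    "Japan":  [1.0, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word nearest to a - b + c, excluding the three inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return max(
        (w for w in vectors if w not in {a, b, c}),
        key=lambda w: cosine(query, vectors[w]),
    )

print(analogy("France", "Paris", "Tokyo"))  # -> Japan
```

Excluding the input words is the standard trick: without it, one of the inputs (often a or c) is frequently the nearest vector to the query.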
Caveats
The classic Word2Vec demos used a model trained specifically on word-level co-occurrence. For portability, we use sentence-transformer embeddings of single words instead; clean analogies still work, but a few that worked on Word2Vec may not transfer here. The geometry is slightly different. The lesson is the same.
Anchored to 05-tokens-embeddings/static-embeddings from the learning path.