
When meaning becomes math

The most famous result in NLP: subtract a "man" vector from "king", add a "woman" vector, and the nearest word in the vocabulary to the resulting vector is "queen". Build any expression and watch the geometry.
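If you want to reproduce the arithmetic outside the demo, here's a minimal sketch. The model name (all-MiniLM-L6-v2) and the toy vocabulary are assumptions for illustration, not necessarily what the demo ships with:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Tiny hand-picked vocabulary to search over; a real demo uses a much larger one.
vocab = ["king", "queen", "man", "woman", "prince", "princess", "throne", "apple"]

model = SentenceTransformer("all-MiniLM-L6-v2")
# normalize_embeddings=True makes a dot product equal cosine similarity.
vecs = model.encode(vocab, normalize_embeddings=True)
by_word = dict(zip(vocab, vecs))

# king - man + woman, then find the nearest vocabulary word to the result.
target = by_word["king"] - by_word["man"] + by_word["woman"]
target /= np.linalg.norm(target)

sims = vecs @ target
for w in ("king", "man", "woman"):
    sims[vocab.index(w)] = -np.inf  # exclude inputs so "king" can't trivially win
print(vocab[int(np.argmax(sims))])  # ideally: queen
```

Excluding the input words matters: the raw nearest neighbor of the result is very often one of the inputs themselves.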

Why this is wild

Nobody told the embedding model that "king" and "queen" share something that "man" and "woman" also share. The model just learned to predict words from their context — and from billions of co-occurrences a gender axis emerged on its own, pointing in roughly the same direction across many word pairs. Same for "capital of" axes, "young/adult" axes, "verb tense" axes.
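You can check the "same direction" claim directly: take the difference vector for several gendered pairs and measure how aligned they are. A sketch under the same assumptions as above (the model choice and word pairs are mine):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [("king", "queen"), ("man", "woman"), ("actor", "actress"), ("uncle", "aunt")]
words = [w for pair in pairs for w in pair]
vecs = dict(zip(words, model.encode(words, normalize_embeddings=True)))

# One difference vector per pair; if a shared gender axis exists,
# these should all have high pairwise cosine similarity.
diffs = [vecs[a] - vecs[b] for a, b in pairs]
diffs = [d / np.linalg.norm(d) for d in diffs]

for i in range(len(pairs)):
    for j in range(i + 1, len(pairs)):
        print(pairs[i], "vs", pairs[j], "cosine:", round(float(diffs[i] @ diffs[j]), 2))
```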

This is the punchline of distributional semantics: meaning shows up as structure in vector space, learned implicitly from word context. It's the reason embeddings work at all.

Things to try

  1. France − Paris + Tokyo → expect Japan. That's the country-capital axis.
  2. running − run + walk → expect walking. That's the verb-tense axis.
  3. better − good + bad → expect worse. Comparative axis.
  4. puppy − dog + cat → expect kitten. Young/adult axis.
  5. Try one that shouldn't work — e.g. Python − code + cooking. The result is whatever's nearest in the training distribution; sometimes it's nonsensical. That's a useful failure mode to see. All five are wired up in the sketch after this list.
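Here's a batch version that runs every analogy above against a small candidate list. The model and candidate vocabulary are again assumptions for illustration; results shift with both:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# (a, b, c) means: a - b + c. The last triple is the intentional failure.
analogies = [
    ("France", "Paris", "Tokyo"),
    ("running", "run", "walk"),
    ("better", "good", "bad"),
    ("puppy", "dog", "cat"),
    ("Python", "code", "cooking"),
]
candidates = ["Japan", "walking", "worse", "kitten", "Germany",
              "ran", "best", "recipe", "chef", "snake"]
cand_vecs = model.encode(candidates, normalize_embeddings=True)

for a, b, c in analogies:
    va, vb, vc = model.encode([a, b, c], normalize_embeddings=True)
    target = va - vb + vc
    target /= np.linalg.norm(target)
    top = np.argsort(-(cand_vecs @ target))[:3]
    print(f"{a} - {b} + {c} ->", [candidates[i] for i in top])
```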

Caveats

The classic Word2Vec demos used a model trained specifically on word-level co-occurrence. We use sentence-transformer embeddings of single words for portability — clean analogies still work, but a few that worked on Word2Vec may not work here. The geometry is slightly different. The lesson is the same.
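If you want the classic behavior for comparison, gensim can load the original GoogleNews Word2Vec vectors. A sketch (the dataset name is gensim's; the download is large, over a gigabyte, on first use):

```python
import gensim.downloader as api

# Downloads the pretrained GoogleNews Word2Vec vectors on first use.
kv = api.load("word2vec-google-news-300")

# most_similar does the subtract/add arithmetic internally and
# excludes the input words from the results.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
print(kv.most_similar(positive=["France", "Tokyo"], negative=["Paris"], topn=3))
```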

Anchored to 05-tokens-embeddings/static-embeddings from the learning path.