demo

Many experts, only a few firing per token

Mixture-of-Experts: the total parameter count can be enormous, but only a few experts activate per token. Watch the router pick the top-K experts for each token.
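
The routing step the demo animates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the demo's actual code: the function name `top_k_route`, the example logits, and the choice to apply softmax over only the selected experts are all hypothetical. Some models instead apply softmax over all experts before selecting the top K.

```python
import math

def top_k_route(logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (an assumption), so the k weights sum to 1.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router logits for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 3.0, -0.3, 1.2, 0.0]
routes = top_k_route(logits, k=2)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the two selected experts would run for this token; the other six contribute no compute, which is why the total parameter count can grow far faster than the per-token cost.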

Anchored to 07-modern-llms/mixture-of-experts.