click · drag · type · learn
57 demos. 13 narrated walkthroughs. Zero theory.
Every demo is anchored to a single article and built around one mechanism. Click any card. The animated ones auto-play with voice narration; the interactive ones let you mess with the dials yourself.
The marquee demos
The four that earn the visit. Two are animated walkthroughs; all four are tier-A (real models or real algorithms running in your browser, not cartoons).
01
Inference Pipeline
The capstone. Question → tokens → embeddings → 12 layers → logits → sampling → next token → answer. Step through a real GPT-2 forward pass end to end.
02
Attention Inspector ▶ animated
Three modes: narrated walkthrough, interactive inspector, and a Composition view that drives one slider through pattern → diversity → entropy → semantic resolution. Real GPT-2 attention.
03
Diffusion Process ▶ animated
Voice-narrated forward + reverse diffusion on a 2D smiley, plus the full interactive sandbox with four shapes. The toy version of every image generator.
04
RAG Visualizer
Two modes: a 3-strategy comparison (dense / BM25 / hybrid) and a Composition view where top-k drives a wave through scores → retrieved → recall → cost-per-query.
All 57 demos
Foundations
The math and the building blocks every model rests on.
live
Linear Algebra Lab
Drag two vectors, watch dot product, projection, and angle update live. The math behind cosine similarity, attention scores, and embeddings.
live
Eigenvectors Lab
Spin a vector, apply a matrix, see which directions only scale instead of rotating. The math behind PCA, PageRank, and stability analysis.
live
Bayes Theorem Lab
Why a 99%-accurate medical test can be mostly wrong. Drag prevalence, sensitivity, and false-positive rate; see the posterior live on a 1000-person grid.
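The whole demo is one application of Bayes' rule. A minimal sketch with hypothetical numbers (1% prevalence, 99% sensitivity, 5% false-positive rate):

```python
prevalence = 0.01
sensitivity = 0.99       # P(test positive | sick)
false_positive = 0.05    # P(test positive | healthy)

# P(test positive) over the whole population
p_pos = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' rule: P(sick | test positive)
posterior = sensitivity * prevalence / p_pos
print(round(posterior, 3))   # ~0.167 — a positive result is wrong 5 times out of 6
```

Even a 99%-sensitive test yields a posterior of about 17% when the condition is rare, which is exactly the surprise the grid makes visible.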
live
▶ animated
Gradient Descent
The math that trains every neural network. Drop a ball on a loss curve, watch it find the minimum. Animated explainer + full interactive sandbox.
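The ball-on-a-curve loop is a few lines. A sketch on the toy loss f(x) = (x − 3)², with a made-up learning rate of 0.1:

```python
def grad(x):
    return 2 * (x - 3)       # derivative of (x - 3)^2

x = 0.0                      # drop the ball at x = 0
for _ in range(100):
    x -= 0.1 * grad(x)       # step opposite the gradient

print(round(x, 4))           # converges to the minimum at x = 3
```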
live
Entropy & KL Divergence
Two distributions. Drag the bars. Watch entropy, cross-entropy, and KL divergence update live. The math behind every neural-network loss function.
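The three quantities the demo updates live are one-liners over the bar heights. A sketch with two made-up distributions:

```python
import math

p = [0.5, 0.25, 0.25]   # "true" distribution
q = [0.7, 0.2, 0.1]     # model's distribution

entropy = -sum(pi * math.log2(pi) for pi in p)
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
kl = cross_entropy - entropy   # D_KL(p || q), always >= 0
```

Dragging the bars so q matches p drives KL to zero; that gap is exactly what cross-entropy loss minimizes.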
live
Bias-Variance Tradeoff
Slide the polynomial degree from 1 to 15. Watch a straight line miss the curve, a degree-3 fit nail it, and a degree-15 fit chase noise. The most important plot in ML.
live
Confusion Matrix Lab
Slide the classifier threshold and watch precision, recall, F1, and the ROC curve update live. Why there is no single 'right' threshold.
live
▶ animated
K-Means Clustering
Voice-narrated walkthrough of one full convergence, plus the interactive sandbox. The classic unsupervised algorithm in motion.
live
Linear Regression Lab
Drag a line through noisy data. Watch MSE update live. Click 'fit' for the closed-form OLS solution. The simplest supervised model.
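The 'fit' button is the closed-form OLS solution: slope = cov(x, y) / var(x), intercept from the means. A sketch on toy data that lies exactly on y = 2x + 1:

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx   # the line that minimizes MSE, no iteration needed
```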
live
Activation Functions Lab
ReLU, Leaky ReLU, GELU, SiLU, tanh, and sigmoid overlaid on one plot — including their derivatives, which is what backprop multiplies through.
live
▶ animated
Backpropagation Walkthrough
A tiny computational graph. Animated explainer with voice narration + interactive step-through. Backprop is not magic — it's calculus, applied mechanically.
live
Neural Network Trainer
A 2-layer MLP trains live on moons / circles / XOR / blobs. Watch the decision boundary deform every frame as SGD adjusts the weights.
live
Optimizer Race
SGD, momentum, RMSprop, and Adam race down the same loss landscape. The visceral answer to 'why is Adam the default everywhere?'
live
n-gram Language Model
Predict the next word the way NLP did it for 20 years before transformers — by counting. Pick a corpus, pick an n, watch the limits emerge.
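"By counting" means literally that. A bigram sketch on a toy corpus:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count which word follows which: P(next | prev) by pure counting.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    return counts[prev].most_common(1)[0][0]

print(predict("the"))   # 'cat' — it follows 'the' most often in this corpus
```

The limit emerges immediately: any context the corpus never contained gets no prediction at all.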
live
▶ animated
RNN Unrolling
Animated side-by-side: vanishing vs exploding gradients on the same RNN cell, narrated. Plus the interactive sandbox with a real W_h matrix.
live
Perplexity Calculator
The number every LM paper opens with — computed live. Score any text against three n-gram models trained on different corpora; see why 'lower is better' only works inside a tokenizer family.
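Perplexity is just the exponentiated average negative log-likelihood per token. A sketch with hypothetical per-token probabilities:

```python
import math

# Probabilities a model assigned to each token of a 4-token sentence (made up).
token_probs = [0.2, 0.5, 0.1, 0.4]

avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)   # geometric-mean inverse probability
```

Because the average is per *token*, two models with different tokenizers score different token counts on the same text, which is why cross-tokenizer comparisons mislead.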
LLM Internals
How a transformer actually reads, attends, and predicts.
live
Tokenizer Surgery
Paste any text, see how BPE breaks it into tokens, compare GPT-2 / GPT-4 / GPT-4o, find out why your prompt is more expensive than you think.
live
Embedding Playground
Pick two phrases, see their cosine similarity. Watch synonyms cluster and antonyms drift in a 2D PCA scatter of real sentence embeddings.
live
Word Arithmetic
king − man + woman = ? Compose vector arithmetic over a curated 350-word vocabulary and watch the nearest neighbor shift live.
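The arithmetic plus a cosine-similarity nearest-neighbor search is the whole trick. A sketch with made-up 3-d vectors (toy numbers, not real embeddings):

```python
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.5, 0.1, 0.1],
    "woman": [0.5, 0.1, 0.9],
    "queen": [0.9, 0.8, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# king - man + woman, then nearest neighbor (excluding the query word itself)
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
nearest = max((w for w in vecs if w != "king"), key=lambda w: cosine(vecs[w], target))
print(nearest)   # 'queen'
```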
live
▶ animated ◆ composition
Attention Inspector
Three modes: narrated walkthrough, interactive inspector, and a Composition view that drives one slider through pattern → diversity → entropy → semantic resolution. Real GPT-2 attention.
live
Head Specialization Gallery
All 12 heads of one layer at once, in small multiples. See which head learned coreference, which one tracks punctuation, which one just smears.
live
▶ animated
Positional Encoding Lab
Sinusoidal vs learned vs RoPE vs ALiBi. Same sequence, four encodings, animated cursor sweeping through positions — see how each tells the model 'this token is 3rd from the start'.
live
Scaling Laws Calculator
Slide model size and training tokens, watch Chinchilla's loss curve respond. See exactly where GPT-3 went wrong (undertrained) and why Llama 3 trains 70B on 15T tokens.
live
▶ animated
MoE Routing Visualizer
Watch a Mixture-of-Experts router dispatch tokens, one at a time, to specialist experts. Lights up which experts fire and accumulates load. The trick that powers Mixtral, Grok, GPT-4 (rumored), DeepSeek-V3.
live
Reasoning vs Non-Reasoning
Standard LLM versus a reasoning model on hard problems. Watch test-time compute earn its tokens — the trade behind o-series, R1, and extended thinking.
live
Long Context Attention
Sliding window, sparse, and global-local attention masks visualized. Why O(N²) is the long-context wall and how modern models work around it.
live
KV Cache
Toggle the cache on and off, watch per-token compute explode. The single biggest optimization in modern inference engines.
live
Layer Norm
Slide a messy input distribution. Watch it get centered, normalized to unit variance, then rescaled by learnable γ and β. The reason every transformer block trains stably.
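Center, normalize, rescale: that's the entire forward pass. A sketch (γ and β fixed to their identity values here; in a real model they're learned per feature):

```python
def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [gamma * (xi - mean) / (var + eps) ** 0.5 + beta for xi in x]

out = layer_norm([2.0, 4.0, 6.0, 8.0])   # mean ~0, variance ~1, whatever went in
```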
live · capstone
Inference Pipeline
The capstone. Question → tokens → embeddings → 12 layers → logits → sampling → next token → answer. Step through a real GPT-2 forward pass end to end.
Building with LLMs
Once you have a model: prompts, retrieval, tuning, agents.
live
Sampling Knobs
Drag temperature, top-p, top-k. Watch the next-token probability distribution reshape live on real GPT-2 logits.
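Temperature and top-k are small transforms on the logits before the softmax. A sketch with hypothetical next-token logits:

```python
import math

logits = {"the": 4.0, "a": 3.0, "cat": 1.0, "zebra": -2.0}   # made-up values

def sample_dist(logits, temperature=1.0, top_k=None):
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # drop everything outside the top k
    exps = [math.exp(v / temperature) for _, v in items]
    z = sum(exps)
    return {tok: e / z for (tok, _), e in zip(items, exps)}

sharp = sample_dist(logits, temperature=0.5)   # low T: sharper, more deterministic
flat = sample_dist(logits, temperature=2.0)    # high T: flatter, more random
```

Low temperature concentrates mass on the top token; top-k simply zeroes out the tail before renormalizing.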
live
Chain-of-Thought Effect
Same model, same problem, two prompts. Watch 'let's think step by step' rescue it from getting the answer wrong on five classic reasoning tasks.
live
Few-shot vs Zero-shot
Pick a task. Slide from 0 to 3 examples in the prompt. Watch the same model go from useless to nailing it without changing the model at all.
live
Structured Outputs
Schema-enforced JSON, side by side with freeform prose. Live JSON validation. Why production AI runs on structured outputs, not 'please respond in JSON'.
live
▶ animated
Beam Search vs Sampling
Three decoding strategies side by side: greedy, beam search, and stochastic sampling. Watch the beam tree expand and prune step by step.
live
◆ composition
RAG Visualizer
Two modes: a 3-strategy comparison (dense / BM25 / hybrid) and a Composition view where top-k drives a wave through scores → retrieved → recall → cost-per-query.
live
Chunking Strategies
Paste a long document, switch between fixed-char / fixed-token / sentence / paragraph chunking, see chunk sizes and overlap regions.
live
▶ animated
Reranker Lab
Watch a cross-encoder reranker re-order results from initial dense retrieval — animated reorder shows the shuffle in motion. The single biggest quality lever in production RAG.
live
LoRA Lab
Visualize Low-Rank Adaptation: slide the rank, watch a small B·A factorization approximate a full weight update at 800× fewer parameters.
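The parameter saving is simple arithmetic: a full d×d update versus the rank-r B·A factorization. A sketch with hypothetical sizes (the exact ratio in the demo depends on the layer shapes and rank it uses):

```python
d, r = 4096, 8                     # hypothetical hidden size and LoRA rank
full_params = d * d                # full-rank delta-W
lora_params = d * r + r * d        # B is (d x r), A is (r x d)
ratio = full_params / lora_params
print(ratio)                       # 256x fewer trainable parameters at these sizes
```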
live
Quantization Lab
Watch FP32 weights cast down to INT8, INT4, and below. See memory drop 8× while reconstruction error stays manageable. The trick that fits a 70B model on a single GPU.
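A minimal symmetric INT8 sketch: pick a scale, round, clamp, dequantize, measure the damage (toy weights, one scale for the whole tensor; real schemes quantize per channel or per group):

```python
weights = [0.42, -1.3, 0.07, 0.95, -0.6]   # made-up FP32 weights

scale = max(abs(w) for w in weights) / 127                 # map the largest to 127
q = [max(-127, min(127, round(w / scale))) for w in weights]
dequant = [qi * scale for qi in q]

max_err = max(abs(w - d) for w, d in zip(weights, dequant))
# max_err is bounded by scale / 2: each weight lands on the nearest INT8 step
```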
live
RLHF Preference Simulator
Pick A or B on paired completions. Watch a reward model emerge from your preferences. The pattern behind RLHF, DPO, and every aligned chat model.
live
◆ composition
Distillation Lab
Real GPT-2 teacher logits, a learnable student, KL divergence as the loss. Includes a Composition view that drives temperature through entropy → gradient field → trajectory → cost-per-token.
live
▶ animated
Agent Trace Viewer
Step through agent loops, frame by frame, or hit play and watch them unfold. User → reasoning → tool calls → tool results → answer. Four hand-crafted traces.
live
Tool Use Builder
The JSON-schema dance behind function calling. Pick a query, watch the model emit a tool call, see the runtime execute it, see the answer come back.
live
▶ animated
Multi-Agent Orchestrator
Narrated fan-out → parallel work → fan-in walkthrough on a curated scenario, plus the interactive sandbox with three patterns. The pattern behind Claude's research mode.
live
Memory Systems
Working, episodic, semantic — the three layers of agent memory. Walk a conversation through time, watch what each layer remembers.
live
ReAct & Reflection
The Reason-Act-Observe loop, plus self-reflection. Watch an agent walk through reasoning, take actions, observe results — and sometimes catch its own mistakes.
live
▶ animated
Diffusion Process
Voice-narrated forward + reverse diffusion on a 2D smiley, plus the full interactive sandbox with four shapes. The toy version of every image generator.
live
ViT Patches
See how an image becomes a sequence of tokens. Each patch flattens to a vector and feeds a transformer just like text — same self-attention, different shape.
live
CLIP Zero-Shot Classifier
One shared 512-d space for images and text. Pick an image, watch a real CLIP-ViT-B/32 model rank captions; pick a caption, see which images match. The trick behind every reverse-search and image-conditioning stack.
Production
Shipping AI that stays shipped.
live
Cost & Latency Calculator
Pick a model, plug in your workload, see monthly cost and p50/p99 latency. Compare prompt-caching, batching, and speculative decoding side by side.
live
LLM-as-Judge
The eval pattern behind every fast iteration loop. Pick a sample, tune rubric weights, watch the winner change. The technique modern AI dev cycles run on.
live
Guardrails Lab
See input + output safety filters in action. Type any prompt, watch which guards fire on each layer of the defense.
live
▶ animated
Observability Trace
The waterfall view every production AI app needs. Press play and watch spans draw in real-time order. Find bottlenecks, spot retries, allocate cost.
live
Hallucination Mitigation
Same question, three responses — ungrounded, RAG-grounded, abstention. See where each one earns trust and where it doesn't.
live
Calibration Lab
Reliability diagrams + ECE on four model types. See why most models are overconfident and how temperature scaling fixes it.
live
Text-to-SQL Playground
Pick a natural-language question. See the generated SQL, then run it on a real (tiny) database. The application that proved LLMs could ship.
live
Browser Agent
Watch an agent navigate a web page step by step — observe the DOM, click, type, observe again. The pattern behind Computer Use and Operator.