
step 00 · build

Set the table

What we'll build, why this curriculum, and how to get your environment ready.


There are good ways to learn how a transformer works. You can read the self-attention article and watch the animated walkthrough until the math feels obvious. You can read the scaling laws explainer and play with the calculator. All free. All here.

But there’s one experience this curriculum doesn’t yet give you: writing every line of an LLM yourself, from import torch to a working model that completes simple stories. That’s what this track is.

By the end you’ll have:

  • A learned BPE tokenizer trained on your own data
  • A 10M-parameter decoder-only transformer that you wrote, line by line
  • Training and evaluation code, a checkpoint that demonstrably learned something
  • Optional: the same model running in your browser via WebGPU

Nothing here will train the next ChatGPT. The 10M-parameter scale is roughly the size of a research toy from 2018 — small enough to train on a laptop in an evening, large enough that every architectural choice we make actually matters.
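
To make that scale concrete, here's a back-of-envelope parameter count for one hypothetical configuration in the 10M ballpark. These numbers are illustrative only, not the hyperparameters we'll actually use (those come in later steps), and the formula assumes a GPT-style block with a 4× MLP:

# illustrative only — not the course's actual hyperparameters
d_model  = 384    # embedding width (hypothetical)
n_layers = 6      # transformer blocks (hypothetical)
vocab    = 4096   # BPE vocabulary size (hypothetical)

# per block: ~4·d² for the Q/K/V/output projections + ~8·d² for a 4× MLP
per_block = 12 * d_model**2
# plus the embedding table (tied with the output head); ignores tiny
# layernorm and bias terms
total = n_layers * per_block + vocab * d_model
print(f"{total/1e6:.1f}M parameters")   # ≈ 12.2M

Tweak any of the three knobs and the total moves fast; that sensitivity is exactly why the architectural choices matter at this scale.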

The point isn’t the model. The point is that after this, no part of a modern LLM is opaque to you. When you read a paper that says “we use a decoder-only transformer with 12 heads and rotary position embeddings,” you’ll know exactly which 80 lines of your code that sentence describes.

Who this is for

You’re comfortable in Python and you’ve seen PyTorch (or any deep-learning library) at least once. You don’t need to know what a transformer is — that’s what we’re building. You don’t need a GPU; we’ll fit on CPU at the small scale and use Colab’s free tier (or any single GPU) for the larger one.

If you’ve worked through Andrej Karpathy’s Zero to Hero videos, this track will feel familiar — the destination is similar. The format is intentionally different: code-along pages instead of pause-resume video, so you can implement at your own pace, and every step cross-references the visualizations on the rest of this site.

The 17-step path

The full curriculum lives on the build index, but here's the shape:

  • Foundations (steps 01–03): math primer, BPE tokenizer, data pipeline
  • The model (steps 04–09): embeddings, attention, multi-head, transformer block, full GPT, training loop
  • Make it real (steps 10–16): sampling, scaling, fine-tuning, evaluation, browser inference, where-to-next

Each step is one article and one or two new files in your repo. Most steps are 15–25 minutes of reading and 10–30 minutes of hands-on coding. The whole track is roughly six hours of reading and four hours of hands-on, spread however you want. No step builds on a stub. When you start a step, every prerequisite is published.

What you’ll need

Three tools, nothing exotic:

  1. Python 3.11 or later. Earlier versions work but the type hints I’ll use are easier on 3.11+.
  2. A package manager. I’ll use uv throughout — fast, reproducible, modern. If you prefer pip + venv, the commands translate one-to-one; I’ll note where they differ.
  3. An editor. VS Code, Cursor, vim, anything. The code we write fits comfortably in any of them.

That’s it. No GPU required for steps 01–10. For training the full 10M model you’ll want a GPU, but Colab’s free T4 is enough.

Set up the repo

Open a terminal. We’re going to make a project called tiny-llm.

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create the project
uv init tiny-llm
cd tiny-llm

# Add PyTorch (CPU build — we'll cover GPU when we need it)
uv add torch

The uv init command creates a pyproject.toml, a .python-version, and a tiny hello.py you can ignore. We’ll lay out the real structure in a moment.

If you’re on pip instead, the equivalent is:

mkdir tiny-llm && cd tiny-llm
python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install torch

Either way, you should now be able to run the following (if you went the pip route, drop the uv run prefix inside your activated venv):

uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If you see something like 2.5.1 False (or True if you have a GPU), you’re set. The version may be different — anything ≥ 2.0 is fine.
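
That True/False is the same check training will lean on later. As a preview (you don't need this yet), the common PyTorch idiom for picking a device looks like this:

# preview: standard device-selection idiom, used once training wants a GPU
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"using {device}")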

A first sanity check

Before we go further, let’s verify the install with one tensor operation. Create a file:

# hello_torch.py
import torch

# A 3×3 matrix and a length-3 vector.
M = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
v = torch.tensor([1.0, 0.0, -1.0])

# Matrix-vector product: each entry of M @ v is the dot product
# of one row of M with v.
result = M @ v
print(result)
print(f"shape: {result.shape}")

Run it:

uv run python hello_torch.py

Expected output:

tensor([-2., -2., -2.])
shape: torch.Size([3])

If you got that, your environment is working. Every line of the model we’ll build is, at heart, a sequence of operations like this — just at higher dimensions and wrapped in nn.Module classes.
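
To see what "wrapped in nn.Module classes" means, here's a minimal sketch (not part of the repo) that plants the same matrix inside nn.Linear, PyTorch's module for learned matrix multiplies:

# sketch: the same matvec, wrapped the way the model will wrap it
import torch
import torch.nn as nn

M = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
v = torch.tensor([1.0, 0.0, -1.0])

layer = nn.Linear(3, 3, bias=False)  # holds a learnable 3×3 weight matrix
with torch.no_grad():
    layer.weight.copy_(M)            # plant M as the weight, just for the demo
print(layer(v))                      # tensor([-2., -2., -2.]), same as M @ v

The only difference in the real model is that nobody plants the weights by hand; the optimizer learns them.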

If you want to see what M @ v actually does at a geometric level — vectors, dot products, projections — open the Linear Algebra Lab in another tab. That visualization is the mental model we’ll lean on every time matrix multiplication shows up.

How each step is structured

Every step that follows uses the same shape, so you can predict the rhythm:

  1. The big picture. A paragraph or two on what we’re building this step and why.
  2. The math (when relevant). Just the bits you need to write the code, with a link to the deeper derivation in the theory articles.
  3. Implementation. Code, with explanations between blocks. Files are tiny — usually one new module per step, occasionally two.
  4. Sanity check. A small script you run that verifies the new code does what we said it would.
  5. The cross-reference. A link to the matching interactive demo on this site. If your code’s output doesn’t match the demo’s behavior, something’s wrong — and the demo is a debugging tool, not just decoration.
  6. What’s next. A one-liner pointing at the next step.

Code blocks always start with the filename as a comment so you know where the snippet lives:

# tiny_llm/attention.py
class CausalSelfAttention(nn.Module):
    ...

Files we add accumulate in the repo. By step 09 you’ll have tokenize.py, data.py, embed.py, attention.py, mha.py, block.py, gpt.py, train.py — eight files, none more than ~80 lines, that together form a working LLM trainer.
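
Laid out as a tree, the repo at that point will look roughly like this (a sketch; the exact layout firms up step by step):

tiny-llm/
├── pyproject.toml
├── hello_torch.py
└── tiny_llm/
    ├── tokenize.py
    ├── data.py
    ├── embed.py
    ├── attention.py
    ├── mha.py
    ├── block.py
    ├── gpt.py
    └── train.py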

What I assume you won’t do

Two anti-patterns I want to call out, because both are common and both kill the value of this kind of walkthrough:

  1. Skipping the sanity checks. Each step ends with a “run it and expect this output” pairing. Run it. The point isn’t validation that I wrote correct code (I did, modulo bugs you should report). The point is that you encounter the moment when your implementation produces a real number, and that number means something. If you skip the sanity checks, you’re reading a tutorial; if you do them, you’re building a model.

  2. Copy-pasting whole files at once. I'm not your enemy here; I'd rather you reach the end. But if you catch yourself pasting attention.py in one go, go back to the implementation section and type it out yourself. Writing q = self.W_q(x); k = self.W_k(x); v = self.W_v(x) builds a different kind of memory than reading those lines does (there's a sketch of those projections just below). The whole project is small enough that typing it doesn't waste meaningful time.
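
Here's a minimal sketch of what those projections look like in context. The class name is hypothetical, and there's no causal mask or multi-head split yet; the real attention.py adds those in its own steps:

# sketch only — not the final attention.py
import torch
import torch.nn as nn

class AttentionSketch(nn.Module):   # hypothetical name, illustration only
    def __init__(self, d_model: int):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the three lines you'll type in the real implementation
        q = self.W_q(x); k = self.W_k(x); v = self.W_v(x)
        # scaled dot-product attention, no causal mask yet
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        return scores.softmax(dim=-1) @ v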

OK — environment ready, repo bootstrapped, a sanity-check tensor multiplied. Time to start.