
Math Foundations

Math fluency, not mastery. Linear algebra is the language; calculus drives learning; probability quantifies uncertainty; information theory measures loss. Skip it and you'll cargo-cult papers forever; spend a week and softmax stops being a black box.

If you only do one thing

Linear algebra is the language every operation in ML speaks. Read it once, then drag two vectors in the demo until dot products feel inevitable.
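
If the interactive demo isn't at hand, a minimal NumPy stand-in (the vector values are arbitrary examples):

    import numpy as np

    a = np.array([2.0, 1.0])
    b = np.array([1.0, 3.0])

    dot = a @ b  # 2*1 + 1*3 = 5.0
    cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
    print(dot, np.degrees(np.arccos(cos_theta)))  # dot product, and the angle between a and b

Re-run with b nudged toward or away from a: the dot product peaks when they align, hits 0 at 90°, and goes negative beyond that.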

Articles in this stage

  1. 01 Calculus & Optimization
  2. 02 Information Theory for ML
  3. 03 Linear Algebra for ML
  4. 04 Probability & Statistics for ML

(See the learning ladder below for the recommended reading order.)

Stage 01 — Math Foundations

“ML is mostly linear algebra wearing a trench coat.”

You don’t need to be a mathematician. You need fluency in four things:

  1. Linear algebra — the language of every operation in ML
  2. Probability & statistics — what models are uncertain about and why
  3. Calculus & optimization — how learning happens
  4. Information theory — what “loss” actually measures

This stage covers each at the depth needed to read papers, debug models, and stop nodding politely when someone says “covariance matrix.”

Prerequisites

  • High-school algebra
  • A Python environment with NumPy

Learning ladder

Read in this order:

  1. Linear algebra — vectors, matrices, dot products, eigendecomposition
  2. Probability & statistics — distributions, expectation, MLE, Bayes
  3. Calculus & optimization — derivatives, gradients, gradient descent
  4. Information theory — entropy, cross-entropy, KL divergence

The order matters: probability uses linear algebra; calculus is the bridge to optimization; information theory ties it all to loss functions.

Minimum viable understanding (MVU)

Before moving to Stage 02 you should be able to:

  • Compute a dot product, matrix-vector product, and matrix-matrix product by hand on small examples.
  • Explain why softmax outputs are non-negative and sum to 1.
  • Write the gradient descent update rule from memory: θ ← θ − η · ∇L(θ).
  • Define entropy in plain English and compute it for a 2-outcome distribution.
  • Explain what “negative log-likelihood = cross-entropy” means (the sketch after this list checks it numerically).
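
A minimal NumPy self-check for the softmax, update-rule, and cross-entropy items above (the logits and one-hot label are made-up examples):

    import numpy as np

    # Softmax: exp() makes every entry positive; dividing by the sum makes them sum to 1.
    def softmax(z):
        z = z - z.max()  # shift for numerical stability; softmax is shift-invariant (see exercise 2)
        e = np.exp(z)
        return e / e.sum()

    p = softmax(np.array([1.0, 2.0, 3.0]))
    assert (p > 0).all() and np.isclose(p.sum(), 1.0)

    # Gradient descent update: theta <- theta - eta * grad L(theta),
    # here on L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
    theta, eta = 0.0, 0.1
    theta = theta - eta * 2 * (theta - 3)

    # "Negative log-likelihood = cross-entropy": with a one-hot label y,
    # H(y, p) = -sum_i y_i log p_i collapses to -log p[true class], which is the NLL.
    y = np.array([0.0, 0.0, 1.0])  # true class is index 2
    assert np.isclose(-(y * np.log(p)).sum(), -np.log(p[2]))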

Exercises

  1. Dot product by hand. For a = [1,2,3] and b = [4,5,6], compute a·b. Then compute it in NumPy and check that the results match.
  2. Softmax intuition. Implement softmax on [1,2,3] and [101,102,103]. Same output? Why or why not?
  3. Gradient descent on a parabola. Minimize f(x) = (x−3)² starting at x=0 with η=0.1. Plot x and f(x) over 50 steps (see the sketch after this list).
  4. Entropy by hand. Compute the entropy of a fair coin and a 90/10 coin. Which is higher?
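
A reference sketch for exercises 3 and 4, assuming matplotlib for the plot (the step count and learning rate come straight from exercise 3):

    import numpy as np
    import matplotlib.pyplot as plt

    # Exercise 3: minimize f(x) = (x - 3)^2 from x = 0 with eta = 0.1.
    x, eta = 0.0, 0.1
    xs = [x]
    for _ in range(50):
        x = x - eta * 2 * (x - 3)  # f'(x) = 2(x - 3)
        xs.append(x)
    xs = np.array(xs)
    plt.plot(xs, label="x")
    plt.plot((xs - 3) ** 2, label="f(x)")
    plt.legend()
    plt.show()  # x climbs toward 3; f(x) decays toward 0

    # Exercise 4: entropy in bits, H(p) = -sum_i p_i * log2(p_i).
    def entropy(p):
        p = np.asarray(p)
        return -(p * np.log2(p)).sum()

    print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit, the maximum for two outcomes
    print(entropy([0.9, 0.1]))  # 90/10 coin: ~0.47 bits, so the fair coin is higher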

Common pitfalls

  • Skipping it because “I’ll learn it as I go.” You won’t. You’ll cargo-cult papers and never quite trust your own debugging.
  • Going too deep too soon. You don’t need measure theory or category theory. Resist the rabbit hole.
  • Not coding the math. Math you’ve only read fades. Math you’ve implemented sticks.

Where this stage feeds

  • Stage 02 (ML fundamentals) uses linear algebra and calculus end-to-end.
  • Stage 03 (neural networks) is calculus + linear algebra at scale.
  • Stage 06 (transformers) lives or dies on how well you understand matrix products and softmax.


Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. A (free) marker means the author-authorized full text is available online.

  1. ★ start here: Mathematics for Machine Learning (free)

    Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
    Cambridge University Press, 2020
    The bible for this stage. Free PDF online.

  2. An Introduction to Statistical Learning (with applications in Python) (free)

    Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor
    Springer, Python ed., 2023
    The most accessible bridge from stats to ML. Free PDF online.

  3. Pattern Recognition and Machine Learning (free)

    Christopher M. Bishop
    Springer, 2006
    The deep theoretical grounding on linear models, kernels, graphical models.

  4. Probabilistic Machine Learning: An Introduction (free)

    Kevin P. Murphy
    MIT Press, 2022
    The modern, broader successor to Bishop. Free PDF online.