demo
The simplest model that works — three ways
Generate noisy data, then fit a line three ways: drag it yourself and watch the MSE respond, press "watch it learn" to see gradient descent walk the line toward the optimum step by step, or jump straight to the closed-form OLS solution. Every supervised model rests on the same idea: minimize a loss. Seeing all three angles is the point.
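For concreteness, here is a minimal sketch of the manual and closed-form routes, assuming NumPy; the true slope, intercept, and noise scale are illustrative values, not the demo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy data from a hidden line: y = 2.0*x + 1.0 + noise
# (the slope/intercept here are illustrative assumptions).
x = rng.uniform(-3, 3, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=50)

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return np.mean((y - (w * x + b)) ** 2)

# "Drag it yourself": evaluate the loss at a hand-picked line.
print(f"MSE at w=1.0, b=0.0: {mse(1.0, 0.0):.3f}")

# Closed-form OLS: stack a column of ones for the intercept and
# solve the least-squares problem directly.
X = np.column_stack([x, np.ones_like(x)])
(w_ols, b_ols), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"OLS optimum: w={w_ols:.3f}, b={b_ols:.3f}, MSE={mse(w_ols, b_ols):.3f}")
```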
What to look at
- The loss landscape on the right is the bowl the optimizer walks. Bright = low loss; dim = high. Press "watch it learn" to see the descent path get drawn live.
- The nudge hints under each slider tell you which direction the gradient says to move, and how far off you are. ✓ means you've reached the optimum.
- The learning rate η is the step size for gradient descent. Push it to 0.5 and watch GD overshoot and wobble; push it to 0.01 and watch it crawl. The Goldilocks zone is somewhere around 0.05–0.15.
- Gradient descent and OLS land on the same line for this problem (the sketch after this list confirms it). The difference is reach: the closed form exists only for linear least squares, while gradient descent works for any differentiable loss, which is why neural nets with millions or billions of parameters are trained iteratively.
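To make the nudge hints and learning-rate behavior concrete, here is a hedged sketch of the descent loop, continuing the NumPy setup above. The gradients are the standard MSE partials; the specific step counts and η values are assumptions chosen to show the crawl, the sweet spot, and the overshoot:

```python
def grad(w, b):
    """Analytic MSE gradients, the "nudge hints":
    dL/dw = -2*mean(x*(y - yhat)), dL/db = -2*mean(y - yhat)."""
    resid = y - (w * x + b)
    return -2 * np.mean(x * resid), -2 * np.mean(resid)

def descend(eta, steps=25):
    """Plain gradient descent from w=0, b=0 with step size eta."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = grad(w, b)
        w -= eta * gw
        b -= eta * gb
    return w, b

# eta=0.01 crawls, eta=0.1 converges, eta=0.5 overshoots and blows up.
for eta in (0.01, 0.1, 0.5):
    w, b = descend(eta)
    print(f"eta={eta}: w={w:.3f}, b={b:.3f}, MSE={mse(w, b):.3f}")
```

With η = 0.1 the result matches the OLS optimum from the earlier sketch to several decimals, which is exactly the agreement the bullet above describes.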
Anchored to 02-ml-fundamentals/classical-algorithms.
For non-quadratic losses where the closed form breaks down, see the gradient descent demo.