demo

Why Adam is the default

Four balls. Same starting point. Same learning rate. Different optimizers. Watch them race down four 2D loss landscapes and see why Adam wins on almost everything.
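
The demo's own source isn't shown here, but the setup is easy to reproduce: four optimizers, one start point, one learning rate. A minimal PyTorch sketch, run here on the narrow ravine from the list below; the start point, learning rate, and step count are assumptions, not the demo's exact values.

```python
import torch

def ravine(p):
    # Narrow ravine: gentle along x, 50x steeper along y.
    x, y = p
    return x ** 2 + 50 * y ** 2

start = torch.tensor([-2.0, 1.0])
lr = 1e-2  # shared by all four "balls", as in the demo

racers = {
    "SGD":      lambda p: torch.optim.SGD([p], lr=lr),
    "Momentum": lambda p: torch.optim.SGD([p], lr=lr, momentum=0.9),
    "RMSprop":  lambda p: torch.optim.RMSprop([p], lr=lr),
    "Adam":     lambda p: torch.optim.Adam([p], lr=lr),
}

for name, make_opt in racers.items():
    p = start.clone().requires_grad_(True)  # every ball starts from the same point
    opt = make_opt(p)
    for _ in range(500):
        opt.zero_grad()
        ravine(p).backward()
        opt.step()
    print(f"{name:8s} loss after 500 steps: {ravine(p).item():.6f}")
```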

What you'll see

  • Rosenbrock valley — long curved canyon. Adam glides through; SGD bounces off the walls.
  • Saddle point — flat one way, steep the other. SGD gets stuck. Adam's adaptive scaling escapes.
  • Narrow ravine — the textbook case for momentum. Plain SGD oscillates wildly across the steep direction; momentum damps the oscillation and accelerates along the valley floor.
  • Wavy — small bumps everywhere. Momentum smashes through; plain SGD gets stuck in a local minimum. (All four surfaces are sketched in code below.)
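
All four surfaces are standard toy objectives. A NumPy sketch of their common textbook forms; the exact constants and scalings the demo uses are assumptions here.

```python
import numpy as np

def rosenbrock(x, y, a=1.0, b=100.0):
    """Long curved canyon; minimum at (a, a**2), floor along y = x**2."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def saddle(x, y):
    """Saddle at the origin: curves up along x, down along y.
    Gradients are nearly flat near the y = 0 ridge, which is where SGD stalls."""
    return x ** 2 - y ** 2

def ravine(x, y, scale=50.0):
    """Narrow ravine: gentle along x, very steep along y."""
    return x ** 2 + scale * y ** 2

def wavy(x, y, bumps=0.5, freq=6.0):
    """Quadratic bowl dotted with small sinusoidal local minima."""
    return x ** 2 + y ** 2 + bumps * (np.sin(freq * x) ** 2 + np.sin(freq * y) ** 2)
```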

Why Adam wins

Adam combines momentum (a running average of gradients) with RMSprop (a running average of squared gradients), plus bias correction for the early steps. It adapts the effective learning rate per parameter — small steps in steep directions, big steps in flat ones. The result: it Just Works on almost any landscape with almost any default learning rate. Hence: the default everywhere.
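
Spelled out, the update is only a few lines. A minimal NumPy sketch using the defaults from the Adam paper (lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function name and calling convention are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v start at zero; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction: m and v start at zero,
    v_hat = v / (1 - beta2 ** t)              #   so early averages are biased low
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The sqrt(v_hat) in the denominator is the per-parameter part: a coordinate whose recent gradients are large gets its step shrunk, while a flat coordinate gets a step close to the full learning rate.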

Anchored to 03-neural-networks/optimizers.