demo

Why Adam is the default

Four balls. Same starting point. Same learning rate. Different optimizers. Watch them race down four 2D loss landscapes and see why Adam wins on almost everything.
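
The demo's own source isn't shown here, but the setup is easy to reproduce: four optimizers, one start point, one learning rate. A minimal PyTorch sketch, run here on the narrow ravine from the list below; the start point, learning rate, and step count are assumptions, not the demo's exact values.

```python
import torch

def ravine(p):
    # Narrow ravine: gentle along x, 50x steeper along y.
    x, y = p
    return x ** 2 + 50 * y ** 2

start = torch.tensor([-2.0, 1.0])
lr = 1e-2  # shared by all four "balls", as in the demo

racers = {
    "SGD":      lambda p: torch.optim.SGD([p], lr=lr),
    "Momentum": lambda p: torch.optim.SGD([p], lr=lr, momentum=0.9),
    "RMSprop":  lambda p: torch.optim.RMSprop([p], lr=lr),
    "Adam":     lambda p: torch.optim.Adam([p], lr=lr),
}

for name, make_opt in racers.items():
    p = start.clone().requires_grad_(True)  # every ball starts from the same point
    opt = make_opt(p)
    for _ in range(500):
        opt.zero_grad()
        ravine(p).backward()
        opt.step()
    print(f"{name:8s} loss after 500 steps: {ravine(p).item():.6f}")
```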

What you'll see

  • Rosenbrock valley — long curved canyon. Adam glides through; SGD bounces off the walls.
  • Saddle point — flat one way, steep the other. SGD gets stuck. Adam's adaptive scaling escapes.
  • Narrow ravine — the textbook case for momentum. Plain SGD oscillates wildly across the steep direction; momentum damps the oscillation and accelerates along the valley floor.
  • Wavy — small bumps everywhere. Momentum smashes through; plain SGD gets stuck in a local minimum. (All four surfaces are sketched in code below.)
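
All four surfaces are standard toy objectives. A NumPy sketch of their common textbook forms; the exact constants and scalings the demo uses are assumptions here.

```python
import numpy as np

def rosenbrock(x, y, a=1.0, b=100.0):
    """Long curved canyon; minimum at (a, a**2), floor along y = x**2."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def saddle(x, y):
    """Saddle at the origin: curves up along x, down along y.
    Gradients are nearly flat near the y = 0 ridge, which is where SGD stalls."""
    return x ** 2 - y ** 2

def ravine(x, y, scale=50.0):
    """Narrow ravine: gentle along x, very steep along y."""
    return x ** 2 + scale * y ** 2

def wavy(x, y, bumps=0.5, freq=6.0):
    """Quadratic bowl dotted with small sinusoidal local minima."""
    return x ** 2 + y ** 2 + bumps * (np.sin(freq * x) ** 2 + np.sin(freq * y) ** 2)
```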

Why Adam wins

Adam combines momentum (a running average of gradients) with RMSprop (a running average of squared gradients), plus bias correction for the early steps. It adapts the effective learning rate per parameter — small steps in steep directions, big steps in flat ones. The result: it Just Works on almost any landscape with almost any default learning rate. Hence: the default everywhere.
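
Spelled out, the update is only a few lines. A minimal NumPy sketch using the defaults from the Adam paper (lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function name and calling convention are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v start at zero; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction: m and v start at zero,
    v_hat = v / (1 - beta2 ** t)              #   so early averages are biased low
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The sqrt(v_hat) in the denominator is the per-parameter part: a coordinate whose recent gradients are large gets its step shrunk, while a flat coordinate gets a step close to the full learning rate.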

Anchored to 03-neural-networks/optimizers.