demo

Backpropagation, one node at a time

A single neuron with two inputs. Step forward to compute the output. Step backward to compute gradients via the chain rule. Every weight in every neural network on Earth gets its update this way.
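Here is a minimal sketch of that setup in plain Python. The sigmoid activation and the specific input and weight values are illustrative assumptions, not necessarily what the demo uses; the point is the forward step followed by the reverse-order gradient step.

```python
import math

# Hypothetical neuron: y = sigmoid(w1*x1 + w2*x2 + b)
x1, x2 = 0.5, -1.0          # two inputs
w1, w2, b = 0.8, 0.2, 0.1   # parameters (illustrative values)

# Forward pass: compute the output step by step.
z = w1 * x1 + w2 * x2 + b
y = 1.0 / (1.0 + math.exp(-z))

# Backward pass: chain rule, in reverse order of the forward pass.
dy_dz = y * (1.0 - y)                 # local derivative of the sigmoid
dz_dw1, dz_dw2, dz_db = x1, x2, 1.0   # local derivatives of the weighted sum

dy_dw1 = dy_dz * dz_dw1   # gradient of the output w.r.t. w1
dy_dw2 = dy_dz * dz_dw2   # gradient of the output w.r.t. w2
dy_db = dy_dz * dz_db     # gradient of the output w.r.t. b

print(y, dy_dw1, dy_dw2, dy_db)
```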

What backprop actually is

The chain rule from calculus, applied mechanically, in reverse order of the forward pass. Each node multiplies its incoming gradient by its local derivative and passes the result along. That's it — that's the algorithm.
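A toy sketch of that node-by-node loop, under the assumption of a micrograd-style scalar graph (the `Node` class and its helpers are invented for illustration): each node records its local derivatives during the forward pass and, during backward, multiplies the incoming gradient by each local derivative and passes the product to its parents.

```python
class Node:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in zip(self.parents, self.local_grads):
            # Chain rule: incoming gradient times local derivative.
            parent.backward(upstream * local)

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

# z = w1*x1 + w2*x2 for the two-input neuron (bias omitted for brevity).
x1, w1 = Node(0.5), Node(0.8)
x2, w2 = Node(-1.0), Node(0.2)
z = add(mul(w1, x1), mul(w2, x2))

z.backward()                 # dz/dw1 lands in w1.grad, dz/dw2 in w2.grad
print(w1.grad, w2.grad)      # 0.5, -1.0
```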

PyTorch's .backward() is exactly this loop, run automatically over a graph that autograd built while you ran the forward pass.
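For comparison, the same two-input neuron written with autograd; the values mirror the hand-written sketch above and are assumptions, not the demo's exact numbers.

```python
import torch

x = torch.tensor([0.5, -1.0])
w = torch.tensor([0.8, 0.2], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

y = torch.sigmoid(w @ x + b)  # forward pass, graph recorded by autograd
y.backward()                  # backward pass: chain rule, node by node, in reverse

print(w.grad, b.grad)         # gradients of y with respect to w and b
```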

Anchored to 03-neural-networks/backpropagation.