demo · 03-neural-networks

Dropout — regularisation by random silence

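Slide the dropout rate and watch neurons go dark at random on every forward pass. Toggle training vs inference: in inference all neurons fire, and expected activations are kept equal to training either by scaling weights by (1−p) at inference (classic dropout) or, equivalently, by scaling the surviving activations by 1/(1−p) during training (inverted dropout, the form used in the math below). Hit auto-batch to see a new mask every 600 ms: the ensemble intuition in motion.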

Why it matters

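Dropout is the canonical regularisation trick for neural networks: during training, each neuron is silenced with probability p, forcing the network to distribute knowledge across many pathways rather than memorising through a single route. At inference the full network fires, with activations rescaled so that their expected value matches training: by (1−p) at inference in classic dropout, or by 1/(1−p) during training in inverted dropout. This single pass approximates averaging the 2ⁿ subnetworks obtainable from n droppable units, all sharing parameters; the match is exact only in special cases such as a purely linear readout. Understanding dropout unlocks the intuition for modern variants: DropPath in ViTs (a form of stochastic depth) and attention dropout in transformers.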

The math

Training:   ãᵢ = aᵢ · Bernoulli(1−p) / (1−p)   # inverted dropout
Inference:  ãᵢ = aᵢ                               # no mask, no scale
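
The two lines above can be sketched in NumPy. This is a minimal illustration, not the demo's own code: the function name `inverted_dropout`, the `training` flag and the test values are assumptions made for the example.

```python
import numpy as np

def inverted_dropout(a, p, rng, training=True):
    """Inverted dropout on activations `a` with drop probability `p` (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return a                          # inference: no mask, no scale
    keep = 1.0 - p
    mask = rng.random(a.shape) < keep     # one Bernoulli(1-p) draw per unit
    return a * mask / keep                # survivors are scaled up by 1/(1-p)

rng = np.random.default_rng(0)
a = np.ones(100_000)                      # illustrative activations, all equal to 1
out_train = inverted_dropout(a, p=0.3, rng=rng, training=True)
out_infer = inverted_dropout(a, p=0.3, rng=rng, training=False)
```

Because the survivors are divided by (1−p), the mean of `out_train` stays close to the mean of `a`, which is why the inference branch can return the activations untouched.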

Ensemble interpretation:
  2ⁿ subnetworks (one per mask pattern over n units)
  all share weights; one unmasked pass approximates their average at inference
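
For a small n the ensemble can be enumerated directly. The sketch below assumes a single linear readout over n = 4 dropped units; the names `ensemble_average`, `a` and `w` and all numeric values are made up for illustration. It weights each of the 2ⁿ masks by its probability and compares the result with one unmasked pass.

```python
import itertools
import numpy as np

def ensemble_average(a, w, p):
    """Probability-weighted mean output over all 2^n inverted-dropout masks."""
    n = a.size
    keep = 1.0 - p
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        mask = np.array(bits, dtype=float)
        kept = int(mask.sum())
        prob = keep ** kept * p ** (n - kept)       # probability of this mask pattern
        total += prob * float(w @ (a * mask / keep))
    return total

a = np.array([0.5, -1.0, 2.0, 0.25])   # illustrative activations
w = np.array([1.0, 0.5, -0.75, 2.0])   # illustrative readout weights
p = 0.5

avg = ensemble_average(a, w, p)        # average over all 16 subnetworks
full = float(w @ a)                    # single unmasked inference pass
```

With a linear readout, `avg` and `full` agree up to floating-point error. Once a nonlinearity follows the dropped layer, the unmasked pass is only an approximation of the ensemble average.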

Anchored to 03-neural-networks/regularization-techniques.