demo · 03-neural-networks
Dropout — regularisation by random silence
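Slide the dropout rate and watch neurons go dark at random on every forward pass. Toggle training vs inference: in inference all neurons fire, and expected activations are kept equal to training either by scaling weights by (1−p) at inference (classic dropout) or by dividing surviving activations by (1−p) during training (inverted dropout, the form used in the math below). Hit auto-batch to see a new mask every 600 ms: the ensemble intuition in motion.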
Why it matters
Dropout is the canonical regularisation trick for neural networks: during training, each neuron is silenced independently with probability p, forcing the network to distribute knowledge across many pathways rather than memorising through a single route. At inference the full network fires, with weights scaled by (1−p) in the classic formulation; this approximates averaging the predictions of the 2ⁿ subnetworks (one per mask over n droppable units), all sharing parameters. The average is exact for a linear unit and only approximate once nonlinearities are involved. Understanding dropout unlocks the intuition for modern variants: attention dropout in transformers and stochastic depth (often implemented as DropPath) in ViTs.
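The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not the demo's actual code: the function name `dropout_forward` and its parameters are hypothetical, and it assumes the inverted-dropout convention, where surviving activations are divided by (1−p) during training so that inference needs no scaling at all.

```python
import numpy as np

def dropout_forward(a, p, training, rng):
    """Inverted dropout on an activation array `a` (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return a  # inference: no mask, no scale
    keep = 1.0 - p
    # Each unit survives independently with probability 1 - p.
    mask = rng.random(a.shape) < keep
    # Rescale survivors so the expected activation equals the input.
    return a * mask / keep

rng = np.random.default_rng(0)
a = np.ones(100_000)
out = dropout_forward(a, p=0.3, training=True, rng=rng)
print(out.mean())                # close to 1.0 in expectation
print((out == 0).mean())         # fraction of silenced units, close to p
```

Dividing by (1−p) at training time keeps the inference path identical to a network without dropout, which is why many implementations appear to prefer this form over scaling weights at inference.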
The math
Training: ãᵢ = aᵢ · mᵢ / (1−p), mᵢ ~ Bernoulli(1−p) # inverted dropout
Inference: ãᵢ = aᵢ # no mask, no scale
Ensemble interpretation:
2ⁿ subnetworks (one per mask pattern over n droppable units)
all share weights; inference approximates their average
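The ensemble claim can be checked by brute force on a small case. The sketch below is illustrative only: `ensemble_mean_output` and the toy sizes are assumptions, and it uses the classic (non-inverted) convention, where masks are not rescaled and the weights are multiplied by (1−p) at inference. It enumerates all 2ⁿ masks for a single linear unit, weights each subnetwork's output by its mask probability, and compares the result with one pass through the weight-scaled full network.

```python
import itertools
import numpy as np

def ensemble_mean_output(w, x, p):
    """Probability-weighted average of w·(m*x) over all 2^n binary masks m."""
    n = len(x)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        m = np.array(bits, dtype=float)
        kept = int(m.sum())
        # Probability of this exact mask when each unit is dropped with prob. p.
        prob = (1 - p) ** kept * p ** (n - kept)
        total += prob * np.dot(w, m * x)
    return total

rng = np.random.default_rng(0)
n, p = 8, 0.4                      # 2^8 = 256 subnetworks
w = rng.normal(size=n)
x = rng.normal(size=n)

averaged = ensemble_mean_output(w, x, p)   # explicit ensemble average
scaled = np.dot((1 - p) * w, x)            # one pass with scaled weights
print(np.isclose(averaged, scaled))
```

The two values agree here because the unit is linear, so the expectation over masks passes straight through the dot product. With a nonlinearity after the unit, or with several stacked layers, weight scaling is only an approximation to the true ensemble average.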
Anchored to 03-neural-networks/regularization-techniques.