Dropout Visualizer · ai-explained

Why it matters

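Dropout is the canonical regularisation trick for neural networks: during training, each neuron is silenced independently with probability p, forcing the network to distribute knowledge across many pathways rather than memorising through a single route. At inference the full network fires with no mask. With inverted dropout, the surviving activations are already scaled by 1/(1−p) during training, so nothing is rescaled at inference; the original formulation instead multiplies the weights by 1−p at test time. Either way, the inference pass approximates averaging the 2ⁿ subnetworks (one per mask pattern over n droppable units), all sharing parameters. Understanding dropout unlocks the intuition for modern variants: DropPath in ViTs, attention dropout in transformers, and stochastic depth.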

The math

Training:   ãᵢ = aᵢ · Bernoulli(1−p) / (1−p)   # inverted dropout
Inference:  ãᵢ = aᵢ                               # no mask, no scale
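
The two rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not a framework implementation: the function names dropout_train and dropout_infer, the sample activations, and the trial count are all assumptions made for the example. It also checks empirically that dividing by (1−p) keeps the expected activation unchanged, which is why inference needs no extra scaling.

```python
import random

def dropout_train(activations, p, rng):
    """Inverted dropout: keep each unit with probability 1 - p, then rescale.

    Hypothetical helper for illustration; p is the drop probability.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    out = []
    for a in activations:
        mask = 1.0 if rng.random() < keep else 0.0  # one Bernoulli(1 - p) draw
        out.append(a * mask / keep)                 # survivors are scaled up
    return out

def dropout_infer(activations):
    """Inference: no mask, no scale."""
    return list(activations)

# Empirical check: the mean over many training-mode passes
# should be close to the unmasked activations.
rng = random.Random(0)
acts = [1.0, 2.0, 3.0, 4.0]   # assumed sample activations
p = 0.5
trials = 20000
mean = [0.0] * len(acts)
for _ in range(trials):
    dropped = dropout_train(acts, p, rng)
    mean = [m + d / trials for m, d in zip(mean, dropped)]

print("mean of training passes:", [round(m, 2) for m in mean])
print("inference pass:         ", dropout_infer(acts))
```

Each individual training pass is noisy (every unit is either 0 or aᵢ/(1−p)), but the average over passes should land near the inference values.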

Ensemble interpretation:
  2ⁿ subnetworks (one per mask pattern over n droppable units)
  all share weights; the unmasked inference pass approximates their average
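
For a tiny layer the ensemble can be enumerated directly. The sketch below assumes a single unit with four inputs, made-up weights, and p = 0.5; the helper names ensemble_average and full_network are hypothetical. It weights each of the 2ⁿ masks by its probability and compares the ensemble mean with the single unmasked pass. For a linear output the two agree up to floating-point error; behind a nonlinearity such as tanh they generally differ, so the single pass is only an approximation of the ensemble.

```python
import itertools
import math

def masked_output(weights, inputs, mask, p, nonlinearity):
    """Output of one subnetwork: inputs masked, survivors scaled by 1/(1 - p)."""
    keep = 1.0 - p
    pre = sum(w * x * m / keep for w, x, m in zip(weights, inputs, mask))
    return nonlinearity(pre)

def ensemble_average(weights, inputs, p, nonlinearity):
    """Probability-weighted mean over all 2^n mask patterns."""
    n = len(inputs)
    keep = 1.0 - p
    total = 0.0
    for mask in itertools.product((0, 1), repeat=n):
        kept = sum(mask)
        prob = keep ** kept * p ** (n - kept)  # probability of this mask
        total += prob * masked_output(weights, inputs, mask, p, nonlinearity)
    return total

def full_network(weights, inputs, nonlinearity):
    """Inference pass: no mask, no scale."""
    return nonlinearity(sum(w * x for w, x in zip(weights, inputs)))

# Assumed toy values, chosen only for illustration.
weights = [0.5, -1.0, 0.8, 0.3]
inputs = [1.0, 0.5, -0.2, 2.0]
p = 0.5

def identity(z):
    return z

linear_gap = abs(ensemble_average(weights, inputs, p, identity)
                 - full_network(weights, inputs, identity))
tanh_gap = abs(ensemble_average(weights, inputs, p, math.tanh)
               - full_network(weights, inputs, math.tanh))

print("gap with linear output:", linear_gap)
print("gap with tanh output:  ", tanh_gap)
```

Enumeration is only feasible for very small n, which is the practical reason the single full-network pass is used in place of an explicit average.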

Anchored to 03-neural-networks/regularization-techniques.
