demo · 03-neural-networks

CNN filters — the sliding dot product

Pick an image and a kernel. A 3×3 window slides across the 8×8 input one cell at a time — the convolution math is written out at each position. Toggle ReLU to clamp negatives to zero and see how each filter detects edges, sharpens, or blurs. The visual proof that convolution is just repeated dot products.

Why it matters

Convolutional layers are the feature detectors of vision models. The same 3×3 weight matrix slides over every position, sharing parameters and reducing them from O(HW) to O(k²). Early layers detect edges, middle layers detect textures, deep layers detect object parts — the hierarchy emerges from stacking convolutions, not from designing the features. ViTs replaced convolutions with patches, but the intuition carries forward: local structure first, global reasoning later.

The math

(I * K)[i,j] = Σₘ Σₙ I[i+m, j+n] · K[m,n]   # convolution

ReLU(x) = max(0, x)                           # non-linearity

Output size = (H − k + 1) × (W − k + 1)      # no padding

Anchored to 03-neural-networks/architectures-cnn-rnn.