Stage 03 · Curriculum

Neural Networks

A neural network is `Wx + b` plus non-linearities, trained with gradient descent. Hand-derive backprop through a 2-layer MLP once and every architecture downstream becomes a rearrangement of the same primitives.

6 articles · 21 min to read · 4 demos · 5 books
If you only do one thing

Backprop is the engine of every neural network. Hand-derive it once for a 2-layer MLP and you'll never be confused by training again.

Articles in this stage

  1. 01 Activations & Initialization
  2. 02 Architectures: CNNs and RNNs
  3. 03 Backpropagation
  4. 04 Optimizers
  5. 05 Perceptrons & MLPs
  6. 06 Regularization Techniques

Stage 03 — Neural Networks

Stack a few layers of `Wx + b` interleaved with non-linearities and train the whole thing with gradient descent. That's a neural network. Everything modern — transformers, diffusion, LLMs — is built on this.
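
A minimal sketch of exactly that in PyTorch (the toy data, shapes, and learning rate are illustrative, not prescribed by this stage):

```python
import torch

torch.manual_seed(0)

# Toy data: 64 samples, 10 features, 3 classes (all shapes illustrative).
x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))

# Two layers of Wx + b with a non-linearity between them.
W1 = (torch.randn(10, 32) * 0.1).requires_grad_()
b1 = torch.zeros(32, requires_grad=True)
W2 = (torch.randn(32, 3) * 0.1).requires_grad_()
b2 = torch.zeros(3, requires_grad=True)

lr = 0.1
for step in range(200):
    h = torch.relu(x @ W1 + b1)       # layer 1: Wx + b, then non-linearity
    logits = h @ W2 + b2              # layer 2: Wx + b
    loss = torch.nn.functional.cross_entropy(logits, y)

    loss.backward()                   # backprop fills .grad on every parameter
    with torch.no_grad():             # gradient descent: step against the gradient
        for p in (W1, b1, W2, b2):
            p -= lr * p.grad
            p.grad = None
```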

Prerequisites

  • Stage 01 (calculus, linear algebra)
  • Stage 02 (loss functions, train/val/test)

Learning ladder

  1. Perceptrons & MLPs — the universal approximator
  2. Backpropagation — chain rule + autograd
  3. Activations & initialization — what makes networks trainable
  4. Optimizers — SGD → Adam → AdamW → modern variants (update rules sketched after this list)
  5. Regularization techniques — dropout, normalization, weight decay
  6. Architectures: CNN & RNN — convolutional and recurrent nets
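
To make step 4 concrete, here is one way to write the three update rules as pure functions; the hyperparameter defaults are the commonly used ones, not a recommendation:

```python
import torch

def sgd_step(p, g, lr=1e-2):
    """Plain SGD: step straight down the gradient."""
    return p - lr * g

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running moments of the gradient.
    p, g, m, v are tensors; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * g        # 1st moment: running mean of gradients
    v = b2 * v + (1 - b2) * g * g    # 2nd moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias correction for the zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (torch.sqrt(v_hat) + eps), m, v

def adamw_step(p, g, m, v, t, lr=1e-3, wd=0.01, **kw):
    """AdamW: identical to Adam except weight decay is applied to the weights
    directly (decoupled), not folded into the gradient."""
    p_new, m, v = adam_step(p, g, m, v, t, lr=lr, **kw)
    return p_new - lr * wd * p, m, v
```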

MVU — minimum viable understanding

You can:

  • Hand-derive backprop through a 2-layer MLP (worked derivation after this list)
  • Explain why we use ReLU instead of sigmoid in deep nets
  • Pick a sensible learning rate for a new model
  • Diagnose a training run from loss curves alone
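
For the first item, the whole hand derivation for a 2-layer ReLU MLP fits in a few lines (the notation here is one common convention; ℓ is whatever loss you picked in Stage 02):

```latex
% Forward pass
z_1 = W_1 x + b_1, \quad h = \mathrm{ReLU}(z_1), \quad z_2 = W_2 h + b_2, \quad L = \ell(z_2, y)

% Backward pass: apply the chain rule from the loss back toward the input
\delta_2 = \frac{\partial L}{\partial z_2}, \qquad
\frac{\partial L}{\partial W_2} = \delta_2\, h^\top, \qquad
\frac{\partial L}{\partial b_2} = \delta_2

\delta_1 = \bigl(W_2^\top \delta_2\bigr) \odot \mathbf{1}[z_1 > 0], \qquad
\frac{\partial L}{\partial W_1} = \delta_1\, x^\top, \qquad
\frac{\partial L}{\partial b_1} = \delta_1
```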

Exercise

Train an MLP on MNIST in raw PyTorch (no nn.Sequential, no nn.Linear — write everything from torch.tensor and torch.matmul). Hit >97% test accuracy. Then add BatchNorm and dropout; observe the difference.
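
A possible starting point, assuming torchvision is available purely for the MNIST download; the model and loop use only raw tensors, and hitting >97% may still take tuning of the width, epochs, or learning rate:

```python
import torch
from torchvision import datasets  # assumed available only for the MNIST download

# Flatten 28x28 images into 784-dim float vectors.
train = datasets.MNIST("data", train=True, download=True)
test = datasets.MNIST("data", train=False, download=True)
x_tr, y_tr = train.data.reshape(-1, 784).float() / 255.0, train.targets
x_te, y_te = test.data.reshape(-1, 784).float() / 255.0, test.targets

def cross_entropy(logits, targets):
    # Hand-written cross-entropy: log-softmax via logsumexp, then pick the target class.
    logp = logits - logits.logsumexp(dim=1, keepdim=True)
    return -logp[torch.arange(len(targets)), targets].mean()

def param(*shape, scale):
    return (torch.randn(*shape) * scale).requires_grad_()

# 784 -> 256 -> 10 with roughly He-scaled init (the exact scheme is yours to choose).
W1, b1 = param(784, 256, scale=(2 / 784) ** 0.5), torch.zeros(256, requires_grad=True)
W2, b2 = param(256, 10, scale=(2 / 256) ** 0.5), torch.zeros(10, requires_grad=True)
params = [W1, b1, W2, b2]

lr, batch = 0.1, 128
for epoch in range(5):
    perm = torch.randperm(len(x_tr))
    for i in range(0, len(x_tr), batch):
        idx = perm[i:i + batch]
        h = torch.relu(torch.matmul(x_tr[idx], W1) + b1)
        logits = torch.matmul(h, W2) + b2
        loss = cross_entropy(logits, y_tr[idx])
        loss.backward()                 # autograd handles the backward pass
        with torch.no_grad():
            for p in params:
                p -= lr * p.grad
                p.grad = None

with torch.no_grad():
    pred = (torch.matmul(torch.relu(torch.matmul(x_te, W1) + b1), W2) + b2).argmax(dim=1)
    print(f"test accuracy: {(pred == y_te).float().mean().item():.4f}")
```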

What you’ll build by the end

A clear mental model for every component of a transformer block (Stage 06): linear layer, activation, normalization, residual connection. The transformer is just an MLP with attention layers spliced in.
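
A hypothetical pre-norm block, just to make that composition concrete; the attention module itself is Stage 06 material, and the names and sizes here are illustrative:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Every piece here is a Stage 03 primitive except the attention layer."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)                 # normalization
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(                          # linear -> activation -> linear
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                                  # x: (batch, seq, d_model)
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]                      # residual connection
        x = x + self.mlp(self.norm2(x))                    # residual connection
        return x
```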

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. Free badges mark books with author-authorized full text online.

  1. ★ start here · free

    Deep Learning

    Ian Goodfellow, Yoshua Bengio, Aaron Courville

    MIT Press, 2016

    The foundational reference. Free online.

  2. free

    Dive into Deep Learning

    Aston Zhang, Zachary Lipton, Mu Li, Alex Smola

    Cambridge University Press, 2023

    Code-first companion to Goodfellow, multi-framework, continuously updated.

  3. free

    Neural Networks and Deep Learning

    Michael Nielsen

    self-published online, 2015

    The clearest first-principles intro to backprop. Free online.

  4.

    Build a Large Language Model From Scratch

    Sebastian Raschka

    Manning, 2024

    Implements GPT-2 end-to-end in PyTorch, layer by layer.

  5.

    Deep Learning with Python

    François Chollet

    Manning, 2nd ed., 2021

    Keras-flavored, but the conceptual material is excellent.