ML Fundamentals

Pre-deep-learning ML — confusion matrices, train/val/test, bias-variance, why a 99%-accurate model can be useless. Modern AI is mostly classical ML wisdom applied to LLMs; this stage gives you the wisdom.

If you only do one thing

If you can't measure 'good,' you can't ship. Get eval discipline right here, before any LLM enters the picture.

Articles in this stage

  1. Classical ML Algorithms
  2. Evaluation & Metrics
  3. Loss Functions & Optimization
  4. Regularization & Generalization
  5. Supervised Learning
  6. Unsupervised Learning

Stage 02 — ML Fundamentals

Before deep learning, there was just machine learning: take features, define a loss, optimize parameters, evaluate. Modern AI is built on these primitives, and skipping them produces engineers who can fine-tune a 70B model but can’t read a confusion matrix.
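
That recipe fits in a few lines. A toy NumPy sketch of all four steps (synthetic data, illustrative names):

    import numpy as np

    rng = np.random.default_rng(0)

    # 1. Take features: synthetic 1-D data with y = 3x + 1 + noise.
    X = rng.normal(size=200)
    y = 3 * X + 1 + rng.normal(scale=0.1, size=200)
    X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

    # 2. Define a loss (MSE) and 3. optimize parameters by gradient descent.
    w, b = 0.0, 0.0
    for _ in range(500):
        err = w * X_tr + b - y_tr
        w -= 0.1 * 2 * np.mean(err * X_tr)  # dMSE/dw
        b -= 0.1 * 2 * np.mean(err)         # dMSE/db

    # 4. Evaluate on held-out data.
    print("test MSE:", np.mean((w * X_te + b - y_te) ** 2))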

Prerequisites

  • Stage 01 (linear algebra, probability, calculus)

Learning ladder

  1. Supervised learning — regression, classification, the train/val/test split
  2. Unsupervised learning — clustering, dimensionality reduction, density estimation
  3. Loss functions & optimization — MSE, cross-entropy, SGD/Adam
  4. Evaluation & metrics — accuracy, precision/recall, F1, AUC, calibration
  5. Regularization & generalization — bias/variance, L1/L2, early stopping (see the ridge sketch after this list)
  6. Classical algorithms — linear/logistic regression, trees, ensembles, kNN, SVMs
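
Step 5's L2 idea is compact enough to show directly. A minimal ridge-regression sketch (illustrative names), using the closed form:

    import numpy as np

    # Ridge regression: minimize MSE + lam * ||w||^2.
    # Closed form: w = (X^T X + lam * I)^{-1} X^T y
    def ridge_fit(X, y, lam=1.0):
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # lam = 0 recovers ordinary least squares; larger lam shrinks the
    # weights toward zero, trading a little bias for lower variance.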

MVU (minimum viable understanding)

You can:

  • State the bias–variance tradeoff in one sentence
  • Choose between accuracy, F1, and AUC for a given problem and defend it
  • Explain why a 99% accurate model can be useless (class imbalance; demonstrated in the sketch after this list)
  • Describe the difference between training, validation, and test sets — and what data leakage means
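
The 99% claim is easy to demonstrate. A minimal sketch on a hypothetical 1%-positive problem:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score, recall_score

    # Hypothetical fraud-style data: 1% positives, features with no signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 5))
    y = (rng.random(10_000) < 0.01).astype(int)

    # Always predicting "negative" is ~99% accurate...
    clf = DummyClassifier(strategy="most_frequent").fit(X, y)
    pred = clf.predict(X)
    print(accuracy_score(y, pred))  # ~0.99
    print(recall_score(y, pred))    # 0.0 -- it catches no positives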

Exercise

Train a logistic regression on the UCI Adult dataset (predict income > $50k). Compute precision, recall, and F1 by hand from the confusion matrix, and AUC from the predicted probabilities (AUC needs scores across thresholds, not a single confusion matrix). Compare with scikit-learn’s classification_report.
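
One possible solution sketch; the OpenML mirror of Adult ("adult", version 2) and the preprocessing choices here are assumptions, not part of the exercise:

    from sklearn.compose import make_column_transformer
    from sklearn.datasets import fetch_openml
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # UCI Adult via its OpenML mirror; target is income >50K.
    X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
    y = (y == ">50K").astype(int)

    cat = X.select_dtypes(include="category").columns
    num = X.select_dtypes(exclude="category").columns
    pre = make_column_transformer(
        (make_pipeline(SimpleImputer(strategy="most_frequent"),
                       OneHotEncoder(handle_unknown="ignore")), cat),
        (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), num),
    )
    model = make_pipeline(pre, LogisticRegression(max_iter=1000))

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)

    # Precision, recall, F1 "by hand" from the confusion matrix.
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # AUC comes from predicted probabilities, not hard labels.
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
    print(classification_report(y_te, pred))  # cross-check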

Why classical ML still matters

  • Tabular data. Gradient-boosted trees (XGBoost, LightGBM, CatBoost) still beat deep nets on most tabular problems.
  • Baselines. Always start with logistic regression / a tree. If a complex model can’t beat it, you have a feature problem, not a model problem (a baseline sketch follows this list).
  • Interpretability. Linear models are inspectable; trees produce rules. Sometimes that matters more than 2 extra points of accuracy.
  • Speed and cost. A scikit-learn model trains in seconds, runs on CPU, and costs ~nothing to serve.
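
A minimal sketch of the baseline habit, with synthetic data standing in for your tabular problem:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Score the cheap baseline first; only then reach for bigger models.
    for name, model in [
        ("logistic regression", LogisticRegression(max_iter=1000)),
        ("gradient-boosted trees", HistGradientBoostingClassifier()),
    ]:
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name}: AUC = {auc:.3f}")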

Further reading

Books move slower than papers in this field — treat these as foundations, not replacements for the latest research. Real authors, real publishers, real editions. Entries note when an author-authorized full text is free online.

  1. An Introduction to Statistical Learning (with Applications in Python)

     Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor

     Springer, Python ed., 2023

     The most accessible bridge from stats to ML. Free PDF online.

  2. Pattern Recognition and Machine Learning

     Christopher M. Bishop

     Springer, 2006

     Deep theoretical grounding in linear models, kernels, and graphical models. Free PDF online.

  3. Probabilistic Machine Learning: An Introduction

     Kevin P. Murphy

     MIT Press, 2022

     The modern, broader successor to Bishop. Free PDF online.

  4. The Hundred-Page Machine Learning Book

     Andriy Burkov

     Self-published, 2019

     A very compact survey, useful as orientation.