Fine-tune a 7B model in 17 MB
LoRA: instead of updating a giant weight matrix, train two thin slices that multiply to approximate the update. Slide the rank, watch the approximation get sharper, see the parameter count drop by 800×.
The trick
Hu et al. (2021) noticed something surprising: when you fine-tune a large model, the weight updates have low intrinsic rank. Most of the change in a layer can be captured by a low-rank matrix. So instead of training the full d × d update, train two thin matrices B (d × r) and A (r × d), with r ≪ d, that multiply to approximate it.
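A minimal sketch of the idea in NumPy. The sizes d and r are illustrative, not from the text; the pretrained weight W stays frozen, and only B and A would receive gradients. Following the paper's initialization, B starts at zero so the adapted layer initially matches the base layer exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # illustrative sizes, r << d

W = rng.standard_normal((d, d))         # frozen pretrained weight (not trained)
B = np.zeros((d, r))                    # zero init: BA starts as the zero update
A = rng.standard_normal((r, d)) * 0.01  # small random init

def forward(x, alpha=1.0):
    # Base path plus the low-rank correction; only A and B are trainable.
    return x @ W.T + alpha * (x @ A.T @ B.T)

x = rng.standard_normal((2, d))
# With B = 0 the correction vanishes, so the output equals the base model's.
assert np.allclose(forward(x), x @ W.T)
```

Training updates only the 2·d·r parameters in B and A instead of the d² parameters in W.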
The numbers
- Full fine-tuning of a 7B model: ~14 GB of trainable weights (7B parameters in fp16).
- LoRA at r=8 on the same model: ~17 MB of adapter weights, an ~800× reduction.
- And the resulting model performs nearly identically on most tasks.
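A back-of-the-envelope check of those numbers. This sketch assumes a LLaMA-style 7B architecture (32 layers, hidden size 4096), LoRA applied to the attention query and value projections as in Hu et al., fp16 base weights, and fp32 adapters; none of these specifics appear in the text above.

```python
# Full fine-tuning: every parameter is trainable.
full_bytes = 7e9 * 2  # 7B params in fp16 -> ~14 GB

# LoRA at r=8 on q and v projections (assumed setup, see lead-in).
layers, d, r = 32, 4096, 8
lora_params = layers * 2 * (d * r + r * d)  # B (d x r) + A (r x d) per matrix
lora_bytes = lora_params * 4                # fp32 adapter weights

print(f"{lora_params/1e6:.1f}M params, {lora_bytes/1e6:.0f} MB, "
      f"{full_bytes/lora_bytes:.0f}x smaller")
```

Under these assumptions the adapters come to about 4.2M parameters (~17 MB), roughly 800× smaller than the full fine-tune's footprint.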
Why it works
The singular value spectrum of fine-tuning updates is heavily front-loaded — the top few singular values dwarf the rest. Truncating after rank r captures most of the energy. This is LoRA's empirical foundation; the visualization above shows the spectrum for synthetic data with the same property.
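A short sketch of that truncation argument on synthetic data (a hypothetical update built as a strong rank-r component plus small full-rank noise, mirroring the front-loaded spectrum described above):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 128, 8  # illustrative sizes

# Synthetic weight update: dominant rank-r structure + small dense noise.
dW = rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) \
     + 0.05 * rng.standard_normal((d, d))

# Energy captured by keeping only the top-k singular values.
s = np.linalg.svd(dW, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
print(f"rank-{r} truncation captures {energy[r-1]:.1%} of the energy")
```

Because the spectrum is front-loaded, the rank-r truncation here retains well over 99% of the squared Frobenius norm; real fine-tuning updates are noisier, but the paper reports the same qualitative shape.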