Question 1

What is LoRA in machine learning?

Accepted Answer

LoRA stands for Low-Rank Adaptation. It is a parameter-efficient fine-tuning method that freezes a model's original weights and learns a pair of small low-rank matrices whose product represents the weight update. Because only those small matrices are trained, you can adapt a very large model while updating a tiny fraction of its parameters.
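To make the idea concrete, here is a minimal sketch in plain Python of a single layer with a frozen weight matrix and a trainable low-rank pair. The names W, A, B, matvec and lora_forward, the toy dimensions, and the zero initialisation of B are illustrative assumptions, not something stated in the answer above.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x):
    """Frozen path W x plus the low-rank path B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r = 6, 4, 2  # toy sizes, chosen only for illustration

# Frozen pretrained weight: never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank pair: A is r x d_in, B is d_out x r.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # assumed zero init, so the update starts at zero

x = [1.0] * d_in
y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x)
print(y_lora == y_base)  # the adapter changes nothing until B is trained
```

Starting B at zero is a common convention: the adapted model initially behaves exactly like the base model, and training then moves only A and B.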

Question 2

What do the rank (r) and alpha hyperparameters do?

Accepted Answer

The rank r sets the inner dimension of the two low-rank matrices and therefore how much capacity the adapter has: a higher r can learn more but uses more memory and risks overfitting. Alpha is a scaling factor applied to the LoRA update; in the standard formulation the update is multiplied by alpha divided by r, so the two are usually tuned together. Common starting points are r of 8 to 16 with alpha around 16 to 32.
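The interaction between r and alpha is easier to see with numbers. The sketch below assumes the standard alpha / r scaling; lora_scale and merged_weight are hypothetical helper names, and the (r, alpha) pairs are only examples.

```python
def lora_scale(alpha, r):
    """Multiplier applied to the low-rank update (standard alpha / r form)."""
    return alpha / r

def merged_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), with matrices given as lists of rows."""
    r = len(A)  # A is r x d_in, B is d_out x r
    scale = lora_scale(alpha, r)
    d_out, d_in = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(d_in)]
        for i in range(d_out)
    ]

# Raising r without raising alpha shrinks the effective scale of the update.
for r, alpha in [(8, 16), (16, 32), (16, 16), (64, 16)]:
    print(f"r={r:>2} alpha={alpha:>2} scale={lora_scale(alpha, r)}")
```

This is why the two values are usually changed together: doubling r while leaving alpha fixed halves the multiplier on the update.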

Question 3

How much does LoRA reduce trainable parameters?

Accepted Answer

LoRA typically reduces trainable parameters by 99% or more compared with full fine-tuning, because the low-rank matrices are tiny relative to the frozen weight matrices they adapt. This slashes optimiser memory and lets you fine-tune large models on a single consumer or workstation GPU, especially when combined with quantisation as in QLoRA.
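As a rough check on that figure, the arithmetic for one weight matrix can be sketched as follows. The 4096 x 4096 layer size and r = 8 are hypothetical values chosen for illustration; the real reduction depends on the rank and on which layers receive adapters.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out

def lora_params(d_in, d_out, r):
    """Trainable parameters for the pair A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

d_in = d_out = 4096  # hypothetical layer size
r = 8                # hypothetical rank

full = full_params(d_in, d_out)     # 16,777,216
lora = lora_params(d_in, d_out, r)  # 65,536
reduction = 1 - lora / full

print(f"full: {full:,}  lora: {lora:,}  reduction: {reduction:.2%}")
```

For this layer the adapter holds 1/256 of the parameters of the matrix it adapts, which is a reduction of about 99.6%.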

Question 4

What is the difference between LoRA and QLoRA?

Accepted Answer

LoRA freezes the base weights in their original precision (commonly 16-bit) and trains low-rank adapters. QLoRA goes further by quantising the frozen base model to 4-bit, dramatically cutting memory, then training LoRA adapters on top in higher precision. QLoRA makes it possible to fine-tune very large models on a single GPU with little quality loss.
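A back-of-the-envelope estimate shows where the memory saving comes from. The sketch below counts only the storage for the frozen base weights; the 7-billion-parameter model size is a hypothetical example, and the estimate deliberately ignores quantisation constants, the adapters themselves, activations, gradients and optimiser state.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7_000_000_000  # hypothetical model size

base_16bit = weight_memory_gb(n_params, 16)  # frozen base kept in 16-bit
base_4bit = weight_memory_gb(n_params, 4)    # frozen base quantised to 4-bit

print(f"16-bit base: {base_16bit:.1f} GB")
print(f" 4-bit base: {base_4bit:.1f} GB")
```

Under these assumptions the frozen weights shrink by a factor of four, which is the main reason a quantised base model can fit on a single GPU where a 16-bit one may not.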

What Is LoRA? Low-Rank Adaptation for Efficient LLM Fine-Tuning

The core idea

How the maths works (gently)

The key hyperparameters: r and alpha

Why it is so efficient

Practical benefits and trade-offs