Question 1

What does LoRA stand for?

Accepted Answer

LoRA stands for Low-Rank Adaptation. It is a parameter-efficient fine-tuning method that freezes the original model weights and, for each selected layer, trains a pair of small low-rank matrices whose product is added to that layer's frozen weight matrix. You adapt the model without updating its billions of original parameters.
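To make the "frozen weights plus a small trainable update" idea concrete, here is a minimal sketch in plain Python. The function names (`make_lora_layer`, `lora_forward`), the tiny dimensions, and the initialisation scales are illustrative assumptions, not any library's API. The zero initialisation of B and the alpha / rank scaling follow the convention described in the original LoRA paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def make_lora_layer(d_out, d_in, rank, alpha, seed=0):
    """Build a frozen weight W plus trainable low-rank factors A and B.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Frozen base weight, shape d_out x d_in (never updated during training).
    W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
    # Trainable factor A, shape rank x d_in, small random init.
    A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    # Trainable factor B, shape d_out x rank, zero init so that the
    # adapted layer starts out identical to the base layer.
    B = [[0.0] * rank for _ in range(d_out)]
    scale = alpha / rank
    return W, A, B, scale

def lora_forward(x, W, A, B, scale):
    """Compute W x + scale * B (A x) for a column vector x."""
    base = matmul(W, x)                # frozen path
    update = matmul(B, matmul(A, x))   # low-rank path
    return [[b[0] + scale * u[0]] for b, u in zip(base, update)]

W, A, B, scale = make_lora_layer(d_out=4, d_in=6, rank=2, alpha=4)
x = [[1.0] for _ in range(6)]
y = lora_forward(x, W, A, B, scale)
```

Only A and B would receive gradients in training; W stays fixed. Because B starts at zero, the first forward pass matches the base model exactly.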

Question 2

How is LoRA different from full fine-tuning?

Accepted Answer

Full fine-tuning updates every weight in the model, which needs large amounts of GPU memory and produces a full-size copy of the model for each task. LoRA freezes the base weights and trains only small adapter matrices, cutting the number of trainable parameters by orders of magnitude while retaining most of the quality of full fine-tuning on many tasks.
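A quick back-of-the-envelope count shows where the savings come from. The sketch below compares trainable parameters for a single weight matrix; the 4096 x 4096 size and rank 8 are assumed example values, not figures from any particular model, and the helper names are hypothetical.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

# Assumed example: one 4096 x 4096 projection adapted at rank 8.
full = full_params(4096, 4096)     # 16,777,216
lora = lora_params(4096, 4096, 8)  # 65,536
reduction = full / lora            # 256.0

print(f"full: {full:,}  lora: {lora:,}  reduction: {reduction:.0f}x")
```

The saving is larger in practice than this single-matrix ratio suggests, because LoRA is typically applied only to selected layers while full fine-tuning updates all of them.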

Question 3

What is QLoRA?

Accepted Answer

QLoRA combines LoRA with quantization: the frozen base model is loaded in 4-bit precision to save memory, and the LoRA adapters, kept in higher precision, are trained on top of it. This makes it possible to fine-tune very large models on a single consumer-grade or otherwise modest GPU that could not hold them in 16-bit or full precision.
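The memory effect of 4-bit loading can be estimated with simple arithmetic. The sketch below counts weight storage only; it deliberately ignores quantization constants, activations, adapter weights, and optimizer state, so real usage will be higher. The 7-billion-parameter model size is an assumed example, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7_000_000_000  # assumed example model size

fp32_gb = weight_memory_gb(n_params, 32)  # 28.0
fp16_gb = weight_memory_gb(n_params, 16)  # 14.0
int4_gb = weight_memory_gb(n_params, 4)   # 3.5

print(f"fp32: {fp32_gb} GB  fp16: {fp16_gb} GB  4-bit: {int4_gb} GB")
```

Under these assumptions the 4-bit base weights take a quarter of the 16-bit footprint, which is what leaves room for adapters and training state on a smaller GPU.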

Question 4

When should I not use LoRA?

Accepted Answer

If you need the model to absorb a large amount of genuinely new knowledge or radically change its behaviour, full fine-tuning or continued pretraining may work better. For style adaptation, task specialisation, and domain tone, which are the most common needs, LoRA usually delivers most of the benefit at a fraction of the cost.

LoRA Explained: Low-Rank Adaptation for Fine-Tuning LLMs

The problem LoRA solves

How LoRA works

LoRA vs full fine-tuning

QLoRA and the rest of the PEFT family

When to reach for LoRA