Question 1

What is LoRA?

Accepted Answer

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that freezes a model's original weights and trains small, low-rank adapter matrices instead. This lets you adapt a huge model while updating well under 1 percent of its parameters.

Question 2

How does LoRA reduce trainable parameters so much?

Accepted Answer

Instead of updating a large weight matrix directly, LoRA approximates the change with the product of two much smaller matrices (the low-rank decomposition). Because those matrices are tiny, the number of trainable values drops by 99 percent or more.

Question 3

What is QLoRA?

Accepted Answer

QLoRA combines LoRA with 4-bit quantisation of the frozen base model. This shrinks memory use so dramatically that very large models can be fine-tuned on a single consumer or prosumer GPU.

Question 4

What are the advantages of LoRA over full fine-tuning?

Accepted Answer

LoRA needs far less GPU memory and disk space, trains faster, and produces small adapter files you can swap in and out without duplicating the whole model. Quality is usually close to full fine-tuning for most adaptation tasks.

LoRA — Low-Rank Adaptation (AI Glossary)

What is LoRA?

The low-rank idea

QLoRA: going even lighter

Why people use LoRA