Question 1

Do I need to fine-tune, or is prompting or RAG enough?

Accepted Answer

Most teams should try prompting and retrieval-augmented generation first: they are cheaper and faster to iterate on. Fine-tuning earns its keep when you need a consistent style or format, want to teach a narrow skill the base model lacks, or need to shrink prompts by baking instructions into the weights. It is a less reliable way than RAG to add new knowledge, and facts baked into the weights go stale until you retrain.
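The rule of thumb above can be written down as a small checklist. This is a hypothetical helper: the Needs flags and the recommend function are illustrative names, not part of any library, and the ordering simply mirrors the advice to start with the cheapest option and add fine-tuning only when one of its use cases applies.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    # Hypothetical project flags; the names are illustrative only.
    consistent_style_or_format: bool = False
    narrow_missing_skill: bool = False
    prompt_too_long: bool = False
    needs_fresh_or_changing_facts: bool = False


def recommend(needs: Needs) -> list[str]:
    """Return the approaches to try, cheapest first."""
    plan = ["prompting"]  # always the first thing to try
    if needs.needs_fresh_or_changing_facts:
        plan.append("RAG")  # knowledge is better served by retrieval
    if (needs.consistent_style_or_format
            or needs.narrow_missing_skill
            or needs.prompt_too_long):
        plan.append("fine-tuning")  # the cases where tuning earns its keep
    return plan


print(recommend(Needs(needs_fresh_or_changing_facts=True)))
print(recommend(Needs(consistent_style_or_format=True)))
```

Note that "fine-tuning" is appended after the cheaper options rather than replacing them: a project can need both retrieval for facts and tuning for format.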

Question 2

What is the difference between LoRA and QLoRA?

Accepted Answer

LoRA freezes the base model and trains small low-rank adapter matrices alongside selected weight matrices, which slashes the number of trainable parameters and the gradient and optimizer memory that goes with them. QLoRA goes further by loading the frozen base in 4-bit quantisation, so you can fine-tune a 7B model on a single consumer GPU. QLoRA trades a little speed for dramatically lower VRAM, which is why it dominates hobbyist and small-team fine-tuning.
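To see where LoRA's savings come from, count parameters. A frozen weight W of shape d_out x d_in gets an update B A, where B is d_out x r and A is r x d_in, so only r * (d_out + d_in) values are trained. The 4096 x 4096 shape and rank 8 below are illustrative assumptions, not values from any particular model. QLoRA leaves this count unchanged; it only changes how the frozen W is stored.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the full weight matrix W (d_out x d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the LoRA update B @ A, with B: d_out x r and A: r x d_in."""
    return r * (d_out + d_in)


# Illustrative numbers: one square 4096 x 4096 projection with rank r = 8.
d, r = 4096, 8
full = full_params(d, d)
lora = lora_params(d, d, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints "full: 16,777,216  lora: 65,536  ratio: 0.39%"
```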

Question 3

How much data do I need?

Accepted Answer

For style and format adaptation, a few hundred to a few thousand high-quality instruction examples often suffice. Quality and consistency matter far more than volume — a thousand clean, on-task examples beat ten thousand noisy ones. Curate aggressively and hold out a small validation set to catch overfitting.
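A minimal curation pass might drop incomplete and exact-duplicate examples, then hold out a reproducible validation split. This sketch assumes a hypothetical schema of dicts with "instruction" and "response" keys; real datasets will need their own cleaning rules (near-duplicate detection, length limits, format checks).

```python
import random


def curate_and_split(examples, val_fraction=0.05, seed=0):
    """Drop empty and exact-duplicate examples, then hold out a validation split.

    Assumes each example is a dict with "instruction" and "response" keys
    (a hypothetical schema; adapt it to your own data).
    """
    seen = set()
    clean = []
    for ex in examples:
        instruction = ex.get("instruction", "").strip()
        response = ex.get("response", "").strip()
        if not instruction or not response:
            continue  # skip incomplete examples
        key = (instruction, response)  # exact duplicate after trimming whitespace
        if key in seen:
            continue
        seen.add(key)
        clean.append({"instruction": instruction, "response": response})

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(clean)
    n_val = max(1, int(len(clean) * val_fraction)) if clean else 0
    return clean[n_val:], clean[:n_val]  # (train, validation)


data = [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(100)]
data += [{"instruction": "q0", "response": "a0"}, {"instruction": "", "response": "x"}]
train, val = curate_and_split(data, val_fraction=0.1, seed=42)
print(len(train), len(val))
```

Deduplicating before splitting matters: otherwise a copy of a training example can land in the validation set and hide overfitting.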

Question 4

What hardware does this require?

Accepted Answer

With QLoRA you can fine-tune a 7B–8B model on a single 16–24 GB GPU, including free Colab or Kaggle tiers for small runs. Larger models or full fine-tuning need multiple high-memory GPUs. Hugging Face's Trainer and Accelerate use an available GPU automatically, so you rarely write device-placement code; CPU-only fine-tuning is impractical for anything but tiny experiments.
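Back-of-envelope arithmetic shows why the hardware tiers fall where they do. The first function counts model weights only; activations, LoRA adapters, optimizer state, and quantisation overhead all add more on top. The second uses the common rule of thumb of about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (2 B weights, 2 B gradients, 12 B fp32 master weights and moments), again before activations. Both are rough estimates, not measurements.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def full_finetune_memory_gb(n_params: float, bytes_per_param: float = 16) -> float:
    """Rule-of-thumb memory for full fine-tuning with mixed-precision Adam,
    excluding activations."""
    return n_params * bytes_per_param / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit (QLoRA base)", 4)]:
    print(f"7B weights at {label}: {weight_memory_gb(7e9, bits):.1f} GB")
print(f"7B full fine-tune: ~{full_finetune_memory_gb(7e9):.0f} GB")
```

Roughly 3.5 GB of 4-bit weights leaves room on a 16–24 GB card for activations and adapter training, while the ~112 GB full fine-tuning estimate is why that route needs several high-memory GPUs.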

Question 5

Why merge adapters, and when should I keep them separate?

Accepted Answer

Merging folds the LoRA weights back into the base model so you ship a single standalone checkpoint, which is simplest for deployment and conversion to formats like GGUF. Keeping adapters separate lets you hot-swap multiple task-specific adapters on one base model in memory, which is ideal when you serve several fine-tunes at once.
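Merging works because the adapter path is linear: with LoRA's usual alpha / r scaling, W x + (alpha / r) B (A x) equals (W + (alpha / r) B A) x, so the update can be folded into a single matrix. The toy sketch below checks that identity with tiny hand-picked matrices in plain Python; in PEFT, PeftModel.merge_and_unload() performs this fold for real models.

```python
def matmul(X, Y):
    """Plain-Python matrix product for small lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, scale=1.0):
    """Element-wise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def forward_with_adapter(W, A, B, x, alpha, r):
    """Unmerged: base output W x plus the scaled low-rank path B (A x)."""
    return add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)


def merge(W, A, B, alpha, r):
    """Merged: fold the update into one matrix, W' = W + (alpha / r) * B A."""
    return add(W, matmul(B, A), scale=alpha / r)


# Toy values (assumptions for illustration): 2x2 base weight, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]       # r x d_in
B = [[2.0], [1.0]]      # d_out x r
x = [[3.0], [1.0]]      # input as a column vector
alpha, r = 2, 1

print(forward_with_adapter(W, A, B, x, alpha, r))  # [[7.0], [3.0]]
print(matmul(merge(W, A, B, alpha, r), x))         # [[7.0], [3.0]]
```

Keeping adapters separate costs the extra B (A x) product at inference, which is small because r is small; that is what makes hot-swapping several adapters over one shared base practical.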

How to Fine-Tune an LLM with Hugging Face

When fine-tuning is the right tool

Preparing the dataset

Configuring LoRA and QLoRA

Training, merging, and shipping