Zero-Shot vs Few-Shot vs Fine-Tuning: When to Use Each

Three ways to get AI to do new tasks, picked by experts

Ad placeholder (leaderboard)

Three ways to adapt a model

Out of the box, a large language model is a generalist. To make it do your specific task well, you have three escalating options. Zero-shot asks the model to perform a task with nothing but an instruction. Few-shot (a form of in-context learning) adds a few worked examples to the prompt to show the pattern. Fine-tuning goes further, actually adjusting the model’s weights by training it on your examples. They differ sharply in effort, cost, and when each one wins — and the right answer is almost always “use the simplest method that hits your accuracy bar.”

Zero-shot: just ask

In zero-shot prompting you give the model an instruction and the task, with no examples: “Classify this review as positive, negative, or neutral.” Modern instruction-tuned models are surprisingly good at this for common tasks. It is the cheapest and fastest option — the shortest possible prompt — and should always be your first try. Zero-shot struggles when the task is unusual, the output format is strict, or the model keeps misinterpreting what you want.

Few-shot: teach by example

Few-shot prompting includes a small set of input-output pairs before the real input, demonstrating exactly the behaviour and format you expect. Two to five well-chosen, consistent examples often dramatically improve accuracy on harder or non-obvious tasks, and they pin down output structure (such as a specific JSON shape) far better than a description alone. The cost is prompt length: every example is tokens you pay for on every call and that consume the context window. Few-shot is the sweet spot for tasks that zero-shot gets almost right.

Fine-tuning: bake it in

Fine-tuning continues training a base model on many of your examples, embedding a behaviour, tone, or format into its weights. Once fine-tuned, the model needs only a short prompt — no examples — so per-request cost and latency drop, which pays off at high volume. It is the right tool when you need consistent behaviour across many requests, or when good few-shot prompts would simply be too long. Its weaknesses: it requires a quality dataset and upfront effort, and it is a poor way to inject facts that change often — for fresh or changing knowledge, use retrieval-augmented generation instead.

A quick decision guide

Start with zero-shot. If the output is inconsistent or the task is unusual, add a few few-shot examples — this resolves most cases. Move to fine-tuning only when you need stable behaviour at scale, when prompts have grown too long and costly, or when latency demands a minimal prompt. If the problem is missing or changing facts rather than behaviour, none of these is the answer — reach for RAG. And remember you can layer them: fine-tune for base behaviour, then still supply retrieved context or a short example at request time.

Ad placeholder (rectangle)