When should I fine-tune instead of just prompting?

Fine-tune when you need a consistent tone, a fixed output format, or a narrow task repeated at scale, and prompting plus a few examples is not reliable enough. For teaching new facts, retrieval-augmented generation is usually cheaper and easier to update than fine-tuning.

What format must the training data be in?

A JSONL file where every line is a JSON object with a messages key. Each messages array holds the conversation turns — typically a system instruction, a user message, and the ideal assistant reply. The assistant turn is what the model learns to produce.

How many examples do I need?

You can see improvement from as few as 50 to 100 high-quality, consistent examples. Quality and consistency matter far more than quantity. A few hundred clean, representative examples usually beats thousands of noisy ones.

How do I know if the fine-tune worked?

Hold back a validation split and watch training versus validation loss during the job. If validation loss stops falling while training loss keeps dropping, you are overfitting. Always test the tuned model on fresh, unseen prompts before trusting it.

Is my training data used to improve the base model?

No. Data you upload for fine-tuning is used only to train your own private model and is not used to train OpenAI's foundation models. The resulting fine-tuned model is private to your account.

How to Fine-Tune GPT-4o Mini on Your Own Data

When and why to fine-tune

Fine-tuning adapts a base model like GPT-4o Mini to your specific style, format, or task by training it on examples of the exact behavior you want. It shines when prompting alone cannot reliably produce a consistent tone or a fixed output shape, or when you run the same narrow task at high volume and want shorter prompts. For teaching the model new facts that change over time, prefer retrieval-augmented generation — fine-tuning bakes knowledge in and is harder to update.

How the workflow works

The data format is JSONL: one JSON object per line, each containing a messages array with system, user, and assistant turns. The assistant turn is the target the model learns to imitate. You upload the file with the Files API using purpose: "fine-tune", create a fine-tuning job pointing at that file id and the base model, then monitor training and validation loss until the job completes. The result is a private model name you call exactly like any other model.

The validator below checks JSONL training rows as you paste them — confirming each line is valid JSON, has a messages array, includes an assistant turn, and uses only allowed roles. Fix any flagged rows before you upload.

Tips for a clean fine-tune

Keep examples consistent: if your system prompt and format vary line to line, the model learns noise. Aim for at least 50-100 high-quality rows and hold back a validation split. Watch for overfitting — when validation loss rises while training loss falls, you have too many epochs or too little data. Test the tuned model on prompts it never saw before declaring success, and version your dataset so you can reproduce or extend the run later.