What format does the file use?

It uses OpenAI's chat fine-tuning format: each line is a JSON object with a 'messages' array containing an optional system message followed by user and assistant turns. This is the format the fine-tuning API expects for gpt-4o-mini and similar models.

How many examples do I need?

OpenAI requires at least 10 examples to start a fine-tuning job, and recommends 50–100 high-quality examples for a meaningful improvement. The builder shows your count so you can track progress toward those thresholds.

Why one JSON object per line?

JSONL (JSON Lines) puts one complete, independent JSON object on each line with no enclosing array. This lets training pipelines stream examples one at a time without loading the whole file into memory.
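
As a rough illustration of that streaming property, the sketch below reads a JSONL source one line at a time with Python's standard json module. The function name iter_examples and the file name train.jsonl are hypothetical, chosen only for this example.

```python
import io
import json
from typing import Iterator, TextIO


def iter_examples(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed example per non-blank line, reading a line at a time."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open file. A real pipeline might instead pass
# open("train.jsonl", encoding="utf-8"), where the name is hypothetical.
sample = io.StringIO(
    '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello."}]}\n'
    '{"messages":[{"role":"user","content":"Bye"},{"role":"assistant","content":"Goodbye."}]}\n'
)

roles_per_example = [[m["role"] for m in ex["messages"]] for ex in iter_examples(sample)]
```

Because the function is a generator, only the current line is parsed at any moment, which is the reason the format needs no enclosing array.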

Is my data uploaded anywhere?

No. Everything is built and validated locally in your browser. Your examples are never uploaded, stored or logged, and the download is generated on your device.

What is the .jsonl Dataset Builder?

The .jsonl Dataset Builder is a free tool for building LLM fine-tuning datasets. Add prompt/completion pairs or chat messages arrays via a form, get live per-line validation against OpenAI's chat format, see your example count, and download a clean .jsonl file ready to upload. It runs in your browser on Gera Tools, with nothing uploaded.

.jsonl Dataset Builder

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Build a clean fine-tuning dataset, line by line

Fine-tuning a model means feeding it well-formed training examples in JSONL format. This builder lets you add prompt/response examples through a simple form, validates each one as you type, and exports a clean .jsonl file with one valid JSON object per line — ready to upload.

How the chat format works

OpenAI’s fine-tuning API expects each line to be a JSON object containing a messages array, mirroring how you call the chat API at inference time:

{"messages":[{"role":"system","content":"You are a terse assistant."},{"role":"user","content":"Capital of France?"},{"role":"assistant","content":"Paris."}]}

The optional system message sets behaviour and should match what you will use in production.
The user message is the input, and the assistant message is the ideal output the model should learn to produce.
Each example is independent — there is no enclosing array and no commas between lines.

The builder assembles this structure for every example and escapes the JSON correctly, so you never hand-edit brittle quoting.
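
To make the structure concrete, here is a minimal sketch of how one line could be assembled with Python's json module. It is not the builder's actual code, and build_line is a hypothetical helper name. The serialiser handles quoting and escaping, so newlines or quotes inside a message never break the line.

```python
import json
from typing import Optional


def build_line(user: str, assistant: str, system: Optional[str] = None) -> str:
    """Serialise one training example as a single JSONL line (no trailing newline)."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})
    messages.append({"role": "assistant", "content": assistant})
    # Compact separators reproduce the single-line layout shown above;
    # ensure_ascii=False keeps non-ASCII text readable in the file.
    return json.dumps({"messages": messages}, separators=(",", ":"), ensure_ascii=False)


line = build_line("Capital of France?", "Paris.", system="You are a terse assistant.")
```

Here line is identical to the example shown earlier in this section, and a call such as build_line("a\nb", 'say "hi"') still returns a single physical line.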

Tips for good training data

Be consistent. Use the same system prompt across examples that share the same task so the model learns one behaviour, not many.
Show, don’t tell. Demonstrate the exact style and length you want in the assistant responses rather than describing it.
Cover the edges. Include the tricky and ambiguous inputs you expect in production, not just the easy ones.
Keep it clean. Validate before uploading — a single malformed line can fail an entire fine-tuning job.
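
The consistency tip can be checked mechanically. The sketch below counts how many distinct system prompts appear in a dataset, assuming every line is already valid JSON in the chat format. The helper name system_prompt_counts is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def system_prompt_counts(lines: Iterable[str]) -> Counter:
    """Count examples per system prompt; examples with no system message count under None."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        messages = json.loads(line)["messages"]
        system = next((m["content"] for m in messages if m["role"] == "system"), None)
        counts[system] += 1
    return counts


dataset = [
    '{"messages":[{"role":"system","content":"Be terse."},{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello."}]}',
    '{"messages":[{"role":"system","content":"Be terse."},{"role":"user","content":"Bye"},{"role":"assistant","content":"Goodbye."}]}',
    '{"messages":[{"role":"user","content":"Thanks"},{"role":"assistant","content":"Welcome."}]}',
]
counts = system_prompt_counts(dataset)
```

More than one key in the result for a single task suggests the system prompt drifted between examples and is worth reviewing.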

Why a builder is better than editing JSONL by hand

Hand-editing JSONL is error-prone in specific ways: a missing or unescaped quote inside a content string, a stray comma after the last message object, or a raw newline inside a content value will each produce a line that looks correct but fails JSON parsing. The builder takes each field through a form and serialises it programmatically, so the quoting and escaping are always correct.
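
The failure modes named above are easy to reproduce. The following sketch feeds hand-written lines to Python's json parser and records which ones parse. The sample strings and the parses helper are illustrative assumptions, not output from the builder.

```python
import json


def parses(line: str) -> bool:
    """Return True if the text is valid JSON, False otherwise."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


cases = {
    "good": '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello."}]}',
    # A quote inside the content that was not escaped as \"
    "unescaped_quote": '{"messages":[{"role":"user","content":"Say "hi""}]}',
    # A comma after the last message object
    "trailing_comma": '{"messages":[{"role":"user","content":"Hi"},]}',
    # A real newline character inside the content value
    "raw_newline": '{"messages":[{"role":"user","content":"line one\nline two"}]}',
}
results = {name: parses(text) for name, text in cases.items()}

# Serialising programmatically escapes the newline, so the output stays on one line.
escaped = json.dumps({"role": "user", "content": "line one\nline two"})
```

Only the "good" case parses. The escaped value stores the newline as the two characters backslash and n, which is what a serialiser does for you on every field.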

It also enforces format requirements automatically. The user and assistant fields are flagged if empty, because a line without both produces a training example the API will reject. You see the running count of valid examples, which tells you whether you are approaching the recommended threshold.
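
A per-line check along these lines might look like the sketch below. The rules shown (valid JSON, a non-empty messages array, at least one non-empty user and assistant message) are an assumption about what such validation covers. They are neither the builder's exact rules nor the API's full validation, and validate_line is a hypothetical name.

```python
import json
from typing import List


def validate_line(line: str) -> List[str]:
    """Return a list of problems; an empty list means the line passed these checks."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = obj.get("messages") if isinstance(obj, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    problems = []
    for role in ("user", "assistant"):
        has_content = any(
            isinstance(m, dict)
            and m.get("role") == role
            and isinstance(m.get("content"), str)
            and m["content"].strip()
            for m in messages
        )
        if not has_content:
            problems.append(f"no non-empty {role} message")
    return problems
```

Counting only the lines for which validate_line returns an empty list gives a running total of valid examples.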

Worked example: a customer-support fine-tune

Suppose you want a model to answer questions about a software product in a specific short-answer style. You would:

Set the system message once — for example: "You are a concise support agent for Acme App. Answer in two sentences or fewer."
Add a user message: "How do I reset my password?"
Add an assistant response: "Click Forgot Password on the login page and enter your email. You will receive a reset link within a few minutes."
Repeat for 50 to 100 distinct questions.

The resulting .jsonl file has one line per question-answer pair. Each line is self-contained, so the training pipeline can shuffle the lines freely without breaking anything.
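
The steps above can be sketched in a few lines of Python. The first pair is the one from the worked example. The second pair is hypothetical and is included only to show the repetition. The to_jsonl helper and its seed parameter are assumptions made for this illustration.

```python
import json
import random
from typing import List, Tuple

SYSTEM = "You are a concise support agent for Acme App. Answer in two sentences or fewer."

pairs: List[Tuple[str, str]] = [
    (
        "How do I reset my password?",
        "Click Forgot Password on the login page and enter your email. "
        "You will receive a reset link within a few minutes.",
    ),
    # Hypothetical second pair, added only to show the pattern repeating.
    (
        "How do I change my email address?",
        "Open Settings, choose Account, and edit the email field. "
        "Confirm the change from the link sent to your new address.",
    ),
]


def to_jsonl(pairs: List[Tuple[str, str]], system: str, seed: int = 0) -> str:
    """Build one line per pair with a shared system message, then shuffle the lines."""
    lines = [
        json.dumps({"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]})
        for user, assistant in pairs
    ]
    # Lines are independent, so reordering them cannot break any example.
    random.Random(seed).shuffle(lines)
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(pairs, SYSTEM)
```

Every line in jsonl_text carries the same system message, which keeps the dataset consistent with the prompt you plan to use in production.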

Minimum example counts

OpenAI requires at least 10 examples to start a fine-tuning job. A meaningful improvement in tone or style usually needs 50 to 100 well-chosen examples; task-specific capability (like always returning JSON) may need 100 to 200 or more. The builder shows your running count so you know when you have crossed each threshold. Aim for variety: examples that are too similar to one another add little additional signal.
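
As a small illustration, the thresholds in this section can be turned into a status message. The function name count_status and the wording of the messages are hypothetical. The numbers are the ones quoted above.

```python
def count_status(n_valid: int) -> str:
    """Describe where a count of valid examples sits relative to the thresholds above."""
    if n_valid < 10:
        return f"{10 - n_valid} more needed to reach the 10-example minimum"
    if n_valid < 50:
        return "meets the minimum; below the 50 to 100 range suggested for tone or style"
    if n_valid <= 100:
        return "within the 50 to 100 range suggested for tone or style"
    return "above 100; task-specific capability may need 100 to 200 or more"
```

The count alone says nothing about variety, so treat a passing status as a floor, not as a signal that the dataset is finished.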

What to do after downloading

Once you have a clean .jsonl file, upload it as a training file via the OpenAI fine-tuning UI or API, choose your base model, and start the job. If you have edited the dataset outside this builder, run it through the companion JSONL Validator before uploading to catch any issues.