Catch dataset errors before they fail your job
A single malformed line can sink an entire fine-tuning job after you have
already uploaded and waited. This validator checks every line of your
.jsonl file for valid JSON, correct structure and the right roles for your
provider, and flags long examples — all locally, before you upload.
How validation works
JSONL stores one independent JSON object per line. The validator parses each line separately so a single bad line does not hide the rest, then applies schema rules for your chosen provider:
- Valid JSON: the line must parse cleanly on its own.
- A messages array: present and non-empty.
- Allowed roles: OpenAI permits system, user, and assistant; Anthropic permits only user and assistant (system is passed separately).
- Non-empty content and at least one assistant turn, since that is what the model learns to produce.
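The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual code: the names validate_line, validate_text, and ALLOWED_ROLES are hypothetical, and the sketch assumes each message's content is a plain string.

```python
import json

# Role sets as listed above; adjust if your provider's rules differ.
ALLOWED_ROLES = {
    "openai": {"system", "user", "assistant"},
    "anthropic": {"user", "assistant"},
}


def validate_line(line, provider):
    """Return a list of error strings for one JSONL line (empty if it passes)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty messages array"]

    errors = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES[provider]:
            errors.append(f"message {i}: role {role!r} not allowed for {provider}")
        content = msg.get("content")
        # Assumption: content is a plain string, not a list of content parts.
        if not isinstance(content, str) or not content.strip():
            errors.append(f"message {i}: empty content")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        errors.append("no assistant turn")
    return errors


def validate_text(text, provider):
    """Check every line independently; return {line_number: errors} for failing lines."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skipping blank lines is a choice made for this sketch
        errors = validate_line(line, provider)
        if errors:
            report[number] = errors
    return report
```

Because each line is parsed inside its own try block, a broken line is reported under its own line number and the lines after it are still checked.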
It also computes a rough token estimate per line to warn about examples that may blow past the model’s context window, and checks that the dataset has the minimum number of examples a fine-tuning job requires.
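A rough estimate of this kind can be approximated without a tokenizer. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact count. The function names are hypothetical, and token_limit and min_examples are parameters you would set from your provider's documentation.

```python
import json
import math

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not a real tokenizer


def estimate_tokens(record):
    """Estimate tokens from the total length of all string message contents."""
    chars = 0
    for msg in record.get("messages", []):
        content = msg.get("content") if isinstance(msg, dict) else None
        if isinstance(content, str):
            chars += len(content)
    return math.ceil(chars / CHARS_PER_TOKEN)


def dataset_warnings(lines, token_limit, min_examples):
    """Warn about oversized examples and about datasets with too few examples."""
    warnings = []
    count = 0
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed lines are reported by the structural checks
        if not isinstance(record, dict):
            continue
        count += 1
        tokens = estimate_tokens(record)
        if tokens > token_limit:
            warnings.append(f"line {number}: about {tokens} tokens, limit is {token_limit}")
    if count < min_examples:
        warnings.append(f"only {count} usable examples, need at least {min_examples}")
    return warnings
```

The estimate is deliberately crude, so treat a warning as a prompt to check the example with the provider's own tokenizer, not as a precise measurement.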
Tips
- Fix the first error first. A stray comma or unescaped quote often explains several downstream failures.
- Match roles to your provider. Moving an OpenAI dataset to Anthropic means lifting the system message out of the array.
- Mind the long lines. A handful of oversized examples can be truncated silently during training — trim or split them.
- Re-validate after every edit so a quick fix does not introduce a new problem.
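Lifting the system message out of the array, as the second tip describes, might look like the sketch below. The function name lift_system is hypothetical, and the top-level system key in the output is an assumption; check your provider's fine-tuning documentation for the exact layout it expects.

```python
def lift_system(record):
    """Move system messages out of the messages array of an OpenAI-style record."""
    messages = record["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    converted = {"messages": turns}
    if system_parts:
        # Assumption: the target format takes the system prompt as a
        # separate top-level string next to the messages array.
        converted["system"] = "\n\n".join(system_parts)
    return converted
```

Run the validator again on the converted file, since the result must still contain at least one assistant turn and only user and assistant roles.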