What does the validator check?

Per line it confirms valid JSON, the presence of a messages array, allowed roles for the chosen provider, non-empty content, and at least one assistant turn. It also warns on lines whose rough token estimate is very large, and on datasets with fewer than 10 examples.

How accurate is the token-length warning?

It is a rough estimate (about four characters per token) used only to flag lines that may exceed a model's context window. For exact counts use a model-specific tokenizer; the warning is a safety net, not a precise measurement.

What is the difference between the OpenAI and Anthropic checks?

OpenAI fine-tuning uses a messages array with system, user and assistant roles. Anthropic's format also uses a messages array but only allows user and assistant roles, with any system instruction supplied separately. The validator enforces the right rule set per provider.

Is my dataset uploaded anywhere?

No. Parsing and validation run entirely in your browser. Nothing you upload or paste is sent to a server, stored or logged.

What is the .jsonl File Validator?

Free .jsonl validator for fine-tuning datasets. Upload or paste a .jsonl file and check every line for valid JSON, correct message roles, required fields, minimum example counts and rough token-length warnings against OpenAI or Anthropic schemas. It runs free in your browser on Gera Tools, with nothing uploaded.

.jsonl File Validator

Name: .jsonl File Validator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Catch dataset errors before they fail your job

A single malformed line can sink an entire fine-tuning job after you have already uploaded and waited. This validator checks every line of your .jsonl file for valid JSON, correct structure and the right roles for your provider, and flags long examples — all locally, before you upload.

How validation works

JSONL stores one independent JSON object per line. The validator parses each line separately so a single bad line does not hide the rest, then applies schema rules for your chosen provider:

Valid JSON — the line must parse cleanly on its own.
A messages array — present and non-empty.
Allowed roles — OpenAI permits system, user, and assistant; Anthropic permits only user and assistant (system is passed separately).
Non-empty content and at least one assistant turn, since that is what the model learns to produce.

It also computes a rough token estimate per line to warn about examples that may blow past the model’s context window, and checks that the dataset has the minimum number of examples a fine-tuning job requires.

Tips

Fix the first error first. A stray comma or unescaped quote often explains several downstream failures.
Match roles to your provider. Moving an OpenAI dataset to Anthropic means lifting the system message out of the array.
Mind the long lines. A handful of oversized examples can be truncated silently during training — trim or split them.
Re-validate after every edit so a quick fix does not introduce a new problem.

Understanding the per-line report

Each line in the file gets a pass or fail badge. A passing line shows a brief summary of its message count and approximate token length. A failing line shows the specific error — for example: Invalid JSON on line 14, Missing messages array on line 23, or Unknown role "Human" on line 37. This granularity matters because a dataset of 200 lines might have only two bad ones, and catching them individually is far faster than trying to guess from a single upload error.

The token length warning uses a rough approximation of four characters per token. This is intentionally imprecise — exact tokenisation depends on the model’s tokenizer, which varies by provider and model family. The warning is a safety net to catch obviously oversized examples (a 10,000-character assistant response in one line), not a precise measurement. If you need exact token counts, run your file through a model-specific tokenizer after validating structure here.

OpenAI vs. Anthropic format differences

Feature	OpenAI	Anthropic
System message placement	Inside the `messages` array, role `system`	Separate `system` field at the top level
Allowed roles	`system`, `user`, `assistant`	`user`, `assistant`
Minimum examples	10	Varies by model
File format	`.jsonl`, one object per line	`.jsonl`, one object per line

If you built a dataset for OpenAI and want to use it with Anthropic, the structural change is straightforward: pull the system content out of the messages array and move it to a top-level system key on each line. The validator will flag any system role turns when you switch the schema selector to Anthropic.

Common mistakes and how to fix them

Trailing commas inside content strings. JSON does not allow trailing commas after the last element in an array or object. A common source of this is hand-editing a file that was originally exported as a JavaScript object literal, where trailing commas are valid.

Newlines inside content values. A content string that contains a literal newline (rather than the escape sequence \n) breaks the one-object-per-line invariant and causes the next line to fail parsing too. Replace literal newlines with \n escape sequences inside string values.

Mixed role capitalisation. User (capital U) is not the same as user in JSON. The API requires lowercase role names; the validator flags any that do not match the expected set exactly.