Fine-Tune Training Data Builder (BYO Key)

Generate JSONL training pairs for fine-tuning your own LLM

Ad placeholder (leaderboard)

Fine-tuning lives or dies on its dataset, and assembling one by hand is slow. This tool bootstraps a dataset — describe the behaviour you want to teach and it generates diverse prompt/completion pairs formatted as OpenAI chat fine-tuning JSONL, using your own OpenAI or Anthropic key, entirely in your browser.

How it works

Choose a provider and model, paste your API key, describe the task, and optionally add a reference example, a system prompt for the eventual fine-tuned model, and the number of pairs you want. The tool asks the model to produce diverse, realistic examples — varying phrasing, length, and edge cases — and to return strict JSON. The response is parsed and shape-checked in the browser, then assembled into JSONL where each line is {"messages":[{system},{user},{assistant}]}. It is one direct request to the provider.

For Anthropic, the request includes the official direct-browser-access header so it works straight from the page.

Building a real dataset

  • Reference examples anchor the style — even one good pair raises quality sharply.
  • System prompt is baked into every JSONL line so training matches how you will actually call the model.
  • Batch and curate — generate small batches, delete the weak pairs, and stack the good ones.

Tips

  • Always read every pair before training; synthetic data introduces subtle errors that fine-tuning will faithfully memorise.
  • Mix in real, hand-written examples for the cases that matter most.
  • Keep the system prompt here identical to the one you will use at inference time, or the fine-tune will be mismatched.
Ad placeholder (rectangle)