Llama prompt formatter
Hosted APIs hide the chat template from you, but when you run Llama or Mistral locally you have to format prompts yourself — and each family expects a different token structure. Get it wrong and the model ignores your system prompt or mangles the turns. The Llama prompt formatter wraps a system and user message in the exact template each model family was trained on.
How it works
You pick a model family and enter a system message and a user message. For Llama 2, the tool emits <s>[INST] <<SYS>>…<</SYS>> {user} [/INST]. For Llama 3, it uses the header-token format: <|begin_of_text|>, a <|start_header_id|>role<|end_header_id|> header for each turn, <|eot_id|> at the end of each turn, and a trailing empty assistant header where generation begins. For Mistral, which has no system role, it prepends the system text to the user message inside a single [INST] … [/INST] block. Everything runs locally, with no API key and no network call, and you copy the result straight into your runtime.
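The three templates can be sketched in Python as plain string building. This is a minimal illustration under stated assumptions, not the tool's source: the family keys (`llama2`, `llama3`, `mistral`) and the function names are hypothetical, only a single system + user turn is covered, and the blank line that joins system and user text for Mistral is an assumed convention, since Mistral defines no system slot of its own.

```python
"""Minimal sketch of three chat templates (single system + user turn only)."""


def format_llama2(system: str, user: str) -> str:
    # Llama 2 chat: the system prompt sits inside <<SYS>> tags within the first [INST] block.
    if not system:
        return f"<s>[INST] {user} [/INST]"
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def format_llama3(system: str, user: str) -> str:
    # Llama 3: each turn is a role header, a blank line, the content, then <|eot_id|>.
    def turn(role: str, content: str) -> str:
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    # The trailing empty assistant header (no <|eot_id|>) is where generation begins.
    return (
        "<|begin_of_text|>"
        + turn("system", system)
        + turn("user", user)
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def format_mistral(system: str, user: str) -> str:
    # Mistral has no system role, so the system text is prepended to the user message.
    # ASSUMPTION: a blank line as the separator; any separator is only a convention.
    content = f"{system}\n\n{user}" if system else user
    return f"<s>[INST] {content} [/INST]"


# Hypothetical family keys mapped to their formatter.
FORMATTERS = {
    "llama2": format_llama2,
    "llama3": format_llama3,
    "mistral": format_mistral,
}


def format_prompt(family: str, system: str, user: str) -> str:
    """Wrap a system and user message in the template for the given family."""
    try:
        formatter = FORMATTERS[family]
    except KeyError:
        raise ValueError(f"unknown model family: {family!r}") from None
    return formatter(system, user)


if __name__ == "__main__":
    print(format_prompt("llama3", "You are a terse assistant.", "Name a prime number."))
```

These functions return raw strings. If your runtime's tokenizer already adds a BOS token on its own, you may want to drop the leading `<s>` or `<|begin_of_text|>` so it is not sent twice.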
Tips and notes
- Match the template to the checkpoint. A Llama 3 template fed to a Llama 2 model (or vice versa) will degrade output — pick the family that matches your weights.
- Keep the special tokens intact. Don’t hand-edit <|eot_id|> or [/INST]; the model relies on them to delimit turns.
- Mistral’s system text is a convention. It lives inside the user turn, so keep it concise; there’s no separate slot protecting it.
- The trailing assistant header is required for Llama 3. Leaving it off means the model has no signal to start replying.
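Several of these tips can be turned into a quick string-level sanity check before a prompt is sent. The sketch below is hypothetical (`check_prompt` and its messages are illustrative, not part of the tool). It flags tokens from the wrong family, unbalanced markers, and a missing Llama 3 assistant header. It cannot see which weights you actually loaded, so it only catches a template mix-up when the prompt itself contains another family's tokens.

```python
"""Hypothetical string-level checks for an already-formatted prompt."""

LLAMA3_ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def check_prompt(family: str, prompt: str) -> list[str]:
    """Return a list of problems found in the prompt (empty if none were found)."""
    problems: list[str] = []
    if family == "llama3":
        # [INST] markers belong to the Llama 2 / Mistral templates.
        if "[INST]" in prompt:
            problems.append("contains [INST]; looks like a Llama 2 or Mistral template")
        # Every <|start_header_id|> should be closed by an <|end_header_id|>.
        starts = prompt.count("<|start_header_id|>")
        if starts != prompt.count("<|end_header_id|>"):
            problems.append("unbalanced <|start_header_id|>/<|end_header_id|> pair")
        # Every turn except the final, empty assistant header ends with <|eot_id|>.
        if prompt.count("<|eot_id|>") != starts - 1:
            problems.append("expected one <|eot_id|> after every turn except the last header")
        # Without the trailing assistant header the model has no signal to reply.
        if not prompt.endswith(LLAMA3_ASSISTANT_HEADER):
            problems.append("missing trailing assistant header")
    elif family in ("llama2", "mistral"):
        # Llama 3 header tokens mean the wrong template was used.
        if "<|eot_id|>" in prompt or "<|start_header_id|>" in prompt:
            problems.append("contains Llama 3 header tokens; wrong template for this family")
        # Each [INST] needs a matching [/INST] ("[/INST]" does not contain "[INST]").
        if prompt.count("[INST]") != prompt.count("[/INST]"):
            problems.append("unbalanced [INST]/[/INST] markers")
        # Mistral has no system slot, so <<SYS>> tags suggest a Llama 2 template.
        if family == "mistral" and "<<SYS>>" in prompt:
            problems.append("contains <<SYS>>; Mistral has no system slot")
    else:
        raise ValueError(f"unknown model family: {family!r}")
    return problems


if __name__ == "__main__":
    print(check_prompt("mistral", "<s>[INST] Be brief.\n\nHello"))
```

An empty list means only that none of these specific checks fired, not that the prompt is guaranteed to match your checkpoint.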