Why does the chat template matter so much?

Instruction-tuned local models are fine-tuned on one specific token format. Send raw text without the right INST tags or header tokens and the model behaves erratically: it may ignore the system prompt or lose track of whose turn it is. The template makes your input match what the model saw during training.

How do the three formats differ?

Llama 2 wraps the system prompt in SYS tags inside the first INST block. Llama 3 uses special header tokens with begin_of_text and eot_id markers. Mistral has no system role, so the convention is to prepend the system text to the user message inside INST.

Does Mistral really have no system role?

Mistral Instruct's official template has no dedicated system slot. The widely used convention is to put the system instruction at the start of the first user turn inside the INST block, which this tool does automatically.

Do I need an API key?

No. Formatting happens locally in your browser with no network call. Paste your messages, copy the formatted prompt, and use it with any runtime.

Should I include the trailing assistant header for Llama 3?

Yes. The empty assistant header at the end is where generation begins. The tool adds it so the model starts its own reply; without it, the model may continue the user turn instead of answering.

What is the Llama Prompt Formatter?

It takes a system message and a user message and wraps them in the exact chat template each local model expects (Llama 2's INST and SYS tags, Llama 3's header tokens, or Mistral's INST convention) so your prompt parses correctly. It runs free in your browser on Gera Tools, with nothing uploaded.

Llama Prompt Formatter

Name: Llama Prompt Formatter
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/


Llama prompt formatter

Hosted APIs hide the chat template from you, but when you run Llama or Mistral locally you have to format prompts yourself — and each family expects a different token structure. Get it wrong and the model ignores your system prompt or mangles the turns. The Llama prompt formatter wraps a system and user message in the exact template each model family was trained on.

How the three formats differ

Each model family uses a distinct set of special tokens to mark role boundaries. Sending a Llama 3 template to a Llama 2 checkpoint (or vice versa) causes the model to misinterpret turns and produce garbled or instruction-ignoring output.

Llama 2 format:

<s>[INST] <<SYS>>
{system message}
<</SYS>>

{user message} [/INST]

Llama 3 format:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system message}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Mistral Instruct format (no system role — system prepended to user turn):

<s>[INST] {system message}

{user message} [/INST]
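
The three single-turn layouts can be sketched as plain string templates. This is a minimal illustration, not the tool's actual source: the function names and the `FORMATTERS` table are hypothetical, and the Llama 3 whitespace (a blank line after each header, no newline after `<|eot_id|>`) follows Meta's published template, which is an assumption about how your runtime expects it.

```python
def format_llama2(system: str, user: str) -> str:
    """Llama 2 chat: system text sits in <<SYS>> tags inside the first [INST] block."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def format_llama3(system: str, user: str) -> str:
    """Llama 3 instruct: one header block per role, then an empty assistant header."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        # Trailing empty assistant header: generation starts here.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def format_mistral(system: str, user: str) -> str:
    """Mistral Instruct convention: system text is prepended to the user turn."""
    return f"<s>[INST] {system}\n\n{user} [/INST]"


# Hypothetical lookup table keyed by model family.
FORMATTERS = {
    "llama2": format_llama2,
    "llama3": format_llama3,
    "mistral": format_mistral,
}


def format_prompt(family: str, system: str, user: str) -> str:
    """Dispatch to the formatter for the chosen family; raises KeyError if unknown."""
    return FORMATTERS[family](system, user)


if __name__ == "__main__":
    print(format_prompt("llama2", "You are concise.", "Name three primes."))
```

Some runtimes prepend the beginning-of-sequence token themselves, so check whether yours expects `<s>` or `<|begin_of_text|>` in the prompt text before sending it.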

How it works

You pick a model family and enter a system message and a user message. The tool wraps them in the correct template, including all special tokens and delimiters. The trailing empty assistant header for Llama 3 is added automatically — it marks where generation should begin and is required for the model to start its reply. Everything runs locally — no API key, no network call — and you copy the result straight into your runtime, llama.cpp command, vLLM request body, or any tool that accepts a raw prompt.

Tips and notes

Match the template to the checkpoint: pick the family that matches the weights you are running; a mismatched template noticeably degrades output.
Keep the special tokens intact: do not hand-edit <|eot_id|> or [/INST]; the model relies on these exact strings to tell turns apart.
Mistral’s system text is a convention: it lives inside the user turn, so keep it concise and place the most important instructions near the top.
Extend the pattern for multi-turn conversations: for Llama 2, repeat [INST] user [/INST] assistant blocks, closing each assistant reply with </s> and opening the next block with <s>; for Llama 3, append one header-token block per turn, each ending in <|eot_id|>.
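
Extending the templates to several turns might look like the sketch below. The function names and the input shapes (a list of completed user/assistant pairs for Llama 2, a list of role/content dictionaries for Llama 3) are illustrative assumptions, and the `</s><s>` separators follow Meta's reference Llama 2 chat layout rather than anything this page specifies.

```python
from typing import Dict, List, Tuple


def llama2_multi_turn(system: str, turns: List[Tuple[str, str]], next_user: str) -> str:
    """turns: completed (user, assistant) pairs; next_user: the message awaiting a reply."""
    users = [u for u, _ in turns] + [next_user]
    # The <<SYS>> block appears only inside the first [INST] block.
    users[0] = f"<<SYS>>\n{system}\n<</SYS>>\n\n{users[0]}"
    out = ""
    for i, user in enumerate(users):
        out += f"<s>[INST] {user} [/INST]"
        if i < len(turns):
            # Completed exchanges include the assistant reply and a closing </s>.
            out += f" {turns[i][1]} </s>"
    return out


def llama3_multi_turn(messages: List[Dict[str, str]]) -> str:
    """messages: dicts with 'role' (system, user, assistant) and 'content' keys."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    # Empty assistant header so the model writes the next reply.
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out


if __name__ == "__main__":
    print(llama2_multi_turn("Be brief.", [("Hi", "Hello!")], "What is 2 + 2?"))
```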

Where to use the formatted output

The formatted prompt is consumed directly by model runtimes that accept raw text input:

llama.cpp: pass it via the --prompt flag or as the prompt field in a server completion request
vLLM: use it as the prompt field in a /v1/completions request (not /v1/chat/completions, which applies the template automatically)
Ollama raw mode: send it as the prompt field with raw set to true to bypass Ollama’s own template layer
LM Studio: switch to raw mode in the playground and paste the formatted prompt directly
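
As a rough sketch, the request bodies for the vLLM and Ollama cases could be built like this. The helper names, the model names, and the `max_tokens` default are placeholders; the bodies are meant to be POSTed to vLLM's `/v1/completions` and Ollama's `/api/generate` respectively, and the code only builds the JSON without sending anything.

```python
import json


def vllm_completion_body(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Body for a raw-text completion request; the server applies no chat template."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def ollama_raw_body(model: str, prompt: str) -> dict:
    """Body for Ollama's generate endpoint with its own template layer bypassed."""
    return {"model": model, "prompt": prompt, "raw": True, "stream": False}


if __name__ == "__main__":
    # Placeholder prompt and model names, for illustration only.
    formatted = "<s>[INST] You are concise.\n\nName three primes. [/INST]"
    print(json.dumps(vllm_completion_body("my-mistral-model", formatted), indent=2))
    print(json.dumps(ollama_raw_body("my-mistral-model", formatted), indent=2))
```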

When using a chat completions API (including local servers that expose one), you normally supply the system and user message separately and let the server format them — the formatter is useful for systems that take raw text only.