Llama Prompt Formatter

Format prompts correctly for Llama 2, Llama 3, and Mistral chat templates

Hosted APIs hide the chat template from you, but when you run Llama or Mistral locally you have to format prompts yourself — and each family expects a different token structure. Get it wrong and the model ignores your system prompt or mangles the turns. The Llama prompt formatter wraps a system and user message in the exact template each model family was trained on.

How it works

You pick a model family and enter a system message and a user message. For Llama 2, the tool emits <s>[INST] <<SYS>>…<</SYS>> {user} [/INST]. For Llama 3, it uses the header-token format with <|begin_of_text|>, <|start_header_id|> role markers, <|eot_id|> separators, and a trailing empty assistant header where generation begins. For Mistral, which has no system role, it prepends the system text to the user message inside a single [INST] … [/INST] block. Everything runs locally — no API key, no network call — and you copy the result straight into your runtime.
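The three templates described above can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the function name `format_prompt` and the family keys are hypothetical. The newlines around `<<SYS>>` and after each Llama 3 header follow the published reference formats. The blank line joining system and user text for Mistral is an assumption, since the page does not say which separator the tool uses.

```python
def format_prompt(family: str, system: str, user: str) -> str:
    """Wrap one system message and one user message in a chat template.

    `family` is a hypothetical key: "llama2", "llama3", or "mistral".
    """
    if family == "llama2":
        # System text sits in a <<SYS>> block inside the first [INST] turn.
        return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
    if family == "llama3":
        # Each turn is a role header, a blank line, the text, then <|eot_id|>.
        # The final empty assistant header is where generation begins.
        return (
            "<|begin_of_text|>"
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
            f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
        )
    if family == "mistral":
        # No system role: prepend the system text to the user message.
        # The "\n\n" separator is an assumption made for this sketch.
        return f"<s>[INST] {system}\n\n{user} [/INST]"
    raise ValueError(f"unknown model family: {family!r}")


if __name__ == "__main__":
    print(format_prompt("llama3", "You are terse.", "Name one prime number."))
```

Some runtimes add the beginning-of-sequence token themselves, so it is worth checking whether yours expects the leading `<s>` or `<|begin_of_text|>` in the string you paste.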

Tips and notes

  • Match the template to the checkpoint. A Llama 3 template fed to a Llama 2 model (or vice versa) will degrade output — pick the family that matches your weights.
  • Keep the special tokens intact. Don’t hand-edit <|eot_id|> or [/INST]; the model relies on them to delimit turns.
  • Mistral’s system text is a convention. It lives inside the user turn, so keep it concise — there’s no separate slot protecting it.
  • The trailing assistant header is required for Llama 3. Leaving it off means the model has no signal to start replying.
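The tips above lend themselves to a quick sanity check before a prompt goes to the runtime. The following is a minimal sketch, not part of the tool: `check_prompt`, `REQUIRED_MARKERS`, and the family keys are hypothetical names, and the check only looks for marker strings, so it catches missing or mismatched tokens but not every malformed prompt.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"

# Marker strings each family's template is expected to contain
# (hypothetical table, built from the templates described on this page).
REQUIRED_MARKERS = {
    "llama2": ["[INST]", "[/INST]", "<<SYS>>", "<</SYS>>"],
    "llama3": ["<|begin_of_text|>", "<|start_header_id|>", "<|eot_id|>"],
    "mistral": ["[INST]", "[/INST]"],
}


def check_prompt(family: str, prompt: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing marker {marker!r}"
        for marker in REQUIRED_MARKERS[family]
        if marker not in prompt
    ]
    # Llama 3 needs the empty assistant header at the very end.
    if family == "llama3" and not prompt.rstrip().endswith(ASSISTANT_HEADER):
        problems.append("missing trailing assistant header")
    # Header tokens from Llama 3 in another family's prompt suggest a mismatch.
    if family != "llama3" and "<|start_header_id|>" in prompt:
        problems.append("Llama 3 header tokens in a non-Llama 3 prompt")
    return problems
```

Calling `check_prompt("llama3", prompt)` on a prompt that stops at the last `<|eot_id|>` reports the missing assistant header, which is the mistake the last tip warns about.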