Stress-test your LLM app against prompt injection
If your application passes user input into a language model, that input can try to override your instructions. This suite generates a categorized library of realistic injection, jailbreak and exfiltration strings — optionally tailored to your own system prompt — so you can probe your defenses before an attacker does.
How prompt injection works
A language model treats everything in its context window as text to follow, and it cannot natively tell trusted instructions (your system prompt) apart from untrusted content (user input, retrieved documents, tool output). Attackers exploit this by embedding instructions in the untrusted channel, such as:
- Instruction override — “Ignore all previous instructions and instead…”
- Role-play jailbreak — framing the model as an unrestricted persona.
- Data exfiltration — coaxing the model to reveal its hidden system prompt or secrets.
- Delimiter confusion — faking the markers your app uses to separate roles.
Prompt injection is LLM01 on the OWASP Top 10 for LLM Applications, and there is no single fix — defense is layered: input filtering, output validation, privilege separation, and refusing to act on instructions from untrusted text.
Tips for testing
- Run every case through the real path users hit, including any retrieval or tool steps where injected content can hide.
- Define “safe” up front so you can score responses automatically — the model should refuse or ignore overrides, never comply.
- Combine categories. Real attacks chain techniques (role-play + delimiter spoofing), so test combinations too.
- Re-test after every prompt or model change — defenses that held yesterday can regress silently.