Question 1

What is system prompt leakage?

Accepted Answer

System prompt leakage is when an attacker tricks an AI application into revealing the hidden instructions that govern its behaviour. These instructions, called the system prompt, are normally invisible to users but can often be coaxed out with carefully crafted inputs.

Question 2

Why is a leaked system prompt a security problem?

Accepted Answer

A leaked prompt can expose business logic, secret guardrails, internal tool names, and sometimes credentials or API details embedded in the instructions. Attackers use this knowledge to bypass safeguards more reliably or to clone the product's behaviour.

Question 3

How do attackers extract a system prompt?

Accepted Answer

Common tactics include asking the model to repeat everything above the conversation, framing the request as a translation or summarisation task, or using roleplay to convince the model that revealing its instructions is allowed. These are forms of prompt injection.

Question 4

Can system prompt leakage be fully prevented?

Accepted Answer

No technique fully prevents it, because the system prompt is in the same context the model reasons over. The practical goal is to minimise the damage by keeping secrets out of the prompt entirely and treating any leaked text as non-confidential.

What Is System Prompt Leakage? An AI Security Guide

What system prompt leakage is

How extraction attacks work

Why a leaked prompt matters

Defensive strategies

Why it matters