Why preserve evidence before investigating?

AI logs often have short retention windows, and rolling caches can overwrite themselves within hours. If you investigate first, the most diagnostic evidence (the exact prompt, the model version, the raw output) may already be gone. Preserve first, analyse second.

What evidence is unique to AI incidents?

Beyond ordinary application logs you need the exact prompt and system prompt, the model name and version, sampling parameters like temperature, any retrieval context fed in, and the raw model output before post-processing. These determine whether the behaviour is reproducible.

Does this guarantee admissibility in a legal proceeding?

No. This checklist helps you preserve the technical artifacts and maintain a basic chain of custody, but formal admissibility depends on jurisdiction and process. For litigation-grade preservation, involve legal counsel and a qualified forensic specialist early.

How fast do I need to act?

Treat the first hour as critical. Provider-side logs, ephemeral caches, and in-memory session state can disappear quickly. Issue a litigation hold or retention freeze to your provider and internal teams as the very first action.

Is my progress saved?

Yes. Your ticks are stored locally in your browser, so a refresh does not lose them, and nothing is sent to a server. The checklist itself is informational and is not a substitute for professional incident-response advice.

What is the AI Incident Evidence Preservation Checklist?

It is a forensic evidence preservation checklist to work through after an AI safety incident. It covers the prompt logs, model version records, output caches, user session data, and system configuration snapshots an investigation needs. It runs free in your browser on Gera Tools, with nothing uploaded.

AI Incident Evidence Preservation Checklist

Name: AI Incident Evidence Preservation Checklist
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

AI incident evidence preservation

When an AI system causes a safety or privacy incident — a harmful output, a data leak, a hallucinated decision with real consequences — the most valuable evidence is also the most perishable. Prompt logs roll over, model versions get silently updated, caches expire, and session state evaporates. This checklist walks you through preserving the right artifacts first, before anyone starts poking at the system and accidentally destroys the very record an investigation needs.

How it works

You tell the tool the incident type — harmful output, data exposure, an erroneous automated decision, a security breach, or a model behaviour change — and which systems were involved. It then filters a master list of evidence sources down to the ones that matter and orders them by how quickly they decay. Volatile sources like in-memory session data and provider-side rolling logs come first; durable sources like database snapshots come later. Each item carries a note on what exactly to capture and why. As you secure each source you tick it off, and you can export the whole record with the time you completed it for your chain-of-custody file.
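
The filtering and ordering described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual code: the EvidenceSource type, the DECAY_RANK values, and the sample entries in MASTER_LIST are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical decay ranks: lower means the evidence disappears sooner.
DECAY_RANK = {"minutes-hours": 0, "hours-days": 1, "days-weeks": 2, "durable": 3}


@dataclass(frozen=True)
class EvidenceSource:
    name: str
    decay: str                  # key into DECAY_RANK
    incident_types: frozenset   # incident types this source is relevant to
    system: str                 # system that holds the evidence
    note: str                   # what to capture and why


# Illustrative entries only; the tool's real master list is not shown here.
MASTER_LIST = [
    EvidenceSource("Database snapshot", "days-weeks",
                   frozenset({"data exposure", "erroneous decision"}),
                   "database", "Snapshot session rows and downstream actions."),
    EvidenceSource("In-memory session state", "minutes-hours",
                   frozenset({"harmful output", "data exposure"}),
                   "app server", "Dump before any restart."),
    EvidenceSource("Provider-side request logs", "minutes-hours",
                   frozenset({"harmful output", "model behaviour change"}),
                   "ai provider", "Request a retention freeze and an export."),
    EvidenceSource("Model version string", "durable",
                   frozenset({"harmful output", "model behaviour change"}),
                   "ai provider", "Record the exact version, not the alias."),
]


def build_checklist(incident_type, systems, master=MASTER_LIST):
    """Keep the sources relevant to this incident, most volatile first."""
    relevant = [
        s for s in master
        if incident_type in s.incident_types and s.system in systems
    ]
    # sorted() is stable, so sources in the same tier keep their list order.
    return sorted(relevant, key=lambda s: DECAY_RANK[s.decay])


checklist = build_checklist("harmful output", {"app server", "ai provider"})
for item in checklist:
    print(f"[ ] {item.name} ({item.decay}): {item.note}")
```

Sorting on a coarse rank rather than an exact expiry time is a deliberate simplification here: in the first hour, the tier an item belongs to matters more than a precise deadline.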

Why the order matters: evidence decay rates

Not all AI incident evidence decays at the same speed. Missing the fastest-decaying items is the most common mistake in the first hour of an incident response.

Minutes to hours (preserve immediately)

In-memory session state and ephemeral caches — gone on restart
Provider-side request logs — often rolling, 24–72 hour windows
Browser session data and local storage — lost on tab close or browser clear

Hours to days (preserve within first working day)

Application-level logs — typically 7–30 day retention depending on your configuration
API request/response logs in your infrastructure — often compressed or deleted on a rolling window
User-facing session recordings if your application records them

Days to weeks (still perishable, but more time)

Database records of the user session and any downstream actions
Cloud storage of inputs and outputs if your application archives them
Third-party monitoring tool data (error trackers, observability platforms)

Durable (preserve but not urgent)

Model version pinned at the time of the incident (document the exact version string, which may change silently if you rely on an alias like “gpt-4”)
System prompt and configuration files (commit hash or export)
Infrastructure configuration and deployment records
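
To make the tiers concrete, the sketch below turns the shortest retention figures quoted above into a "preserve by" time. It assumes the worst case, that the clock started at the moment of the incident. The SHORTEST_RETENTION table, the preserve_by function, and the example timestamp are hypothetical; substitute the retention periods your own provider and logging configuration actually use.

```python
from datetime import datetime, timedelta, timezone

# Lower bounds of the retention ranges quoted in the list above.
SHORTEST_RETENTION = {
    "provider-side request logs": timedelta(hours=24),
    "application-level logs": timedelta(days=7),
}


def preserve_by(incident_time, source):
    """Latest safe capture time, assuming the shortest quoted window."""
    return incident_time + SHORTEST_RETENTION[source]


# Example timestamp only, in UTC so deadlines are unambiguous across teams.
incident = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
for source in SHORTEST_RETENTION:
    print(f"{source}: preserve by {preserve_by(incident, source).isoformat()}")
```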

The artifacts unique to AI incidents

Classic incident response focuses on network logs, access records, and file system changes. AI incidents add a set of artifacts that classic playbooks miss:

The exact prompt: this includes the system prompt, any retrieval context injected, the conversation history, and all parameters (temperature, top-p, max tokens). The same user message can produce a different output when any of these changes.
The raw model output before post-processing: your application may filter, truncate, or transform the model’s response. The unmodified response is the true evidence.
The model version string: not the product name, the exact version. Models behind API aliases can be updated without notice; the version at the time of the incident is the reproducibility anchor.
Sampling parameters: temperature and top-p affect both the output and its reproducibility. A temperature-0 incident is usually close to reproducible, although hosted models do not always return identical outputs even then; a high-temperature incident may be difficult to reproduce exactly.
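
One way to keep these artifacts together is a single record per model call, serialised and hashed at capture time. The sketch below is an illustration under assumptions, not a prescribed format: the ModelCallRecord class, its field names, and the sample values (including the placeholder model version) are invented for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelCallRecord:
    model_version: str          # exact version string, not an alias
    system_prompt: str
    conversation_history: list  # prior messages, in order
    retrieval_context: list     # documents injected into the prompt
    user_prompt: str
    temperature: float
    top_p: float
    max_tokens: int
    raw_output: str             # response before any post-processing
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self):
        # sort_keys gives a stable serialisation to hash.
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)

    def sha256(self):
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


record = ModelCallRecord(
    model_version="example-model-2024-01-01",  # placeholder, not a real model
    system_prompt="You are a support assistant.",
    conversation_history=[],
    retrieval_context=["policy.txt, section 3"],
    user_prompt="Can I get a refund?",
    temperature=0.2,
    top_p=1.0,
    max_tokens=512,
    raw_output="Yes, refunds are always available.",
)
print(record.sha256())
```

Storing the digest alongside the record lets a later reviewer confirm the captured prompt and output have not changed since preservation.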

Tips and notes

Freeze retention immediately. Your very first action should be to request a retention hold from your AI provider and internal logging teams so nothing rotates out.
Capture the model version, not just the model. “GPT-4” is not enough — the exact dated version and its sampling parameters decide reproducibility.
Preserve raw output before post-processing. Your application probably transforms model output; the unmodified response is the real evidence.
Record who touched what, when. A simple timestamped action log turns a pile of files into a defensible chain of custody.
Do not reproduce the incident on the live system. Reproduce in an isolated environment after evidence is secured; poking at the live system can destroy ephemeral evidence and expand the incident scope.
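
The timestamped action log mentioned in the tips can be as simple as an append-only list of entries. The sketch below goes one step further and links each entry to the previous one by hash, so later edits are detectable. That chaining is an addition for illustration, not something the checklist requires, and the append_entry and verify functions and their field names are hypothetical. It does not by itself make a record admissible.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(body):
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, actor, action, artifact_sha256):
    """Append a timestamped entry linked to the previous one by hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "artifact_sha256": artifact_sha256,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry["entry_hash"] = _digest(entry)  # hash computed before this key is added
    log.append(entry)
    return entry


def verify(log):
    """Return True if every entry's hash and link to its predecessor check out.

    Entries dropped from the end of the log are not detected by this check.
    """
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


custody_log = []
artifact = hashlib.sha256(b"raw model output bytes").hexdigest()
append_entry(custody_log, "analyst-1", "exported provider request logs", artifact)
append_entry(custody_log, "analyst-2", "copied export to evidence store", artifact)
print(verify(custody_log))
```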