Question 1

Why is monitoring AI different from monitoring normal software?

Accepted Answer

A traditional service usually fails loudly: it returns the right answer or throws an error, so HTTP status codes and latency tell most of the story. An LLM almost always returns a 200 with fluent text that may be subtly wrong, off-policy, or hallucinated. That means you have to monitor output quality and behaviour, not just uptime and error rates.

Question 2

What should I log for every LLM call?

Accepted Answer

At minimum log the model and version, the full prompt and completion, token counts in and out, latency, cost, the user or tenant ID, and a request ID that ties it to the rest of your trace. Redact or hash any sensitive data before storing. These fields let you debug bad outputs, attribute cost, and compute quality metrics later.
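
As a rough illustration of the fields listed above, here is a minimal Python sketch of a per-call log record. Everything in it is an assumption made for the example: the `LLMCallRecord` name, the field names, the email-only `redact` helper, and the truncated SHA-256 tenant hash are hypothetical, and a real system would need a much broader redaction pass.

```python
import hashlib
import json
import re
import time
import uuid
from dataclasses import asdict, dataclass

# Assumption: only email addresses are redacted here, to keep the sketch short.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def hash_id(value: str) -> str:
    """Hash an identifier so it can be grouped on without storing it raw."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


@dataclass
class LLMCallRecord:  # hypothetical schema, not a standard
    request_id: str
    model: str
    model_version: str
    prompt: str
    completion: str
    tokens_in: int
    tokens_out: int
    latency_ms: float
    cost_usd: float
    tenant_hash: str
    timestamp: float


def build_record(*, model, model_version, prompt, completion, tokens_in,
                 tokens_out, latency_ms, cost_usd, tenant_id, request_id=None):
    """Assemble one log record, redacting bodies and hashing the tenant ID."""
    return LLMCallRecord(
        request_id=request_id or str(uuid.uuid4()),
        model=model,
        model_version=model_version,
        prompt=redact(prompt),
        completion=redact(completion),
        tokens_in=tokens_in,
        tokens_out=tokens_out,
        latency_ms=latency_ms,
        cost_usd=cost_usd,
        tenant_hash=hash_id(tenant_id),
        timestamp=time.time(),
    )


record = build_record(
    model="example-model",            # placeholder model name
    model_version="2024-01-01",       # placeholder version string
    prompt="Contact alice@example.com about the invoice",
    completion="Sure, I will draft that message",
    tokens_in=12,
    tokens_out=9,
    latency_ms=840.0,
    cost_usd=0.0004,
    tenant_id="tenant-42",
    request_id="req-123",
)
line = json.dumps(asdict(record))  # one JSON line per call
print(line)
```

Writing one JSON line per call keeps the record easy to ship to whatever log store you already use, and the `request_id` is what joins it to the rest of the trace.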

Question 3

How do I detect quality drift?

Accepted Answer

Pick measurable proxies (refusal rate, average response length, JSON-parse failure rate, citation rate, user thumbs-down rate, or scores from an LLM-as-judge evaluator) and chart them over time. Drift shows up as a gradual shift in these aggregates, often after a model-provider update or a prompt change. Baseline them at launch and alert when they deviate from that baseline by more than a set margin.
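
One way to turn "baseline and alert on deviation" into code is a simple z-score check against the launch-period values. The sketch below is only an illustration under stated assumptions: the refusal markers are a crude hypothetical keyword list, the baseline numbers are made up for the example, and the 3-sigma threshold is an arbitrary starting point rather than a recommendation.

```python
import json
from statistics import mean, stdev

# Assumption: a naive keyword list stands in for a real refusal classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")


def is_refusal(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def is_json_failure(text: str) -> bool:
    """True when the completion does not parse as JSON."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True


def rate(completions, predicate) -> float:
    """Fraction of completions for which the predicate holds."""
    if not completions:
        return 0.0
    return sum(1 for c in completions if predicate(c)) / len(completions)


def deviates(baseline, current, z_threshold=3.0) -> bool:
    """Flag `current` if it sits more than z_threshold sample standard
    deviations away from the mean of the baseline values."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Illustrative daily JSON-parse failure rates from a launch window (made up).
baseline_json_failure = [0.02, 0.03, 0.025, 0.02, 0.03]

# A tiny, made-up batch of today's completions.
todays_completions = [
    '{"ok": true}',
    "not json",
    '{"a": 1}',
    "I can't help with that.",
]
todays_json_failure = rate(todays_completions, is_json_failure)
todays_refusal = rate(todays_completions, is_refusal)

print(todays_json_failure, deviates(baseline_json_failure, todays_json_failure))
```

A z-score is a blunt instrument for rates that are close to zero or that follow a weekly cycle; comparing against the same weekday in the baseline, or using a fixed absolute margin, may work better for some proxies.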

Question 4

Should I sample or log everything?

Accepted Answer

Log metadata (tokens, latency, cost, status) for every call because it is cheap and you need it for billing and alerting. For full prompt and completion bodies, keep only a sampled fraction on high-volume endpoints and keep 100 percent on low-volume or high-risk ones. Always capture the full body of any flagged or errored request, regardless of the sample rate.
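
The body-sampling policy described above could be sketched as a small decision function. The endpoint names and rates below are hypothetical, and hashing the request ID is one possible design choice: it makes the decision deterministic, so every service that sees the same request keeps or drops the body consistently.

```python
import hashlib

# Hypothetical per-endpoint sample rates for full prompt/completion bodies.
SAMPLE_RATES = {
    "chat": 0.05,             # high volume: keep about 5 percent
    "contract-review": 1.0,   # high risk: keep everything
}
DEFAULT_RATE = 0.10           # assumption: fallback for unlisted endpoints


def should_log_body(request_id: str, endpoint: str, *,
                    flagged: bool = False, errored: bool = False) -> bool:
    """Decide whether to store the full body for this request."""
    # Flagged or errored requests are always captured.
    if flagged or errored:
        return True
    sample_rate = SAMPLE_RATES.get(endpoint, DEFAULT_RATE)
    # Map the request ID to a stable value in [0, 1) and compare to the rate.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Rough check of the effective sample rate on synthetic request IDs.
kept = sum(should_log_body(f"req-{i}", "chat") for i in range(10_000))
print(f"kept {kept} of 10000 chat bodies")
```

Metadata logging is not gated by this function; only the decision to keep the full bodies is.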

Question 5

What alerts actually matter?

Accepted Answer

Alert on hard failures (timeouts, provider 5xx, empty completions), cost anomalies (spend per hour above a threshold), p95 latency regressions, and quality-proxy breaches such as a sudden jump in JSON-parse failures or refusals. Keep the page-worthy set small so the team trusts it.
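
To make the alert categories above concrete, here is a minimal sketch that evaluates one hour of aggregated stats against a handful of thresholds. The `WindowStats` and `Thresholds` names, the fields, and every numeric value are assumptions chosen for the example; real thresholds should come from your own baselines.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class WindowStats:  # hypothetical aggregate for one hour of traffic
    latencies_ms: list
    total_calls: int
    hard_failures: int        # timeouts, provider 5xx, empty completions
    json_parse_failures: int
    spend_usd: float


@dataclass
class Thresholds:  # illustrative values only
    max_failure_rate: float = 0.02
    max_spend_usd_per_hour: float = 50.0
    baseline_p95_ms: float = 1500.0
    p95_regression_factor: float = 1.5
    max_json_failure_rate: float = 0.05


def p95(values) -> float:
    """95th percentile; quantiles() with n=100 returns 99 cut points."""
    return quantiles(values, n=100, method="inclusive")[94]


def evaluate(w: WindowStats, t: Thresholds) -> list:
    """Return the names of the alerts that fire for this window."""
    alerts = []
    if w.total_calls and w.hard_failures / w.total_calls > t.max_failure_rate:
        alerts.append("hard_failure_rate")
    if w.spend_usd > t.max_spend_usd_per_hour:
        alerts.append("cost_anomaly")
    # quantiles() needs at least two data points on older Python versions.
    if (len(w.latencies_ms) >= 2
            and p95(w.latencies_ms) > t.baseline_p95_ms * t.p95_regression_factor):
        alerts.append("latency_p95_regression")
    if (w.total_calls
            and w.json_parse_failures / w.total_calls > t.max_json_failure_rate):
        alerts.append("json_parse_failures")
    return alerts


healthy = WindowStats(latencies_ms=[800.0] * 50 + [1200.0] * 50,
                      total_calls=1000, hard_failures=1,
                      json_parse_failures=5, spend_usd=12.0)
degraded = WindowStats(latencies_ms=[3000.0] * 100,
                       total_calls=1000, hard_failures=50,
                       json_parse_failures=100, spend_usd=80.0)

print(evaluate(healthy, Thresholds()))
print(evaluate(degraded, Thresholds()))
```

Keeping the rule set this small is deliberate: each rule maps to one of the categories in the answer, and anything noisier can go to a dashboard instead of a pager.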

How to Monitor AI Apps in Production

Why AI apps need their own monitoring

What to log on every call

Tracking quality and drift

Alerting without the noise