AI Content Detector Explainer

Understand how AI text detectors work and why they fail

How AI content detectors actually work

AI text detectors promise a simple answer to a hard question: was this written by a human or a machine? Under the hood, almost all of them lean on the same two statistical signals, and understanding those signals is the fastest way to see why the promise rarely holds up.

Perplexity and burstiness

Perplexity measures how “surprised” a language model is by each next word. If a model can predict the words easily, perplexity is low. Text generated by models of the same family tends to be smooth and predictable, and so scores low, because generation favors the high-probability words that the scoring model also expects. Human writing is messier and harder to predict, so it tends to score higher.
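As a rough illustration of the definition above, perplexity can be computed as the exponential of the average negative log-probability per token. The sketch below assumes the per-token probabilities have already been obtained from some scoring model; the probability lists are hypothetical values chosen for illustration, not output from any real detector.

```python
import math


def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical probabilities a scoring model assigned to each actual next word.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model found every word easy to guess
surprising = [0.2, 0.05, 0.4, 0.1]    # the model was often surprised

print("predictable text:", round(perplexity(predictable), 2))
print("surprising text: ", round(perplexity(surprising), 2))
```

The first list yields a much lower score than the second, which is the pattern a detector reads as "machine-like".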

Burstiness measures variation in sentence length and rhythm: humans write a long, winding sentence, then a short one, then a fragment. Machine output is often more uniform. Detectors combine low perplexity with low burstiness and conclude “AI”. The trouble is that plenty of genuine human writing (edited prose, technical documentation, non-native English, formulaic business text) is also smooth and uniform, and gets swept up in the same net.
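One simple way to put a number on burstiness is the coefficient of variation of sentence length (standard deviation divided by mean). This is only one possible definition, chosen here for illustration; real detectors may measure it differently. The `naive_verdict` rule and its thresholds are hypothetical, meant only to show how combining the two signals can sweep up uniform human prose.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length (std dev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def naive_verdict(perplexity_score, burstiness_score,
                  max_perplexity=20.0, max_burstiness=0.3):
    """Hypothetical rule: low perplexity AND low burstiness means 'AI'."""
    if perplexity_score < max_perplexity and burstiness_score < max_burstiness:
        return "AI"
    return "human"


varied = ("I walked to the station in the rain, thinking about everything "
          "that had gone wrong that week. Nothing helped. Not even coffee.")
uniform = ("The report covers the first quarter. The team met all of its "
           "goals. The budget stayed within its limits.")

# Both passages are human-written; the perplexity score of 10.0 is assumed.
print(naive_verdict(10.0, burstiness(varied)))
print(naive_verdict(10.0, burstiness(uniform)))
```

With the same assumed perplexity, the formulaic passage is labeled "AI" purely because its sentences are similar in length.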

Why the false-positive rate matters more than accuracy

Vendors love to quote “99% accurate”, but the number that hurts real people is the false-positive rate. Use the calculator above: even a 1% false-positive rate means, on average, one wrongly accused person in every batch of 100 human-written documents. Run a course of 500 human-written essays and you should expect about five false accusations; the chance of accusing at least one innocent student is above 99%, assuming each essay is flagged independently. Because the cost of a false accusation is so high (academic misconduct, failed assignments, reputational harm), a low-but-nonzero false-positive rate is not “good enough”; it is the whole problem.
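The arithmetic behind those figures can be sketched in a few lines. The functions below assume every document in the batch is human-written and that each one is flagged independently with the same false-positive rate; both are simplifying assumptions, and the function names are illustrative.

```python
def expected_false_positives(n_human_docs, false_positive_rate):
    """Average number of human-written documents wrongly flagged."""
    return n_human_docs * false_positive_rate


def prob_at_least_one_false_positive(n_human_docs, false_positive_rate):
    """Chance that at least one human-written document is wrongly flagged."""
    return 1.0 - (1.0 - false_positive_rate) ** n_human_docs


for n in (100, 500):
    expected = expected_false_positives(n, 0.01)
    at_least_one = prob_at_least_one_false_positive(n, 0.01)
    print(f"{n} essays: about {expected:.0f} expected, "
          f"P(at least one) = {at_least_one:.1%}")
```

At a 1% rate, a batch of 100 already has a better-than-even chance of containing a false accusation, and a batch of 500 makes one close to certain.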

Why detectors break in practice

  • No ground truth. Plain GPT or Claude output carries no watermark, so the detector is inferring, never reading a signature.
  • Trivially defeated. Paraphrasing, a “humanize” tool, or light editing raises perplexity and burstiness enough to flip the verdict.
  • Biased. Detectors disproportionately flag non-native English writers, whose more limited vocabulary and more regular sentence structures read as low-perplexity.
  • Moving target. Each new model generation tends to write burstier, more human-like text, eroding whatever accuracy a detector had.
  • Self-reported accuracy. Published numbers come from the vendor’s own test set, not your real-world distribution of documents.
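The last point can be made concrete with a small sketch: the same detector, with the same headline accuracy, produces a very different share of wrongful flags depending on how much of the document pool is actually AI-written. The detection rate, false-positive rate, and AI shares below are hypothetical numbers for illustration, not measurements of any real product.

```python
def accuracy(ai_share, detection_rate, false_positive_rate):
    """Overall fraction of documents labeled correctly."""
    human_share = 1.0 - ai_share
    return ai_share * detection_rate + human_share * (1.0 - false_positive_rate)


def human_share_of_flagged(ai_share, detection_rate, false_positive_rate):
    """Fraction of flagged documents that were actually human-written."""
    flagged_ai = ai_share * detection_rate
    flagged_human = (1.0 - ai_share) * false_positive_rate
    flagged_total = flagged_ai + flagged_human
    if flagged_total == 0:
        return 0.0
    return flagged_human / flagged_total


# Hypothetical detector: catches 99% of AI text, wrongly flags 1% of human text.
for label, ai_share in (("balanced test set", 0.50), ("real classroom", 0.05)):
    acc = accuracy(ai_share, 0.99, 0.01)
    wrong = human_share_of_flagged(ai_share, 0.99, 0.01)
    print(f"{label}: accuracy {acc:.0%}, "
          f"{wrong:.0%} of flagged documents are human-written")
```

Under these assumptions the headline accuracy stays at 99% in both settings, yet roughly one in six flags lands on a human writer once AI-written documents are rare.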

Treat any detector score as a weak, gameable signal. It cannot meet an evidentiary or academic-integrity standard on its own, and decisions that affect someone’s record or livelihood should never rest on it.
