NLP De-identification Tester

Test whether your de-identified text can be re-identified



Stripping names out of text feels like anonymisation, but it usually is not. Quasi-identifiers — a date of birth, a postcode, an unusual job title, a rare medical condition — can combine to pinpoint a single individual even when every obvious identifier is gone. This tester scans de-identified text for the signals that make re-identification possible, so you can find and fix the weak spots before you share or publish.

How it works

The tool runs entirely client-side. It scans your text for categories of residual risk: dates and ages, locations and postcodes, job titles and organisations, contact patterns that slipped through, and rare or highly specific phrases. It then counts how many distinct quasi-identifier categories appear together, because the danger lies in the combination: three or four quasi-identifiers in one record are often enough to single someone out. The output is a risk level, the categories found, and the specific spans that triggered each one.
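The scan-and-count idea described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the `PATTERNS` table, the `Finding` class, the `scan` and `risk_level` functions, and the thresholds are all assumptions made for the example, and the regular expressions cover only a handful of UK-style formats.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, one per quasi-identifier category.
# A real scanner would need much broader coverage than this.
PATTERNS = {
    "date_or_age": re.compile(
        r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{1,3}[- ]year[- ]old\b",
        re.IGNORECASE,
    ),
    "postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),
    "contact": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "job_title": re.compile(
        r"\b(?:consultant|professor|director|surgeon|headteacher)\b",
        re.IGNORECASE,
    ),
}


@dataclass
class Finding:
    category: str
    start: int
    end: int
    text: str


def scan(text: str) -> list:
    """Return every span that matches any category pattern."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(
                Finding(category, match.start(), match.end(), match.group())
            )
    return findings


def risk_level(findings) -> str:
    """Score by the number of distinct categories, not the number of hits."""
    distinct = {f.category for f in findings}
    # Assumed thresholds: three or more categories together count as high risk.
    if len(distinct) >= 3:
        return "high"
    if len(distinct) == 2:
        return "medium"
    if len(distinct) == 1:
        return "low"
    return "none"


sample = "A 47-year-old consultant from SW1A 1AA was admitted on 03/02/2021."
for finding in scan(sample):
    print(finding.category, repr(finding.text), (finding.start, finding.end))
print("risk:", risk_level(scan(sample)))
```

Scoring on distinct categories rather than raw match counts mirrors the point made above: two dates in one record add less risk than a date combined with a postcode and a job title.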

Tips and notes

A low score means fewer obvious signals, not proof of anonymity — true anonymisation depends on the whole dataset and the realistic means an adversary could use, which a single snippet cannot capture. To reduce risk, generalise (age bands instead of exact ages), bucket (region instead of postcode), suppress rare values, and remove distinctive event details. Re-test after each change and keep a record of your reasoning, because under GDPR you must be able to justify why you treat the data as anonymised or merely pseudonymised.
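The generalise, bucket, and suppress steps can be sketched as small helper functions. The names `age_band`, `postcode_district`, and `suppress_rare` are hypothetical, as are the ten-year band width and the minimum count of two. The postcode helper assumes UK-style postcodes, where the inward part is always the last three characters. The resulting district may still be too fine-grained for sparse areas, so a coarser region may be needed in practice.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age to a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def postcode_district(postcode: str) -> str:
    """Bucket a UK-style postcode by dropping the three-character inward part."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def suppress_rare(values, min_count=2, placeholder="[suppressed]"):
    """Replace values that occur fewer than min_count times in the dataset."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


print(age_band(47))
print(postcode_district("SW1A 1AA"))
print(suppress_rare(["nurse", "nurse", "falconer"]))
```

Note that `suppress_rare` needs the whole column of values to decide what counts as rare, which reflects the caveat above that a single snippet cannot capture dataset-level risk.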
