What types of PII does it detect?

It detects email addresses, IPv4 addresses, payment card numbers (Luhn-validated), UK National Insurance numbers, US Social Security numbers, 9-digit passport numbers, and UK/US phone numbers. Each distinct value gets a numbered token.

How are card numbers validated?

Candidate card numbers (13–19 digits) are checked with the Luhn algorithm before being redacted, so random long numbers that fail the checksum are left alone. This cuts down on false positives from order IDs and reference numbers.
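
As an illustration of the check described above, here is a minimal Python sketch of Luhn filtering. It is not the tool's actual browser code: the CARD_CANDIDATE pattern and the function names are hypothetical, and the assumption that digits may be separated by single spaces or hyphens is ours.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated
# by single spaces or hyphens (an assumption, not the tool's own regex).
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return candidate digit runs that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text)
            if luhn_valid(m.group())]
```

With this sketch, a 16-digit order ID such as 1234567890123456 fails the checksum and is left alone, while the well-known test number 4111 1111 1111 1111 passes.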

Why does the same email get the same token?

Tokens are stable per value: every occurrence of a given email maps to [EMAIL_1], a different email maps to [EMAIL_2], and so on. This keeps the redacted text readable and lets you tell distinct people apart without exposing their data.
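
One way to get this behaviour is a lookup keyed on the PII type and the matched value, with a separate counter per type. The Python sketch below is illustrative only; the TokenMap class and its method name are hypothetical, and it compares values exactly as matched (whether the real tool normalises case or spacing first is not stated here).

```python
class TokenMap:
    """Assigns one stable, numbered token per distinct value, counted per PII type."""

    def __init__(self):
        self._tokens = {}    # (kind, value) -> token string
        self._counters = {}  # kind -> highest number issued so far

    def token_for(self, kind, value):
        key = (kind, value)
        if key not in self._tokens:
            # First time this value is seen: issue the next number for its type.
            number = self._counters.get(kind, 0) + 1
            self._counters[kind] = number
            self._tokens[key] = f"[{kind}_{number}]"
        return self._tokens[key]
```

Calling token_for twice with the same email returns the same token, while a new email gets the next number in the EMAIL sequence.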

Will it catch every piece of personal data?

No. Regex-based detection handles structured identifiers well but cannot reliably find free-text names, addresses, or context-dependent data. Always review the output by eye before sharing anything sensitive; treat the tool as a first pass, not a guarantee.

Is my text sent to a server?

No. All pattern matching and redaction run entirely in your browser. Nothing is uploaded, logged, or stored, which is exactly why it is safe to use on confidential documents.

What is the PII Detector & Redactor?

It is a browser-based tool that scans pasted text for personal data (email addresses, UK/US phone numbers, National Insurance numbers, US SSNs, passport numbers, payment cards, and IP addresses) and replaces each match with a typed token like [EMAIL_1]. All regex matching runs locally; nothing is transmitted. It is built for pre-share scrubbing and GDPR reviews, and is free to use on Gera Tools.

PII Detector & Redactor

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Before you paste a log, support transcript, or document into a ticket, a chat, or an AI tool, you should strip out personally identifiable information (PII). This detector scans text locally for the structured identifiers that most often leak — emails, phone numbers, government IDs, and card numbers — and replaces each with a typed, numbered token so the result is safe to share but still readable.

How it works

Each PII type has a dedicated rule that runs entirely in your browser:

Email and IPv4 are matched with standard format patterns.
Payment cards are matched as 13–19 digit runs and then confirmed with the Luhn checksum, so numbers that fail the check digit are ignored.
UK National Insurance numbers use the official prefix rules (excluding invalid prefixes like BG, GB, NK) and an A–D suffix.
US SSNs exclude impossible area numbers (the first three digits) such as 000, 666, and 9xx.
Phone numbers are matched broadly and then required to contain at least 10 digits to avoid catching short codes.
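
To make a few of these rules concrete, here is a hedged Python sketch of the National Insurance, SSN, and phone checks. The patterns are our own approximations, not the tool's source. The NINO pattern assumes the published HMRC format (restricted first and second letters, unallocated prefixes BG, GB, KN, NK, NT, TN, ZZ); the SSN pattern also rejects group 00 and serial 0000, which goes beyond the exclusions listed above; the phone candidate pattern is a deliberately loose guess.

```python
import re

# UK National Insurance number: two prefix letters, six digits, suffix A-D.
NINO = re.compile(
    r"\b(?!BG|GB|KN|NK|NT|TN|ZZ)"              # unallocated prefixes
    r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]"     # allowed first / second letter
    r"\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"
)

# US SSN in AAA-GG-SSSS form; area may not be 000, 666, or 9xx.
# Rejecting group 00 and serial 0000 is an extra assumption of this sketch.
SSN = re.compile(r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

# Loose phone candidate: digits with common separators in between.
PHONE_CANDIDATE = re.compile(r"\+?\(?\d[\d\s().-]{7,}\d")


def find_phones(text):
    """Keep only candidates with at least 10 digits, to skip short codes."""
    return [m.group() for m in PHONE_CANDIDATE.finditer(text)
            if sum(ch.isdigit() for ch in m.group()) >= 10]
```

The two-stage phone check mirrors the description above: a permissive pattern finds anything phone-shaped, and the digit count then discards short runs such as 1234 5678.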

Overlapping matches are resolved so a single span is never double-tagged, and each distinct value is assigned a stable token numbered per type: [EMAIL_1], [EMAIL_2], [PHONE_1], and so on.
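
The overlap step could be sketched as follows in Python. The policy shown here (keep the leftmost match, and the longer one when two start at the same position) is an assumption for illustration; how the tool actually chooses between competing matches is not specified. The resolve_overlaps name and the (start, end, kind) tuple shape are hypothetical.

```python
def resolve_overlaps(matches):
    """Return a non-overlapping subset of (start, end, kind) spans.

    Assumed policy: earliest start wins; on a tie, the longer span wins.
    `end` is exclusive, as with re.Match.end().
    """
    ordered = sorted(matches, key=lambda m: (m[0], -(m[1] - m[0])))
    kept = []
    last_end = -1
    for start, end, kind in ordered:
        if start >= last_end:  # does not overlap the previously kept span
            kept.append((start, end, kind))
            last_end = end
    return kept
```

For example, a 9-digit passport match that sits inside a longer card-number match is dropped, so the card span is tagged once and only once.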

Worked example

Consider this support transcript excerpt:

“Hi, my name is Sarah. Please email me at [email protected] or call 07700 900123. My NI number is AB 12 34 56 C.”

After redaction the output becomes:

“Hi, my name is Sarah. Please email me at [EMAIL_1] or call [PHONE_1]. My NI number is [NINO_1].”

The name “Sarah” is left untouched — as noted below, free-text names are not auto-detected because heuristic name matching produces too many false positives. The three structured identifiers are each replaced by a unique, readable token.

What the tool does NOT catch — and why

Certain PII categories are deliberately outside scope:

PII type	Why not auto-detected
Full names	Too many false positives — common words look like names
Street addresses	Addresses are free-form and locale-specific
Dates of birth	Plain dates appear constantly in non-personal contexts
Bank sort codes / account numbers	Short digit runs match too many reference numbers

The tool is a first-pass scrubber, not a guarantee. Always review the redacted output before sharing anything sensitive — some structured identifiers may also be missed if the surrounding text is unusually formatted.

Common use cases

Support tickets and bug reports: strip customer contact details before pasting transcripts into public issue trackers.
AI prompts: clean logs and documents before feeding them to a language model that may log or train on inputs.
GDPR and data-minimisation reviews: quickly scan exported records to confirm no identifiers leaked into a report column.
Developer testing: take a real-world log and redact it into a safe fixture for your test suite.

Tips and notes

The numbered tokens preserve structure: if a log mentions the same user three times, all three become [EMAIL_1], so the redacted text still makes sense.
Passport detection is intentionally broad (any 9-digit run) and may catch other 9-digit identifiers — review those matches in context.
Because everything is local, this is safe for confidential material. There is no upload, no network request, and no storage of your input.