Redacting a PDF properly means the sensitive text must actually be gone, not merely hidden under a black box that anyone can still select and copy text from. This tool searches a text-based PDF for a phrase you specify and blanks out those characters in the file’s content streams by overwriting each one with a space. Everything happens inside your browser, so the document is never uploaded.
How it works
A text-based PDF stores its visible text as literal strings inside content streams, usually compressed with FlateDecode. The redactor works directly on those streams:
- It parses the raw PDF bytes and locates each `stream … endstream` object.
- For FlateDecode streams, it inflates the data using the browser’s native `DecompressionStream("deflate")`; no external library is needed.
- Inside the decoded text-showing operators (`Tj` and `TJ`), it finds your phrase within the parenthesized PDF strings and replaces each matched character with a space, so the glyphs vanish from the text layer.
- It re-compresses the stream with `CompressionStream`, fixes the stream’s `/Length`, and reassembles a valid PDF for download.
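The stream-handling steps in the list above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the tool’s actual source. The helper name `rewriteFlateStreams` and its callback are hypothetical. The sketch assumes an environment that provides `CompressionStream`, `DecompressionStream`, `Blob` and `Response` as globals, as modern browsers do. It only handles unencrypted files whose streams carry a direct `/Length` integer and use `/FlateDecode` as their sole filter. FlateDecode data is zlib-wrapped deflate, which is what the `"deflate"` format of these APIs expects. The sketch does not rebuild the cross-reference table, so byte offsets of objects after an edited stream go stale; a complete tool would need to recompute them or write a new xref section.

```typescript
// Sketch only, not the tool's source. Assumptions: unencrypted PDF, each stream
// has a direct "/Length N", and only streams whose sole filter is /FlateDecode
// are rewritten. The xref table is NOT rebuilt here.

// One char per byte (codes 0-255), so regexes can scan raw PDF bytes.
function toBinary(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

function fromBinary(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}

// Push bytes through a native (de)compression transform and collect the result.
async function pipe(
  data: Uint8Array,
  t: CompressionStream | DecompressionStream,
): Promise<Uint8Array> {
  const out = new Blob([data]).stream().pipeThrough(t);
  return new Uint8Array(await new Response(out).arrayBuffer());
}

// FlateDecode data is zlib-wrapped deflate, which is the "deflate" format here.
const inflate = (d: Uint8Array) => pipe(d, new DecompressionStream("deflate"));
const deflate = (d: Uint8Array) => pipe(d, new CompressionStream("deflate"));

const LENGTH = /\/Length\s+(\d+)(?!\d|\s+\d+\s+R)/; // direct integer, not "N 0 R"
const FLATE = /\/Filter\s*(?:\/FlateDecode|\[\s*\/FlateDecode\s*\])/;

// Hypothetical helper: decode each Flate stream, let `transform` edit it,
// then re-compress it and patch /Length. Unchanged streams are copied as-is.
async function rewriteFlateStreams(
  pdf: Uint8Array,
  transform: (decoded: string) => string,
): Promise<Uint8Array> {
  const src = toBinary(pdf);
  // Stream dictionary, "stream" keyword and its EOL, never crossing an "endobj".
  const head = /<<((?:(?!endobj)[\s\S])*?)>>\s*stream\r?\n/g;
  let out = "";
  let copied = 0; // src[0 .. copied) is already in `out`
  let m: RegExpExecArray | null;
  while ((m = head.exec(src)) !== null) {
    const dict = m[1];
    const dataStart = m.index + m[0].length;
    const len = LENGTH.exec(dict);
    if (!len) continue; // indirect or missing /Length: not handled in this sketch
    const dataEnd = dataStart + Number(len[1]);
    head.lastIndex = dataEnd; // do not scan inside the stream's own bytes
    if (!FLATE.test(dict)) continue;

    let decoded: string;
    try {
      decoded = toBinary(await inflate(fromBinary(src.slice(dataStart, dataEnd))));
    } catch {
      continue; // undecodable data is left exactly as it was
    }
    const edited = transform(decoded);
    if (edited === decoded) continue;

    const packed = toBinary(await deflate(fromBinary(edited)));
    const dictStart = m.index + 2; // just past "<<"
    const newDict =
      dict.slice(0, len.index) +
      `/Length ${packed.length}` +
      dict.slice(len.index + len[0].length);
    out +=
      src.slice(copied, dictStart) +
      newDict +
      src.slice(dictStart + dict.length, dataStart) +
      packed;
    copied = dataEnd; // resume at the EOL before "endstream"
  }
  return fromBinary(out + src.slice(copied));
}
```

The callback receives each decoded stream as a one-character-per-byte string. Streams it leaves unchanged are copied through byte for byte, so images and fonts are never re-encoded.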
Because the characters are overwritten rather than covered, selecting and copying the redacted region in the output yields only blank spaces.
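To see why copying the redacted region yields nothing, here is a hedged sketch of the character-replacement step. The name `redactTextOperators` is hypothetical. The sketch assumes the decoded stream is a one-character-per-byte string in a simple single-byte font encoding, and it only searches literal `( … )` strings consumed by `Tj`, `TJ`, `'` or `"`. A match must sit inside one string, so a phrase split across `TJ` kerning segments is missed. Hex strings, inline images and marked-content properties such as `/ActualText` are also ignored.

```typescript
// Sketch only (hypothetical names). Assumes a decoded content stream held as a
// one-char-per-byte string and a single-byte font encoding in which the
// phrase's characters appear literally.

interface Unit {
  ch: string;    // decoded character
  start: number; // source offset of its first byte (escape backslash included)
  end: number;   // source offset just past its last byte
}

const ESCAPES: Record<string, string> = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f" };

// Parse the literal string whose "(" is at `open`; return its units and the offset after ")".
function parseLiteral(s: string, open: number): { units: Unit[]; next: number } {
  const units: Unit[] = [];
  let depth = 1;
  let i = open + 1;
  while (i < s.length) {
    const c = s[i];
    if (c === "\\") {
      const esc = s[i + 1] ?? "";
      const oct = /^[0-7]{1,3}/.exec(s.slice(i + 1, i + 4));
      if (oct) {
        const code = parseInt(oct[0], 8) & 0xff;
        units.push({ ch: String.fromCharCode(code), start: i, end: i + 1 + oct[0].length });
        i += 1 + oct[0].length;
      } else if (esc === "\n" || esc === "\r") {
        i += esc === "\r" && s[i + 2] === "\n" ? 3 : 2; // line continuation: no character
      } else {
        units.push({ ch: ESCAPES[esc] ?? esc, start: i, end: i + 2 });
        i += 2;
      }
      continue;
    }
    if (c === "(") depth++;
    if (c === ")" && --depth === 0) return { units, next: i + 1 };
    units.push({ ch: c, start: i, end: i + 1 });
    i++;
  }
  return { units, next: i };
}

const TEXT_OPS = new Set(["Tj", "TJ", "'", '"']);

// Blank every character of `phrase` found inside strings shown by a text operator.
function redactTextOperators(content: string, phrase: string): { out: string; hits: number } {
  const shown: Unit[][] = []; // strings consumed by a text-showing operator
  let pending: Unit[][] = []; // strings seen since the previous operator
  let i = 0;
  while (i < content.length) {
    const c = content[i];
    if (c === "(") {
      const { units, next } = parseLiteral(content, i);
      pending.push(units);
      i = next;
    } else if (c === "%") {
      while (i < content.length && content[i] !== "\n" && content[i] !== "\r") i++; // comment
    } else if (c === "/") {
      i++;
      while (i < content.length && !/[\s()<>\[\]{}\/%]/.test(content[i])) i++; // name such as /F1
    } else if (c === "<") {
      if (content[i + 1] === "<") {
        i += 2; // dictionary opener
      } else {
        const close = content.indexOf(">", i); // hex string: skipped, not searched
        i = close < 0 ? content.length : close + 1;
      }
    } else if (/[A-Za-z'"]/.test(c)) {
      let j = i + 1;
      while (j < content.length && /[A-Za-z0-9*]/.test(content[j])) j++;
      if (TEXT_OPS.has(content.slice(i, j))) shown.push(...pending);
      pending = []; // any operator consumes the operands before it
      i = j;
    } else {
      i++; // whitespace, numbers, brackets and other delimiters
    }
  }

  const blank = new Map<number, number>(); // unit start -> unit end, for units to overwrite
  let hits = 0;
  for (const units of shown) {
    const text = units.map((u) => u.ch).join("");
    for (let k = text.indexOf(phrase); phrase.length > 0 && k >= 0; k = text.indexOf(phrase, k + 1)) {
      hits++;
      for (const u of units.slice(k, k + phrase.length)) blank.set(u.start, u.end);
    }
  }

  let out = "";
  let last = 0;
  for (const start of Array.from(blank.keys()).sort((a, b) => a - b)) {
    out += content.slice(last, start) + " "; // one space per matched character
    last = blank.get(start)!;
  }
  return { out: out + content.slice(last), hits };
}
```

Each matched character, including a two-byte escape such as `\(`, collapses to a single space. A count like `hits` is the kind of value that could drive a "no match found" message.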
Notes and limits
This approach works on PDFs that contain a real text layer, which is the common case for documents exported from word processors and most digitally generated PDFs. It cannot redact text that is part of a scanned image, because those pixels are not searchable text. It also cannot reliably handle PDFs whose heavily subset-encoded fonts store glyph indices instead of readable characters. The tool tells you when no match was found. Each matched character becomes exactly one space, so every string keeps its character count and text positioned elsewhere on the page does not move. In a proportional font, however, a space can be narrower or wider than the glyph it replaces, so text that follows the match within the same text-showing operator may shift slightly. As with any redaction workflow, always re-open the downloaded file and try to copy the redacted area to confirm the sensitive text is truly gone before sharing. Everything runs locally in your browser.
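An automated scan can complement the manual copy test, though it should not replace it. This sketch, with the hypothetical name `phraseStillPresent`, searches the raw bytes and every Flate stream it can inflate for the phrase as plain single-byte text. It cannot see text drawn through glyph-index fonts, characters written as escapes, or a phrase split across `TJ` segments. A `false` result is therefore a hint, not proof.

```typescript
// Sketch only (hypothetical name). Assumes the phrase would appear as plain
// single-byte characters, either in the raw file or in a Flate stream that
// has a direct "/Length N".

function toBinary(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Inflate zlib data with the native API; resolve to null if it is not valid zlib.
async function inflateOrNull(data: Uint8Array): Promise<Uint8Array | null> {
  try {
    const out = new Blob([data]).stream().pipeThrough(new DecompressionStream("deflate"));
    return new Uint8Array(await new Response(out).arrayBuffer());
  } catch {
    return null;
  }
}

async function phraseStillPresent(pdf: Uint8Array, phrase: string): Promise<boolean> {
  const src = toBinary(pdf);
  if (src.includes(phrase)) return true; // uncompressed streams, metadata, and so on
  const head = /<<((?:(?!endobj)[\s\S])*?)>>\s*stream\r?\n/g;
  let m: RegExpExecArray | null;
  while ((m = head.exec(src)) !== null) {
    const len = /\/Length\s+(\d+)(?!\d|\s+\d+\s+R)/.exec(m[1]);
    if (!len) continue; // indirect /Length: skipped in this sketch
    const start = m.index + m[0].length;
    const end = start + Number(len[1]);
    head.lastIndex = end; // skip over the stream body
    if (!m[1].includes("/FlateDecode")) continue;
    const decoded = await inflateOrNull(pdf.subarray(start, end));
    if (decoded && toBinary(decoded).includes(phrase)) return true;
  }
  return false;
}
```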