This tool extracts the readable text from a PDF and gives it back as plain text you can copy or save. It runs entirely in your browser, so even sensitive documents stay on your machine, and it needs no plugin or account.
How it works
A PDF is a structured binary file. The text you see on the page is drawn by content streams — sequences of drawing commands — and those streams are almost always compressed. Extraction works in a few steps, all client-side:
- Scan the structure. The tool locates every obj … endobj definition and finds the ones that carry stream data.
- Inflate. Streams marked /FlateDecode are decompressed with the browser-native DecompressionStream, which implements the same zlib/DEFLATE compression that PDFs use. No library is loaded.
- Read the text operators. Inside a decompressed content stream, text is shown with the Tj operator (a single string) and the TJ operator (an array of string fragments interleaved with spacing numbers). The tool decodes PDF literal strings (handling escapes and octal codes) and hex strings, and uses the large negative spacing values in TJ arrays to decide where to insert spaces between words.
- Reconstruct lines. Positioning operators such as Td, TD and T*, plus the end-of-text marker ET, are used to insert line breaks so the output reads top to bottom in a sensible order.
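The Scan the structure and Inflate steps could look roughly like the TypeScript sketch below. It is an illustration, not the tool's actual source, and it assumes a classic file layout: stream objects appear uncompressed in the file (no object streams, no encryption), each carries a single /FlateDecode filter, and its length is either a direct /Length value or, failing that, found by searching for the endstream keyword. The names toBinaryString, streamEnd, inflate and extractFlateStreams are hypothetical.

```typescript
// Sketch of the "Scan the structure" and "Inflate" steps (not the tool's source).
// Assumptions: stream objects sit uncompressed in the file (no object streams,
// no encryption), and the ones to inflate have a single /FlateDecode filter.

// "N G obj << dict >> stream<EOL>"; the body may not cross an "endobj",
// so a stream-less object never matches into the next object.
const STREAM_HEADER = /\d+\s+\d+\s+obj\s*<<((?:(?!endobj)[\s\S])*?)>>\s*stream\r?\n/;
const FLATE = /\/Filter\s*(?:\/FlateDecode|\[\s*\/FlateDecode\s*\])/;

/** Map each byte to one UTF-16 code unit so string offsets equal byte offsets. */
function toBinaryString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

/** Where the stream data ends: a direct /Length if present, else "endstream". */
function streamEnd(text: string, dict: string, start: number): number {
  // An indirect length such as "/Length 12 0 R" is deliberately not resolved here.
  const len = /\/Length\s+(\d+)(?!\d)(?!\s+\d+\s+R)/.exec(dict);
  if (len) return Math.min(start + Number(len[1]), text.length);
  let end = text.indexOf("endstream", start);
  if (end < 0) return -1;
  if (text[end - 1] === "\n") end--; // drop the EOL that precedes "endstream"
  if (text[end - 1] === "\r") end--;
  return end;
}

/** Inflate zlib-wrapped DEFLATE data, the format /FlateDecode streams use. */
async function inflate(data: Uint8Array): Promise<Uint8Array> {
  const source = new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(data);
      controller.close();
    },
  });
  // "deflate" is the zlib wrapper (RFC 1950); "deflate-raw" would be headerless.
  const reader = source.pipeThrough(new DecompressionStream("deflate")).getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

/** Return the decompressed bytes of every /FlateDecode stream in the file. */
async function extractFlateStreams(pdf: Uint8Array): Promise<Uint8Array[]> {
  const text = toBinaryString(pdf);
  const header = new RegExp(STREAM_HEADER.source, "g"); // fresh lastIndex per call
  const results: Uint8Array[] = [];
  let m: RegExpExecArray | null;
  while ((m = header.exec(text)) !== null) {
    const dict = m[1];
    const start = m.index + m[0].length;
    const end = streamEnd(text, dict, start);
    if (end < 0) break; // unterminated stream: stop scanning
    header.lastIndex = end; // never search inside binary stream data
    if (!FLATE.test(dict)) continue; // other filters are skipped in this sketch
    try {
      results.push(await inflate(pdf.subarray(start, end)));
    } catch {
      // corrupt or truncated data: skip this stream and keep going
    }
  }
  return results;
}
```

Mapping each byte to exactly one character keeps string offsets equal to byte offsets, so regular expressions can locate object boundaries while the binary payload is sliced directly from the original Uint8Array.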
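For the Read the text operators and Reconstruct lines steps, a small tokenizer is enough to show the idea. This sketch takes one decompressed content stream as a string with one character per byte and assumes single-byte font encodings, so each string byte becomes one character. The WORD_GAP threshold of -200 (TJ numbers are in thousandths of a unit of text space) and the choice to break lines on Td/TD only when the vertical offset is non-zero are illustrative guesses, not the tool's documented settings. Inline images (BI, ID, EI) are not handled, and extractText, readLiteral and readHex are hypothetical names.

```typescript
// Sketch of the text-operator pass over one decompressed content stream.
// Assumptions: single-byte encodings, a hypothetical WORD_GAP threshold,
// and no inline images (BI/ID/EI) in the stream.

type Operand = number | string | Operand[];

const WORD_GAP = -200; // hypothetical: TJ adjustments below this become a space
const DELIMS = "()<>[]{}/%";
const ESCAPES: Record<string, string> = {
  n: "\n", r: "\r", t: "\t", b: "\b", f: "\f", "(": "(", ")": ")", "\\": "\\",
};

function isWhite(c: string): boolean {
  return " \t\n\r\f\0".includes(c);
}

/** Decode a literal string; `i` points just past "(". Returns [text, next index]. */
function readLiteral(src: string, i: number): [string, number] {
  let out = "";
  let depth = 1; // balanced parentheses may appear unescaped inside the string
  while (i < src.length) {
    const c = src[i++];
    if (c === "(") {
      depth++;
    } else if (c === ")") {
      if (--depth === 0) break;
    } else if (c === "\\" && i < src.length) {
      const e = src[i++];
      if (e in ESCAPES) {
        out += ESCAPES[e];
      } else if (e >= "0" && e <= "7") {
        let oct = e; // up to three octal digits, e.g. \101 is "A"
        while (oct.length < 3 && src[i] >= "0" && src[i] <= "7") oct += src[i++];
        out += String.fromCharCode(parseInt(oct, 8) & 0xff);
      } else if (e === "\r" || e === "\n") {
        if (e === "\r" && src[i] === "\n") i++; // backslash-newline continues the line
      } else {
        out += e; // unknown escape: the backslash is dropped
      }
      continue;
    }
    out += c;
  }
  return [out, i];
}

/** Decode a hex string; `i` points just past "<". Returns [text, next index]. */
function readHex(src: string, i: number): [string, number] {
  let end = src.indexOf(">", i);
  if (end < 0) end = src.length;
  let hex = src.slice(i, end).replace(/[^0-9A-Fa-f]/g, "");
  if (hex.length % 2 === 1) hex += "0"; // odd digit count: pad the last digit with 0
  let out = "";
  for (let k = 0; k < hex.length; k += 2) out += String.fromCharCode(parseInt(hex.slice(k, k + 2), 16));
  return [out, end + 1];
}

/** Collect Tj, TJ, ' and " text, using Td, TD, T* and ET as line-break hints. */
function extractText(src: string): string {
  let out = "";
  const stack: Operand[][] = [[]]; // stack[0] holds operands; deeper entries are open arrays
  const push = (v: Operand) => stack[stack.length - 1].push(v);
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (isWhite(c)) { i++; continue; }
    if (c === "%") { // comment runs to end of line
      while (i < src.length && src[i] !== "\n" && src[i] !== "\r") i++;
      continue;
    }
    if (c === "(") { const [s, n] = readLiteral(src, i + 1); push(s); i = n; continue; }
    if ((c === "<" || c === ">") && src[i + 1] === c) { i += 2; continue; } // << >> dictionaries
    if (c === "<") { const [s, n] = readHex(src, i + 1); push(s); i = n; continue; }
    if (c === "[") { stack.push([]); i++; continue; }
    if (c === "]") { if (stack.length > 1) { const arr = stack.pop()!; push(arr); } i++; continue; }
    let j = i + 1; // any other token runs to the next whitespace or delimiter
    while (j < src.length && !isWhite(src[j]) && !DELIMS.includes(src[j])) j++;
    const tok = src.slice(i, j);
    i = j;
    if (DELIMS.includes(c)) continue; // names such as /F1 and stray delimiters
    if (/^[+-]?(\d+\.?\d*|\.\d+)$/.test(tok)) { push(parseFloat(tok)); continue; }
    const ops = stack[0];
    const last = ops[ops.length - 1];
    if (tok === "Tj" && typeof last === "string") out += last;
    else if ((tok === "'" || tok === '"') && typeof last === "string") out += "\n" + last;
    else if (tok === "TJ" && Array.isArray(last)) {
      for (const el of last) {
        if (typeof el === "string") out += el;
        else if (typeof el === "number" && el < WORD_GAP) out += " "; // wide gap: word break
      }
    } else if (tok === "Td" || tok === "TD") out += typeof last === "number" && last !== 0 ? "\n" : " ";
    else if (tok === "T*" || tok === "ET") out += "\n";
    stack.length = 1; // every operator consumes its operands
    stack[0] = [];
  }
  return out;
}
```

The raw output can still contain stray spaces and repeated line breaks; that is left to the tidying pass.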
Finally, the text is tidied by collapsing runs of spaces and excess blank lines, leaving clean, paste-ready text.
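A tidying pass like this can be a few regular-expression replacements. The exact rules below (normalizing line endings, collapsing spaces and tabs, trimming spaces around line breaks, allowing at most one blank line in a row) are assumptions about what the tool does, and tidy is a hypothetical name.

```typescript
// Hypothetical tidy pass; the exact rules are assumptions, not the tool's source.
function tidy(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // normalize line endings to \n
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs to one space
    .replace(/ *\n */g, "\n") // drop spaces at the start and end of lines
    .replace(/\n{3,}/g, "\n\n") // allow at most one blank line in a row
    .trim(); // remove leading and trailing whitespace
}
```

Keeping a single blank line preserves paragraph-like gaps while removing the long runs that repeated positioning operators can leave behind.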
Limitations and notes
- Scanned PDFs usually have no text. If your PDF is a photo or scan of a page, it normally stores an image, not characters. This tool cannot read images; extracting text from them requires OCR, which is a different kind of tool.
- Layout is simplified. Multi-column pages, tables and precise indentation are flattened into linear reading order. The text is accurate, but its arrangement is not a faithful copy of the page.
- Encoding edge cases. Documents that use exotic custom font encodings or aggressive font subsetting can yield a few wrong characters; documents with standard encodings usually extract cleanly.
Tips
After extracting, use Download .txt to keep a searchable copy, or Copy text to paste into a document or chat. For PDFs where you want per-page control and layout-preserving options, try the related PDF text extractor tool.