PDF to Plain Text Extractor

Extract readable text from a PDF file — private, no upload required

This tool extracts the readable text from a PDF and gives it back as plain text you can copy or save. It runs entirely in your browser, so even sensitive documents stay on your machine, and it needs no plugin or account.

How it works

A PDF is a structured binary file. The text you see on the page is drawn by content streams — sequences of drawing commands — and those streams are almost always compressed. Extraction works in a few steps, all client-side:

  1. Scan the structure. The tool locates every obj … endobj definition and finds the ones that carry stream data.
  2. Inflate. Streams marked /FlateDecode are decompressed with the browser-native DecompressionStream in its "deflate" mode, which reads the zlib-wrapped DEFLATE data that /FlateDecode streams contain. No library is loaded.
  3. Read the text operators. Inside a decompressed content stream, text is shown with the Tj operator (a single string) and the TJ operator (an array of string fragments interleaved with spacing numbers). The tool decodes PDF literal strings, including backslash escapes and octal codes, as well as hex strings. In horizontal text, a negative number in a TJ array widens the gap before the next fragment, so the tool treats large negative values as word boundaries and inserts a space there.
  4. Reconstruct lines. Positioning operators such as Td, TD and T*, plus ET, which ends a text object, are used to insert line breaks so the output reads top to bottom in a sensible order.
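
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names (decodeLiteral, decodeHex, extractText) and the gapThreshold default of -200 are assumptions made for the example. It also assumes the content stream has already been inflated and converted to a string with one character per byte, and it ignores font encodings entirely.

```typescript
// Sketch only: extract text from one already-decompressed content stream.
// Assumes the bytes were turned into a string one byte per character;
// real font encodings are not handled here.

const ESCAPES = new Map<string, string>([
  ["n", "\n"], ["r", "\r"], ["t", "\t"], ["b", "\b"], ["f", "\f"],
]);

// Decode the body of a literal string, i.e. the text between ( and ).
function decodeLiteral(body: string): string {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch !== "\\") { out += ch; continue; }
    const octal = /^[0-7]{1,3}/.exec(body.slice(i + 1, i + 4));
    if (octal) { // \ddd: one to three octal digits
      out += String.fromCharCode(parseInt(octal[0], 8) & 0xff);
      i += octal[0].length;
      continue;
    }
    const next = body.charAt(i + 1);
    i += 1;
    if (next === "\r" && body.charAt(i + 1) === "\n") i += 1;
    // Backslash + newline is a line continuation and emits nothing.
    if (next !== "\r" && next !== "\n") out += ESCAPES.get(next) ?? next;
  }
  return out;
}

// Decode the body of a hex string, i.e. the text between < and >.
function decodeHex(body: string): string {
  let hex = body.replace(/\s+/g, "");
  if (hex.length % 2 === 1) hex += "0"; // a missing final digit counts as 0
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

// Start a new output line unless we are already at the start of one.
const breakLine = (s: string): string =>
  s === "" || s.endsWith("\n") ? s : s + "\n";

// gapThreshold is a hypothetical tuning value, not taken from the tool.
function extractText(content: string, gapThreshold = -200): string {
  // Alternatives: /Name (ignored), number, operator keyword.
  const token = /(\/[^\s()<>[\]{}\/%]*)|([+-]?(?:\d+\.?\d*|\.\d+))|([A-Za-z'"*]+)/y;
  let out = "";
  let operands: (string | number)[] = []; // operands seen since the last operator
  let i = 0;
  while (i < content.length) {
    const ch = content.charAt(i);
    if (ch === "(") {
      // Find the matching ")" while honouring escapes and nested parentheses.
      let depth = 1;
      let j = i + 1;
      while (j < content.length && depth > 0) {
        const c = content.charAt(j);
        if (c === "\\") j++;
        else if (c === "(") depth++;
        else if (c === ")") depth--;
        j++;
      }
      operands.push(decodeLiteral(content.slice(i + 1, j - 1)));
      i = j;
    } else if (ch === "<" && content.charAt(i + 1) === "<") {
      i += 2; // "<<" opens a dictionary, not a hex string
    } else if (ch === "<") {
      const end = content.indexOf(">", i);
      const stop = end < 0 ? content.length : end;
      operands.push(decodeHex(content.slice(i + 1, stop)));
      i = stop + 1;
    } else {
      token.lastIndex = i;
      const m = token.exec(content);
      if (m === null) { i++; continue; } // whitespace, brackets, stray bytes
      i = token.lastIndex;
      if (m[2] !== undefined) { operands.push(Number(m[2])); continue; }
      const op = m[3];
      if (op === undefined) continue; // a /Name operand, irrelevant to text
      const strings = operands.filter((p): p is string => typeof p === "string");
      const last = strings.length > 0 ? strings[strings.length - 1] : "";
      if (op === "Tj") out += last;
      else if (op === "'" || op === '"') out = breakLine(out) + last;
      else if (op === "TJ") {
        for (const p of operands) {
          if (typeof p === "string") out += p;
          else if (p < gapThreshold) out += " "; // large negative = word gap
        }
      } else if (op === "Td" || op === "TD" || op === "T*" || op === "ET") {
        out = breakLine(out);
      }
      operands = []; // every operator consumes its operands
    }
  }
  return out;
}
```

For example, passing the stream "BT /F1 12 Tf 72 720 Td (Hello) Tj 0 -14 Td [(Wor) 20 (ld) -300 (again)] TJ T* <414243> Tj ET" to extractText yields three lines: "Hello", "World again" and "ABC". The small positive 20 inside the TJ array is kerning and is ignored, while the -300 is wide enough to count as a word gap under the assumed threshold.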

The result is tidied — collapsing runs of spaces and excess blank lines — into clean, paste-ready text.
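
A tidy pass of this kind might look like the sketch below. The function name tidy and the exact rules (at most one blank line kept, spaces around line breaks dropped) are assumptions for illustration; the tool's own clean-up may differ in detail.

```typescript
// Hypothetical helper: normalise extracted text so it pastes cleanly.
function tidy(text: string): string {
  return text
    .replace(/\r\n?/g, "\n")    // normalise CRLF and lone CR to LF
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n")   // drop spaces on either side of a line break
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line in a row
    .trim();                    // no leading or trailing whitespace
}
```

The order matters: spaces are collapsed and stripped around line breaks first, so that lines containing only spaces count as blank when the blank-line rule runs.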

Limitations and notes

  • Scanned PDFs have no text. If your PDF is a photo or scan of a page, it stores an image, not characters. This tool cannot read images; that requires OCR, which is a different kind of tool.
  • Layout is simplified. Multi-column pages, tables and precise indentation are flattened into linear reading order. The text is accurate, but its arrangement is not a faithful copy of the page.
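  • Encoding edge cases. Documents whose fonts use custom encodings or aggressive subsetting can yield some wrong characters, because the bytes in their strings are font-specific codes rather than standard character codes. Documents with standard encodings extract cleanly.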

Tips

After extracting, use Download .txt to keep a searchable copy, or Copy text to paste into a document or chat. For PDFs where you want per-page control and layout-preserving options, try the related PDF text extractor tool.
