PDF Text Extractor

Pull all the text out of a PDF and copy or download it as a .txt file.

The PDF Text Extractor opens any PDF in your browser and pulls out every line of selectable text, ready to copy to your clipboard or download as a clean .txt file. It is built for the everyday job of getting words out of a PDF: quoting a contract, moving a report into a document, feeding text to a translator or an AI assistant, or searching a paper that your viewer lets you read but not select. Because everything happens locally, even confidential PDFs stay on your machine: there is no upload, no sign-up and no server involved at any point.

Unlike a naive “dump the bytes” converter, this tool reconstructs readable, ordered text. PDFs do not store paragraphs; they store thousands of tiny positioned glyph runs, often in a scrambled internal order. The extractor reads each run’s coordinates, groups the runs into lines, sorts the lines top-to-bottom and the runs within each line left-to-right, and inserts spaces where there are real gaps. The result reads the way the page looks, including multi-column layouts and headings, rather than a jumble of fragments.
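
The grouping and sorting described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the TextRun shape, the reconstructText name and the two threshold values are all assumptions, and the sketch handles a single column only (column detection would need an extra pass).

```typescript
// Hypothetical shape of one positioned text run. Real PDF engines expose
// similar data under their own names.
interface TextRun {
  text: string;
  x: number;     // left edge, in page units
  y: number;     // baseline, measured from the top of the page
  width: number; // horizontal extent of the run
}

// Group runs into lines by baseline, order each line left-to-right, and
// insert a space wherever the gap between two runs is large enough.
// yTolerance and gapThreshold are illustrative values, not the tool's settings.
function reconstructText(runs: TextRun[], yTolerance = 2, gapThreshold = 1.5): string {
  // Sort top-to-bottom first so runs on the same baseline end up adjacent.
  const sorted = runs.slice().sort((a, b) => a.y - b.y || a.x - b.x);

  const lines: TextRun[][] = [];
  for (const run of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - run.y) <= yTolerance) {
      current.push(run); // same baseline, within tolerance
    } else {
      lines.push([run]); // start a new line
    }
  }

  return lines
    .map((line) => {
      line.sort((a, b) => a.x - b.x);
      let out = "";
      let prevEnd: number | null = null;
      for (const run of line) {
        // Only a real horizontal gap becomes a space.
        if (prevEnd !== null && run.x - prevEnd > gapThreshold) out += " ";
        out += run.text;
        prevEnd = run.x + run.width;
      }
      return out;
    })
    .join("\n");
}
```

Comparing each run against the first run of the current line keeps the grouping stable when baselines drift by a fraction of a unit, which is common when a line mixes fonts.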

How it works

  1. Open a PDF. The file is read into memory in your browser. A bundled PDF engine parses the document structure — no network request is made.
  2. Pick a scope. Extract the whole document or a page expression like 1-3,5,8-. You can also toggle Preserve layout (keep line and column structure), Page markers (insert --- Page N --- separators) and Re-join hyphens (stitch words split across line wraps).
  3. Get your text. The combined text appears in a panel with a live word and character count, plus a per-page length breakdown that flags any image-only pages. One click copies everything; another downloads a .txt. Your last set of options is remembered for next time.
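
A page expression such as 1-3,5,8- could be interpreted along the following lines. This is a sketch under assumptions: the parsePageExpression name is hypothetical, and the choices to reject malformed tokens and to clamp ranges to the last page are this example's, not documented behaviour of the tool.

```typescript
// Parse a page expression such as "1-3,5,8-" into sorted, de-duplicated
// 1-based page numbers. An open-ended range ("8-") runs to the last page.
function parsePageExpression(expr: string, pageCount: number): number[] {
  const pages: number[] = [];
  for (const part of expr.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas

    // Accepts "N", "N-M" and "N-". Anything else is rejected.
    const match = /^(\d+)(?:\s*(-)\s*(\d*))?$/.exec(token);
    if (!match) throw new Error(`Invalid page token: "${token}"`);

    const start = Number(match[1]);
    let end = start; // a bare number is a single page
    if (match[2]) end = match[3] ? Number(match[3]) : pageCount;

    if (start < 1 || (match[3] && end < start)) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    // Clamp to the document so "8-" or "10-99" never exceed the last page.
    for (let p = start; p <= Math.min(end, pageCount); p++) {
      if (pages.indexOf(p) === -1) pages.push(p);
    }
  }
  return pages.sort((a, b) => a - b);
}
```

For a 10-page document, parsePageExpression("1-3,5,8-", 10) yields pages 1, 2, 3, 5, 8, 9 and 10.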

Example

Suppose you have a 12-page invoice PDF and only need the line-item table on pages 4 to 6. Type 4-6 in the Pages box, leave Preserve layout on so the columns stay aligned, and the tool returns just those three pages of text. The header shows something like 3 of 12 pages · 512 words · 3,140 chars. Click Copy all text and paste it straight into a spreadsheet or email. If page 6 turns out to be a scanned signature image, it is reported as empty in the per-page breakdown so you know nothing was silently dropped.
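
The header line and the per-page breakdown in this example could be computed roughly as below. This is only a sketch: the ExtractedPage shape and the summarizePages name are hypothetical, and how the tool actually counts words and characters (for instance, whether line breaks count) is an assumption here.

```typescript
// Hypothetical input: the text extracted from each selected page.
interface ExtractedPage {
  page: number;
  text: string;
}

function summarizePages(pages: ExtractedPage[], totalPages: number) {
  const breakdown = pages.map(({ page, text }) => ({
    page,
    chars: text.length,
    // Assumption: a page with no extractable characters is image-only.
    imageOnly: text.trim().length === 0,
  }));

  // Assumption: pages are joined with a newline, and that newline is counted.
  const combined = pages.map((p) => p.text).join("\n");
  const words = combined.split(/\s+/).filter((w) => w.length > 0).length;
  const chars = combined.length;

  const header = `${pages.length} of ${totalPages} pages · ${words} words · ${chars} chars`;
  return { header, breakdown, words, chars };
}
```

Reporting an image-only page as an explicit empty entry, rather than omitting it, is what lets the reader see that nothing was silently dropped.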

For a research paper, turning on Re-join hyphens converts wrapped words such as micro-\nscope back into microscope, which makes the downloaded text searchable and clean for citation. Every count and every character is produced in your browser; nothing is sent anywhere.
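
One simple way to re-join hyphenated line wraps is a single regular-expression pass, sketched below. The rejoinHyphens name is hypothetical, and restricting the merge to lowercase ASCII letters is a heuristic chosen for this example. The tool's own rule may differ.

```typescript
// Join words split by a hyphen at a line wrap: "micro-\nscope" -> "microscope".
// Only lowercase ASCII letters on both sides are merged, so list bullets and
// capitalised second halves are left untouched.
function rejoinHyphens(text: string): string {
  return text.replace(/([a-z])-\r?\n\s*([a-z])/g, "$1$2");
}
```

A heuristic like this cannot tell a wrap hyphen from a genuine compound, so a word such as well-known that happens to break at its hyphen would come out as wellknown. That is one reason to keep the option as a toggle.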