The PDF Text Extractor opens any PDF in your browser and pulls out every line of
selectable text, ready to copy to your clipboard or download as a clean .txt file.
It is built for the everyday job of getting words out of a PDF — quoting a contract,
moving a report into a document, feeding text to a translator or an AI assistant, or
searching a paper you can read but cannot select text in. Because everything happens locally,
even confidential PDFs stay on your machine: there is no upload, no sign-up and no server
involved at any point.
Unlike a naive “dump the bytes” converter, this tool reconstructs readable, ordered text. PDFs do not store paragraphs — they store thousands of tiny positioned glyph runs, often in a scrambled internal order. The extractor reads each run’s coordinates, groups them into lines, sorts lines top-to-bottom and runs left-to-right, and inserts spaces where there are real gaps. The result reads the way the page looks, including multi-column layouts and headings, rather than a jumble of fragments.
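The grouping and ordering described above can be sketched in a few lines. This is an illustrative Python sketch, not the tool's actual in-browser code: the Run type, the tolerance values and the convention that y is measured from the top of the page are all assumptions made for the example, and column detection is left out.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical positioned glyph run (names are illustrative)."""
    text: str
    x: float      # left edge of the run
    y: float      # baseline, assumed to be measured from the top of the page
    width: float  # horizontal extent of the run


def reconstruct_lines(runs, y_tolerance=2.0, gap_threshold=1.5):
    """Group runs into lines, order them, and insert spaces at real gaps."""
    lines = []
    # Top-to-bottom pass: runs whose baselines are close together share a line.
    for run in sorted(runs, key=lambda r: r.y):
        if lines and abs(lines[-1][0].y - run.y) <= y_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])

    result = []
    for line in lines:
        # Left-to-right pass within each line.
        line.sort(key=lambda r: r.x)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_threshold:  # only a visible gap becomes a space
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result
```

Given the runs "world", "Hel" and "lo" on one baseline in scrambled order, the sketch returns "Hello world": the first two fragments touch and are joined directly, while the wider gap before "world" becomes a space.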
How it works
- Open a PDF. The file is read into memory in your browser. A bundled PDF engine parses the document structure — no network request is made.
- Pick a scope. Extract the whole document or a page expression like 1-3,5,8-. You can also toggle Preserve layout (keep line and column structure), Page markers (insert --- Page N --- separators) and Re-join hyphens (stitch words split across line wraps).
- Get your text. The combined text appears in a panel with a live word and character count, plus a per-page length breakdown that flags any image-only pages. One click copies everything; another downloads a .txt. Your last set of options is remembered for next time.
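A page expression such as 1-3,5,8- could be expanded along the following lines. This is a minimal Python sketch under stated assumptions: the function name parse_pages is hypothetical, an open-ended range like 8- is taken to run to the last page, a leading-dash range like -3 is taken to start at page 1, and out-of-range pages are silently ignored. The tool's real parsing rules may differ.

```python
def parse_pages(expr, total_pages):
    """Expand a page expression like '1-3,5,8-' into sorted 1-based page numbers."""
    pages = set()
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            # Assumption: a missing bound means "from the first" or "to the last" page.
            start = int(start_s) if start_s.strip() else 1
            end = int(end_s) if end_s.strip() else total_pages
        else:
            start = end = int(part)
        # Clamp to the document; a reversed or out-of-range span yields nothing.
        start, end = max(start, 1), min(end, total_pages)
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page document, parse_pages("1-3,5,8-", 10) gives pages 1, 2, 3, 5, 8, 9 and 10.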
Example
Suppose you have a 12-page invoice PDF and only need the line-item table on pages 4 to 6.
Type 4-6 in the Pages box, leave Preserve layout on so the columns stay aligned, and the
tool returns just those three pages of text. The header shows something like 3 of 12 pages ·
512 words · 3,140 chars. Click Copy all text and paste it straight into a spreadsheet or
email. If page 6 turns out to be a scanned signature image, it is reported as empty in the
per-page breakdown so you know nothing was silently dropped.
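The header line and the per-page breakdown can be modelled roughly as below. This Python sketch is only an assumption about how such a summary might be computed: the summarize function is hypothetical, words are counted by whitespace splitting, characters are summed per page, and a page counts as image-only when its extracted text is blank.

```python
def summarize(page_texts, total_pages):
    """Build a header string and a per-page breakdown.

    page_texts maps a 1-based page number to the text extracted from that page.
    """
    ordered = sorted(page_texts)
    words = sum(len(page_texts[p].split()) for p in ordered)
    chars = sum(len(page_texts[p]) for p in ordered)
    header = f"{len(ordered)} of {total_pages} pages · {words:,} words · {chars:,} chars"
    # Each entry: (page number, character count, flagged as empty/image-only).
    breakdown = [(p, len(page_texts[p]), not page_texts[p].strip()) for p in ordered]
    return header, breakdown
```

With pages 4 and 5 holding text and page 6 empty, the breakdown marks page 6 as empty instead of dropping it from the list.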
For a research paper, turning on Re-join hyphens converts wrapped words such as
micro-\nscope back into microscope, which makes the downloaded text searchable and clean
for citation. Every figure and every character is produced in your browser — nothing is sent
anywhere.
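Re-joining wrapped words comes down to removing a hyphen that sits directly before a line break between two letters. The Python sketch below illustrates the idea with a regular expression; the function name is hypothetical and the rule is deliberately simple, so a genuine compound that happens to wrap at its hyphen (for example well-\nknown) would also be merged.

```python
import re

# A hyphen followed by a newline, with a letter on each side.
# [^\W\d_] matches any Unicode letter (word characters minus digits and underscore).
_WRAPPED_HYPHEN = re.compile(r"(?<=[^\W\d_])-\n(?=[^\W\d_])")


def rejoin_hyphens(text):
    """Stitch words that were split across a line wrap back together."""
    return _WRAPPED_HYPHEN.sub("", text)
```

Restricting both sides to letters leaves numeric ranges and list dashes that wrap across lines untouched.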