Get the words out of a Word file — no Word required
Sometimes you just need the text from a .docx and do not want to open a heavyweight editor or upload a confidential file to an online converter. Because a .docx is secretly a ZIP of XML files, a browser can open it directly. This tool does exactly that and hands you clean plain text.
How it works
The Office Open XML (OOXML) format packages a Word document as a ZIP archive. The main body text lives in word/document.xml, where the structure is straightforward:
- <w:p> marks a paragraph.
- <w:t> holds a piece of text inside a run (<w:r>).
- <w:tab> is a tab character, and <w:br> / <w:cr> are line breaks.
The extractor unzips the archive with JSZip, parses document.xml with DOMParser, then walks each paragraph’s nodes in document order, emitting text runs, tabs, and breaks as it goes. Joining the paragraphs reproduces the readable text faithfully, even when Word has split a single sentence across several runs for formatting reasons.
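The walk described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name extractText is hypothetical, and skipping <w:pPr> and <w:rPr> (so that tab-stop definitions inside paragraph properties are not mistaken for tab characters) is an assumption about how such a walker would reasonably behave.

```javascript
// WordprocessingML namespace that the w: prefix is bound to.
const W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical helper: walk a parsed document.xml tree (or any node in it)
// and return plain text, one line per <w:p>.
function extractText(root) {
  const paragraphs = [];

  // `parts` collects the pieces of the paragraph currently being read,
  // or is null while the walk is outside any paragraph.
  function visit(node, parts) {
    for (const child of Array.from(node.childNodes || [])) {
      const name = child.namespaceURI === W_NS ? child.localName : null;

      if (name === "pPr" || name === "rPr") {
        continue; // formatting properties only (assumption: safe to skip)
      }
      if (name === "p") {
        const own = [];
        visit(child, own);
        paragraphs.push(own.join(""));
      } else if (name === "t") {
        if (parts) parts.push(child.textContent);
      } else if (name === "tab") {
        if (parts) parts.push("\t");
      } else if (name === "br" || name === "cr") {
        if (parts) parts.push("\n");
      } else {
        visit(child, parts); // runs, hyperlinks, tables, and other wrappers
      }
    }
  }

  visit(root, null);
  return paragraphs.join("\n");
}

// In a browser with JSZip loaded, the pieces would be wired up roughly like this:
//   const zip = await JSZip.loadAsync(file);
//   const xml = await zip.file("word/document.xml").async("string");
//   const doc = new DOMParser().parseFromString(xml, "application/xml");
//   const text = extractText(doc);
```

Since adjacent runs are concatenated without a separator, a sentence that Word has split across several <w:r> elements comes back as one unbroken string.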
Tips and notes
- Structure is kept, styling is not. Paragraphs, tabs, and manual breaks survive; bold, colour, and fonts are intentionally dropped for clean text.
- .docx only. If you have a legacy .doc, open it in Word or an office suite and save it as .docx first.
- Main body coverage. Content in headers, footers, and footnotes lives in separate parts of the archive, and text-box content is nested inside drawing markup; this tool focuses on the main body text, which is what most extractions need.
- Everything runs locally, so even sensitive documents stay on your machine.
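The .docx-only note can be checked before any unzipping by looking at a file's first bytes. The sketch below is an illustration under stated assumptions, not part of the tool as described: the helper name sniffContainer is hypothetical. It relies on two well-known signatures: ZIP archives begin with "PK\x03\x04", and legacy .doc files use the OLE compound file header.

```javascript
// Leading bytes of the two container formats.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"
const OLE_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True when `bytes` begins with every byte in `magic`.
function startsWith(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: classify a file from its first bytes.
// "zip" means it may be a .docx; "ole" suggests a legacy .doc.
function sniffContainer(bytes) {
  if (startsWith(bytes, ZIP_MAGIC)) return "zip";
  if (startsWith(bytes, OLE_MAGIC)) return "ole";
  return "unknown";
}

// In a browser, the first bytes of a File could be read like this:
//   const head = new Uint8Array(await file.slice(0, 8).arrayBuffer());
//   const kind = sniffContainer(head);
```

A "zip" result only shows that the file is a ZIP archive; whether it is really a .docx still depends on word/document.xml being present inside it.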