Question 1

Do I still need OCR if I am using an LLM?

Accepted Answer

For scanned images and photos, yes: a text-only LLM needs text to work with, and an OCR engine (Tesseract, AWS Textract, Google Document AI) extracts it. For native digital PDFs you can read the embedded text layer directly and skip OCR. Many vision-capable models can now read images themselves, but a dedicated OCR step is usually more accurate and cheaper at high volume.
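The routing decision described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `read_text_layer` and `run_ocr` are hypothetical callables you would back with your own PDF library and OCR engine, and the `min_chars` cutoff for treating a PDF as a scan is an assumed heuristic.

```python
from pathlib import Path
from typing import Callable, Tuple

# Illustrative set of image extensions; extend it for your own inputs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extract_text(
    path: str,
    read_text_layer: Callable[[str], str],  # hypothetical: wraps your PDF library
    run_ocr: Callable[[str], str],          # hypothetical: wraps your OCR engine
    min_chars: int = 20,                    # assumed cutoff, tune on real data
) -> Tuple[str, str]:
    """Return (text, method), where method is "text_layer" or "ocr"."""
    suffix = Path(path).suffix.lower()

    if suffix in IMAGE_SUFFIXES:
        # Scans and photos have no text layer, so OCR is the only option.
        return run_ocr(path), "ocr"

    if suffix == ".pdf":
        text = read_text_layer(path)
        if len(text.strip()) >= min_chars:
            return text, "text_layer"
        # A PDF that only wraps scanned pages has little or no embedded text.
        return run_ocr(path), "ocr"

    raise ValueError(f"unsupported file type: {suffix or path}")
```

Passing the two extractors in as arguments keeps the routing logic testable without installing an OCR engine, and lets you swap providers later without touching this function.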

Question 2

How do I get structured output instead of a paragraph?

Accepted Answer

Define a JSON schema for the fields you want and instruct the model to return only valid JSON matching it, using function calling or a structured-output mode where available. Then validate the result against the schema in code and reject or retry malformed responses. Never parse free-form prose for production data.
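The validate-and-retry loop described above might look like the sketch below. To stay dependency-free it uses a hand-rolled type check in place of a full JSON Schema validator; in a real project a schema library would normally do this job. The `INVOICE_FIELDS` schema and the `call_model` callable are hypothetical stand-ins for your own fields and model client.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical schema: field name -> expected Python type after JSON decoding.
INVOICE_FIELDS: Dict[str, type] = {
    "invoice_number": str,
    "total": float,
    "currency": str,
}


def parse_extraction(raw: str, fields: Dict[str, type] = INVOICE_FIELDS) -> Dict[str, Any]:
    """Parse a model response and check it against the expected fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    result: Dict[str, Any] = {}
    for name, expected in fields.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # JSON has a single number type, so accept ints where a float is expected.
        # bool is excluded because it is a subclass of int in Python.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {name!r} should be {expected.__name__}")
        result[name] = value
    # Keys that are not in the schema are dropped rather than passed through.
    return result


def extract_with_retry(
    call_model: Callable[[str], str],  # hypothetical: sends the prompt, returns raw text
    prompt: str,
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call the model until a response validates, or give up."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_extraction(call_model(prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid response after {max_attempts} attempts: {last_error}")
```

A caller would wrap its model client in a function that takes the prompt and returns the raw response string, then pass that function as `call_model`. Capping the attempts matters: a document the model cannot handle should fail loudly rather than loop forever.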

Question 3

How accurate is LLM document extraction?

Accepted Answer

On clean, common document types it is very good, but accuracy drops on poor scans, unusual layouts, and edge cases. Production systems typically attach a confidence score to each extraction and route low-confidence results to human review. Treat the model as a fast first pass that handles the bulk of the work, not as a fully autonomous system you never check.
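The routing step can be as small as the sketch below. How the confidence score is produced is left open here and is an assumption of this example (it could come from OCR confidence, validation rules, or a second model pass); the `Extraction` type and the 0.85 threshold are illustrative, not recommended values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Extraction:
    doc_id: str
    fields: Dict[str, str]
    confidence: float  # assumed to be in [0, 1]; its source is up to your pipeline


def route(
    extractions: Iterable[Extraction],
    threshold: float = 0.85,  # illustrative cutoff, calibrate on labelled samples
) -> Tuple[List[Extraction], List[Extraction]]:
    """Split extractions into (auto_accepted, needs_review)."""
    accepted: List[Extraction] = []
    review: List[Extraction] = []
    for item in extractions:
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            review.append(item)
    return accepted, review
```

Lowering the threshold sends fewer documents to reviewers at the cost of more unchecked errors, so it is worth calibrating against a sample where the correct values are known.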

Question 4

How do I let users ask questions about a document?

Accepted Answer

Use retrieval-augmented generation (RAG): split the document into chunks, embed them, and store the vectors. At query time, retrieve the most relevant chunks and pass them to the model alongside the question. This grounds answers in the actual document and lets you cite the source passage, which reduces hallucination.
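The chunk, embed, retrieve, and prompt steps above can be sketched end to end with the standard library. Note the loud assumption: `embed` here is a toy word-count vector standing in for a real embedding model, and the in-memory list stands in for a vector store. The chunk sizes and prompt wording are also illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, int, str]]:
    """Return the k best (score, chunk_index, chunk) triples, best first."""
    query = embed(question)
    # A real system embeds chunks once at ingest time instead of on every query.
    scored = [(cosine(query, embed(chunk)), i, chunk) for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: List[Tuple[float, int, str]]) -> str:
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"[chunk {i}] {chunk}" for _, i, chunk in hits)
    return (
        "Answer using only the passages below and cite the chunk number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the chunk index alongside each hit is what makes citation possible: the answer can point back to the exact passage it was grounded in.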

Question 5

What is the hardest part of building one of these in practice?

Accepted Answer

Not the AI. It is the messy long tail of real documents: rotated scans, multi-column layouts, tables, mixed languages, and handwriting all break naive pipelines. Spend most of your effort on robust ingestion, validation, and a review loop, and far less on the model call itself, which is the easy part.

How to Build a Document Intelligence App

Stage 1 — Ingest and OCR

Stage 2 — Classify and extract with an LLM

Stage 3 — Query with retrieval

Stage 4 — The review interface

Putting it together