Why specify a type for each field?

Types tell the model how to coerce what it finds — a date into ISO format, a price into a number without a currency symbol, a yes/no into a boolean. Without types you get inconsistent strings that are painful to parse downstream.
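As a rough illustration of why typed output is easier to consume, the Python sketch below checks that extracted values already arrive in their coerced form. The field names and the FIELD_TYPES mapping are hypothetical and are not part of the builder itself.

```python
from datetime import date

# Hypothetical field definitions; the names are illustrative only.
FIELD_TYPES = {"invoice_date": "date", "invoice_total": "number", "paid": "boolean"}

def matches_type(value, type_name):
    """Return True if value is null or already in the coerced form for type_name."""
    if value is None:
        # Missing fields are governed by the null-handling rule, not the type.
        return True
    if type_name == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)  # accepts ISO dates such as 2024-03-05
            return True
        except ValueError:
            return False
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    return isinstance(value, str)

def type_errors(record, field_types=FIELD_TYPES):
    """List the fields whose values are not in the expected typed form."""
    return [name for name, t in field_types.items()
            if not matches_type(record.get(name), t)]
```

A record with untyped strings such as "03/05/2024", "$1,200.50" and "yes" would fail all three checks, which is the inconsistency that typed fields are meant to prevent.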

What should missing fields return?

That is set by the null-handling rule. The two safe choices are an explicit null (best for JSON, because it keeps the schema stable) or an empty string (often easier for CSV). Either way, the model must not guess or fabricate a value; the generated prompt explicitly forbids it.
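The two conventions can be sketched in Python as follows. This is a minimal example under assumed field names (FIELDS is hypothetical): records keep an explicit None for JSON-style handling, and None is rendered as an empty cell only when writing CSV.

```python
import csv
import io

FIELDS = ["vendor", "invoice_total", "due_date"]  # hypothetical field names

def normalize(record, fields=FIELDS):
    """Give every record the same keys, using None for anything absent."""
    return {name: record.get(name) for name in fields}

def to_csv(records, fields=FIELDS):
    """Write records as CSV text, rendering None as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = normalize(record, fields)
        writer.writerow({k: "" if v is None else v for k, v in row.items()})
    return buffer.getvalue()
```

Because normalize always returns the same keys, every record has the same shape regardless of which fields the source text actually contained.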

Will the output be valid JSON?

The prompt instructs the model to return only the JSON object with no commentary or code fences, which is what makes it parseable. Current models usually comply when the schema is well defined, but compliance is not guaranteed, so always wrap your parse in a try/catch.
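A defensive parse might look like the Python sketch below, where try/except plays the role of try/catch. The function name parse_extraction is hypothetical, and the sketch assumes the single-object output format; it also tolerates a stray code fence in case the model adds one anyway.

```python
import json

FENCE = "`" * 3  # a code-fence marker, built this way to keep the example readable

def parse_extraction(raw):
    """Return the parsed JSON object, or None if the response is not usable."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line and any language tag
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # malformed response: let the caller retry or log it
    # This sketch expects one object; an array or bare value is treated as a failure.
    return data if isinstance(data, dict) else None
```

Returning None instead of raising lets a pipeline skip or retry a single bad response without stopping the whole batch.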

Can it extract lists, like all the people named?

Yes. Choose the list type for a field and the prompt asks the model to return an array of values for it, while scalar fields stay single values. This handles the common case of one document mentioning several entities of the same kind.
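On the consuming side, one optional safeguard is to make sure list fields really are arrays before iterating over them. The Python sketch below assumes a hypothetical LIST_FIELDS set and wraps a lone scalar in a list; this is a downstream convenience, not something the builder does.

```python
LIST_FIELDS = {"people"}  # hypothetical: the fields declared with the list type

def ensure_lists(record, list_fields=LIST_FIELDS):
    """Return a copy of record in which non-null list fields are always lists."""
    fixed = dict(record)
    for name in list_fields:
        value = fixed.get(name)
        # Leave null untouched so the null-handling rule stays visible downstream.
        if value is not None and not isinstance(value, list):
            fixed[name] = [value]
    return fixed
```

Scalar fields pass through unchanged, so a record such as {"people": "Ada Lovelace", "company": "Acme"} becomes {"people": ["Ada Lovelace"], "company": "Acme"}.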

Data Extraction Prompt Builder

Pulling structured data out of messy text — invoices, emails, resumes, support tickets — is one of the things LLMs do best, but only if you tell them exactly what you want. Ask vaguely and you get prose; ask for a precise schema with typed fields and null rules and you get clean JSON you can drop into a database. This builder lets you define each field, its type, and how to handle missing values, then writes the extraction prompt and a matching schema for you.

How it works

You add fields one at a time, each with a type — string, number, date, boolean, email, or list. The tool builds a schema from those fields and writes a prompt that instructs the model to extract them from the text you paste, coerce each value to its type (ISO dates, bare numbers, true/false booleans), and apply your null-handling rule when a field is absent. It explicitly forbids guessing or fabricating values. Finally you pick the output format: a JSON object (or array of objects) or CSV rows, with an instruction to return only the data and no surrounding commentary so it parses cleanly.
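The flow described above can be approximated in a few lines of Python. This is only a sketch: the builder's actual prompt wording is not shown here, so the TYPE_HINTS text, the build_prompt function, and its parameters are all assumptions made for illustration.

```python
# Hypothetical per-type instructions, based on the types the builder offers.
TYPE_HINTS = {
    "string": "plain text",
    "number": "a bare number with no currency symbol",
    "date": "an ISO date (YYYY-MM-DD)",
    "boolean": "true or false",
    "email": "an email address",
    "list": "an array of values",
}

def build_prompt(fields, text, missing="null"):
    """Build an extraction prompt from (name, type) pairs and the source text.

    missing is the null-handling rule, for example "null" or "an empty string".
    """
    lines = ["Extract the following fields from the text below."]
    for name, type_name in fields:
        lines.append(f"- {name} ({type_name}): {TYPE_HINTS[type_name]}")
    lines.append(
        f"If a field is absent from the text, return {missing}. "
        "Do not guess or fabricate values."
    )
    lines.append(
        "Return only a JSON object with exactly these keys, "
        "with no commentary and no code fences."
    )
    lines.extend(["", "Text:", text])
    return "\n".join(lines)
```

For example, build_prompt([("invoice_total", "number"), ("due_date", "date")], pasted_text) yields one instruction line per field, followed by the null rule, the output-format rule, and the pasted text.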

Tips and notes

Be specific in field names. “invoice_total” extracts better than “amount” when a document has several numbers. Clear names guide the model to the right value.
Prefer null over empty for JSON. Explicit null keeps every record the same shape, which makes downstream parsing and validation far simpler.
Always parse defensively. Even with a strict prompt, wrap your JSON parse in a try/catch — a single malformed response should not crash your pipeline.
Use the list type for repeats. When one document mentions many of the same thing (all attendees, every line item), the list type returns an array instead of forcing one value.