LLM citation URL validator
Language models love to cite sources — and sometimes invent them. A fabricated URL looks completely plausible until someone clicks it and lands on a 404. This tool extracts every link from an LLM’s output and runs a battery of static checks that flag the URLs most likely to be hallucinated, so you can triage your references before they ship.
How it works
The extractor scans for http/https URLs, including those inside markdown
[text](url) syntax and parentheses, and trims trailing punctuation. Each URL is
then parsed with the browser’s URL constructor for format validity, its
top-level domain is checked against a list of common TLDs, and a set of
heuristics looks for hallucination tells — placeholder hosts like example.com,
suspiciously long random-looking path segments, far-future dates, and duplicated
paths across multiple citations. URLs are grouped into OK, malformed, and
suspicious, each with a short reason. Because the browser’s same-origin policy
stops a page from reading responses from arbitrary origins, the tool deliberately
does not fetch the links; it tells you which ones to open yourself.
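The pipeline described above could be sketched roughly as follows. This is an illustration, not the tool's actual source: the names extractUrls, classifyUrl, COMMON_TLDS and PLACEHOLDER_HOSTS are hypothetical, the TLD list is deliberately tiny, and the thresholds (24 characters for a "random-looking" segment, more than one year ahead for a "far-future" date) are assumptions. Treating an uncommon TLD as suspicious rather than malformed is also a guess.

```typescript
// Sketch only. Every name and threshold below is an assumption for illustration.
type Verdict = "ok" | "malformed" | "suspicious";

interface Finding {
  url: string;
  verdict: Verdict;
  reason: string;
}

const COMMON_TLDS = new Set(["com", "org", "net", "edu", "gov", "io", "dev", "uk", "de"]);
const PLACEHOLDER_HOSTS = ["example.com", "example.org", "example.net"];
const TRAILING_PUNCTUATION = /[.,;:!?]+$/;

// Number of times the single character `ch` occurs in `s`.
const count = (s: string, ch: string): number => s.split(ch).length - 1;

function extractUrls(text: string): string[] {
  // Grab http/https runs up to whitespace, a quote, an angle bracket or "]".
  const candidates: string[] = text.match(/https?:\/\/[^\s<>"'\]]+/g) ?? [];
  return candidates.map((raw) => {
    let url = raw.replace(TRAILING_PUNCTUATION, "");
    // Drop a closing ")" only when it has no "(" partner inside the URL.
    // This unwraps markdown [text](url) and "(see url)" but keeps /Foo_(bar).
    while (url.endsWith(")") && count(url, "(") < count(url, ")")) {
      url = url.slice(0, -1).replace(TRAILING_PUNCTUATION, "");
    }
    return url;
  });
}

function classifyUrl(url: string, currentYear: number = new Date().getFullYear()): Finding {
  let parsed: URL;
  try {
    parsed = new URL(url); // format validity check
  } catch {
    return { url, verdict: "malformed", reason: "rejected by the URL constructor" };
  }
  const host = parsed.hostname.toLowerCase();
  if (PLACEHOLDER_HOSTS.some((p) => host === p || host.endsWith("." + p))) {
    return { url, verdict: "suspicious", reason: "placeholder host" };
  }
  const tld = host.split(".").pop() ?? "";
  if (!COMMON_TLDS.has(tld)) {
    return { url, verdict: "suspicious", reason: `uncommon TLD ".${tld}"` };
  }
  // Assumed tell: a long segment that mixes letters and digits with no separators.
  const looksRandom = (s: string): boolean =>
    s.length >= 24 && /^[a-z0-9]+$/i.test(s) && /[a-z]/i.test(s) && /\d/.test(s);
  if (parsed.pathname.split("/").some(looksRandom)) {
    return { url, verdict: "suspicious", reason: "long random-looking path segment" };
  }
  // Assumed tell: a 20xx year in the path more than one year past currentYear.
  const years: string[] = parsed.pathname.match(/\b20\d{2}\b/g) ?? [];
  const future = years.find((y) => Number(y) > currentYear + 1);
  if (future !== undefined) {
    return { url, verdict: "suspicious", reason: `far-future date (${future})` };
  }
  return { url, verdict: "ok", reason: "passed all static checks" };
}
```

Calling extractUrls on a block of model output and mapping classifyUrl over the result gives one Finding per link, which can then be grouped by verdict. The checks run in a fixed order and stop at the first hit, so each URL carries a single reason; a fuller version might collect every reason that applies.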
Tips and notes
- Suspicious is not the same as wrong. A long path can be a real permalink. The flags are a triage queue, not a verdict — open the flagged ones first.
- Watch for repeated paths. When several citations share an identical trailing path, the model often pasted a template it filled with fake IDs.
- Bare domains need a scheme. A reference written as acme.com/report without https:// is flagged so you can normalise it before linking.
- Everything is local. No network calls, so confidential drafts stay private.
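Two of the tips above, repeated paths and bare domains, lend themselves to a short sketch. The helper names findRepeatedPaths and findBareDomains and the KNOWN_TLDS list are hypothetical, and the bare-domain pattern is a rough guess at what such a check might look like; it only accepts tokens whose last label is in the assumed list, so file names like notes.txt are ignored.

```typescript
// Sketch only. Helper names and the TLD list are assumptions for illustration.
const KNOWN_TLDS = new Set(["com", "org", "net", "edu", "gov", "io"]);

// Group distinct URLs by pathname and keep paths shared by two or more of them.
function findRepeatedPaths(urls: string[]): Map<string, string[]> {
  const byPath = new Map<string, string[]>();
  Array.from(new Set(urls)).forEach((url) => {
    let path: string;
    try {
      path = new URL(url).pathname;
    } catch {
      return; // malformed URLs are handled by the format check instead
    }
    if (path === "/") return; // a shared root path says nothing
    const group = byPath.get(path) ?? [];
    group.push(url);
    byPath.set(path, group);
  });
  const repeated = new Map<string, string[]>();
  byPath.forEach((group, path) => {
    if (group.length > 1) repeated.set(path, group);
  });
  return repeated;
}

// Find references such as acme.com/report that lack an http/https scheme.
function findBareDomains(text: string): string[] {
  // Blank out URLs that already have a scheme so their hosts are not re-matched.
  const withoutSchemed = text.replace(/https?:\/\/\S+/gi, " ");
  // Group 1: a boundary character; group 2: host plus optional path; group 3: the TLD.
  const pattern = /(^|[^\w@.\/-])((?:[a-z0-9-]+\.)+([a-z]{2,})(?:\/[^\s)\]]*)?)/gi;
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(withoutSchemed)) !== null) {
    if (KNOWN_TLDS.has(m[3].toLowerCase())) {
      hits.push(m[2].replace(/[.,;:!?]+$/, "")); // trim trailing punctuation
    }
  }
  return hits;
}
```

findRepeatedPaths ignores exact duplicate citations, since citing the same page twice is not a tell; only different URLs that end in the same path are grouped. Prefixing a hit from findBareDomains with https:// is one way to normalise it, though whether the site actually serves HTTPS still has to be checked by hand.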