Does it actually visit the URLs?

No. A web page cannot read the response to a cross-origin fetch or HEAD request unless the target site permits it through CORS, so the tool does static validation only: URL syntax, a known top-level-domain check, and heuristics for fabricated links. Use the flags to decide which URLs to open and verify manually.

What counts as a hallucination pattern?

Common tells include placeholder hosts (example.com, your-site), implausibly long random document or article IDs, dates far in the future, repeated identical paths, and uncommon or invalid TLDs. None is proof on its own, but a flagged URL deserves a manual look.

Which URL formats are recognized?

It extracts http and https URLs, including those wrapped in markdown link syntax or parentheses, and strips trailing punctuation. Bare domains without a scheme are detected separately and flagged as needing a protocol.
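The extraction rules above can be sketched in a few lines. This is a Python illustration only, not the tool's own browser code; the pattern, the `TRAILING` set, and the name `extract_urls` are assumptions made for the example.

```python
import re
from typing import List

# Assumed pattern: an http/https URL runs until whitespace, a quote,
# an angle bracket, or a square bracket (so markdown "[text](url)" splits cleanly).
URL_RE = re.compile(r"https?://[^\s<>\"'\[\]]+", re.IGNORECASE)
TRAILING = ".,;:!?"


def extract_urls(text: str) -> List[str]:
    """Return http(s) URLs found in text, with trailing punctuation removed."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        while url:
            last = url[-1]
            if last in TRAILING:
                url = url[:-1]
            elif last == ")" and url.count(")") > url.count("("):
                # A closing parenthesis with no partner belongs to the
                # surrounding prose or markdown, not to the URL.
                url = url[:-1]
            else:
                break
        urls.append(url)
    return urls
```

Balancing parentheses rather than stripping every trailing `)` keeps URLs such as Wikipedia disambiguation links intact while still unwrapping `(url)` and `[text](url)`.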

Is my text uploaded anywhere?

No. Extraction and all checks run locally in your browser. Nothing is sent to a server, so you can validate citations in confidential drafts safely.

What is the LLM Citation URL Validator?

It finds all URLs in LLM output and runs format validation, TLD verification, and pattern checks that flag likely hallucinated links (placeholder hosts, made-up document IDs, and suspicious paths) so you can catch fabricated citations before publishing. It runs free in your browser on Gera Tools, with nothing uploaded.

LLM Citation URL Validator

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

LLM citation URL validator

Language models love to cite sources — and sometimes invent them. A fabricated URL looks completely plausible until someone clicks it and lands on a 404. This tool extracts every link from an LLM’s output and runs a battery of static checks that flag the URLs most likely to be hallucinated, so you can triage your references before they ship.

How it works

The extractor scans for http/https URLs, including those inside markdown [text](url) syntax and parentheses, and trims trailing punctuation. Each URL is then parsed with the browser’s URL constructor for format validity, its top-level domain is checked against a list of common TLDs, and a set of heuristics looks for hallucination tells: placeholder hosts like example.com, suspiciously long random-looking path segments, far-future dates, and duplicated paths across multiple citations. URLs are grouped into OK, malformed, and suspicious, each with a short reason. Because a page cannot read responses from arbitrary origins unless those sites allow it through CORS, the tool deliberately does not fetch the links; it tells you which ones to open yourself.
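As a rough illustration of the grouping into OK, malformed, and suspicious, here is a minimal Python sketch. Python's `urllib.parse.urlsplit` stands in for the browser's URL constructor, and the TLD set, placeholder hosts, length threshold, and function names are all assumptions for the example rather than the tool's actual lists. Date and duplicate-path checks are left out to keep it short.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Illustrative subsets only; a real check would use a fuller list
# (for example the IANA root zone list of TLDs).
COMMON_TLDS = {"com", "org", "net", "edu", "gov", "int", "io", "uk", "de", "fr"}
PLACEHOLDER_HOSTS = {"example.com", "example.org", "example.net", "your-site.com"}
MIN_RANDOM_LEN = 24  # assumed threshold for "suspiciously long"


def looks_random(segment: str) -> bool:
    """Long, purely alphanumeric, and mixing letters with digits."""
    return (
        len(segment) >= MIN_RANDOM_LEN
        and segment.isalnum()
        and any(c.isdigit() for c in segment)
        and any(c.isalpha() for c in segment)
    )


def classify(url: str) -> Tuple[str, str]:
    """Return (status, reason); status is 'ok', 'malformed', or 'suspicious'."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # already lower-cased by urlsplit
    except ValueError:
        return ("malformed", "could not be parsed")
    if parts.scheme not in ("http", "https") or not host:
        return ("malformed", "not an http(s) URL with a host")
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host in PLACEHOLDER_HOSTS:
        return ("suspicious", "placeholder host")
    tld = host.rsplit(".", 1)[-1]
    if tld not in COMMON_TLDS:
        return ("suspicious", "uncommon TLD ." + tld)
    if any(looks_random(seg) for seg in parts.path.split("/")):
        return ("suspicious", "long random-looking path segment")
    return ("ok", "passed static checks")
```

Returning a reason alongside the status mirrors the short explanation the tool shows next to each URL, which is what makes the result usable as a triage queue.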

Why LLMs hallucinate URLs

Language models generate text by predicting plausible continuations, not by querying a database of real URLs. When a model is asked to cite a source, it synthesises a URL that resembles a real citation for that context: a correct-seeming domain, a plausible path structure, and a relevant-sounding document title embedded in the slug. The URL is syntactically valid, but the page it points to does not exist.

The problem is particularly common in:

Academic citations — models hallucinate DOI numbers, journal article paths, and arXiv IDs that follow the real format but point to papers that were never written.
Government and NGO data — models know that government data is typically hosted at .gov, .org, or official agency domains and generate realistic-looking report URLs.
News articles — models compose article slugs that match a newspaper’s URL pattern with a date and headline, but no such article was published.

Hallucination pattern checklist

Beyond the automated checks, these patterns warrant manual verification:

Placeholder language in the URL. Paths containing words like your-study, insert-title, example-report, or placeholder are often left over from a template the model filled in incompletely.

Far-future or impossible publication dates. A URL with /2031/ or a date beyond the model’s training window is a red flag. The model has probably invented a publication to fill a citation gap.

Identical document structure, different IDs. If three cited URLs differ only in a numeric suffix (/study/1234, /study/5678, /study/9012), the model likely generated a template and varied the ID randomly.

Unfamiliar subdomains on known domains. data.who.int/specific-report-name may look official, but the subdomain or the path may not exist. Real WHO data URLs follow a specific structure, so compare the link against one you know is genuine.

Exact article titles embedded verbatim in slugs. Real article slugs are usually shortened; a slug that reproduces a 12-word title in full is often synthesised.
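Several items on this checklist are mechanical enough to script. The Python sketch below covers placeholder words, future years, ID-only variants, and long title slugs. Every name, word list, and threshold here is an assumption chosen for illustration, and a hit is a prompt to look closer, not proof of fabrication.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional
from urllib.parse import urlsplit

PLACEHOLDER_WORDS = ("your-", "insert-", "example-", "placeholder")  # assumed list


def has_placeholder_words(url: str) -> bool:
    """True if the path contains template-style placeholder wording."""
    path = urlsplit(url).path.lower()
    return any(word in path for word in PLACEHOLDER_WORDS)


def future_year(url: str, today: Optional[date] = None) -> Optional[int]:
    """Return a four-digit path segment that is a year after the current one."""
    this_year = (today or date.today()).year
    for segment in urlsplit(url).path.split("/"):
        if re.fullmatch(r"(?:19|20)\d{2}", segment) and int(segment) > this_year:
            return int(segment)
    return None


def id_only_variants(urls: List[str], min_group: int = 3) -> Dict[str, List[str]]:
    """Group distinct URLs whose paths match once a trailing numeric ID is masked."""
    groups = defaultdict(list)
    for url in dict.fromkeys(urls):  # drop exact duplicates, keep order
        parts = urlsplit(url)
        template, replaced = re.subn(r"/\d+/?$", "/{id}", parts.path)
        if replaced:
            groups[parts.netloc.lower() + template].append(url)
    return {t: members for t, members in groups.items() if len(members) >= min_group}


def has_long_title_slug(url: str, min_words: int = 12) -> bool:
    """True if any path segment has at least min_words hyphen-separated words."""
    for segment in urlsplit(url).path.split("/"):
        if len([w for w in segment.split("-") if w]) >= min_words:
            return True
    return False
```

Passing `today` explicitly keeps the date check reproducible; in practice a cutoff tied to the model's training window may be a better reference point than the current year.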

What to do with flagged URLs

Open the URL in a browser. A 404 means nothing is served at that address now; the page may have moved, or it may never have existed. A 200 shows that a page exists, but confirm that its content matches the citation, since the model may have the right domain and the wrong page.
Search for the title instead. If the model cited “Smith et al. 2022, Journal of Applied Research,” search for that citation or the paper’s title directly, because the paper may exist at a different URL.
Search within the source’s domain. Run a search-engine query such as site:who.int "report title", or use the site’s own search box, to confirm whether the document exists anywhere on the host.
Remove uncorroborated citations. If you cannot verify a source through two independent methods, remove it rather than publish a fabricated reference.

Tips and notes

Suspicious is not the same as wrong. A long path can be a real permalink. The flags are a triage queue, not a verdict — open the flagged ones first.
Watch for repeated paths. When several citations share an identical trailing path, the model often pasted a template it filled with fake IDs.
Bare domains need a scheme. A reference written as acme.com/report without https:// is flagged so you can normalise it before linking.
Everything is local. No network calls, so confidential drafts stay private.
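Bare-domain detection can be approximated with a second pattern that skips anything already carrying a scheme or sitting inside an email address. This is a Python sketch under assumptions: the TLD alternation is a short illustrative list, and `find_bare_domains` and `with_scheme` are hypothetical helpers.

```python
import re
from typing import List

BARE_DOMAIN_RE = re.compile(
    r"(?<![\w@/.:-])"                           # not inside a URL or an email address
    r"(?:[a-z0-9-]+\.)+(?:com|org|net|edu|gov|io)"  # host with an assumed TLD list
    r"(?![\w-])"                                # the TLD must end here
    r"(?:/[^\s<>\"')\]]*)?",                    # optional path
    re.IGNORECASE,
)


def find_bare_domains(text: str) -> List[str]:
    """Return scheme-less references such as 'acme.com/report'."""
    return [m.group(0).rstrip(".,;:!?") for m in BARE_DOMAIN_RE.finditer(text)]


def with_scheme(reference: str) -> str:
    """Normalise a bare reference by assuming https."""
    return "https://" + reference
```

Assuming `https://` is a convenience, not a guarantee that the host serves HTTPS, so the normalised link still needs the same manual check as any other flagged URL.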