Can it tell if a link is hallucinated?

Not with certainty — it checks format, not reachability. It flags strong tells like uncommon top-level domains, example.com placeholders, loopback hosts, non-ASCII homograph characters, and malformed URLs, but a well-formed link can still point to a page that does not exist.

Why does it flag uncommon TLDs?

LLMs sometimes invent plausible-looking domains with TLDs that do not exist or are rare. Flagging anything outside a common-TLD list surfaces those for a closer look. Because the tool cannot tell an invented TLD from a real but unusual one, the flag is only a soft warning to review.

Does it find bare domains without http?

Yes, as long as they start with www. It matches both full http/https URLs and bare www. domains, then normalizes and deduplicates them. Trailing punctuation that is not part of the URL is trimmed automatically.

Does my text leave the browser?

No. All extraction and validation runs locally in JavaScript with no network requests, so your text and links stay on your device. It is safe for confidential content.

What is the URL & Link Extractor from LLM Output?

The URL & Link Extractor finds every URL in LLM output, deduplicates the results, validates each URL's format, and flags likely-hallucinated links: uncommon TLDs, placeholder domains, homograph hosts, and malformed addresses. It runs free in your browser on Gera Tools, with nothing uploaded.

URL & Link Extractor from LLM Output

Name: URL & Link Extractor from LLM Output
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Catch broken and hallucinated links in AI output

Language models confidently cite URLs that look real but do not exist. This tool pulls every link out of a block of LLM output, removes duplicates, checks each one for valid format, and flags the patterns that usually signal a hallucinated or unusable link — so you can verify before you publish or click.

Hallucinated links follow recognizable patterns. Models tend to invent:

Plausible but non-existent journal articles — a DOI or URL that resolves to a 404 because the paper doesn’t exist.
Wikipedia pages that were never written — a valid-looking Wikipedia path for a topic that has no article.
Invented API endpoints — a real base domain but a path the provider never created, common in code-generation and documentation tasks.
Placeholder domains — example.com/something or yoursite.com/page copied from training data.

How it works

A regular expression scans the text for both http/https URLs and bare www. domains, trims trailing punctuation, and deduplicates the results. Each link is then parsed with the browser’s URL API and checked against several heuristics: uncommon or missing top-level domains, example.* placeholder domains, localhost and loopback hosts, non-ASCII characters that can hide homograph attacks, unusually long hostnames, and opaque path segments LLMs tend to invent. Valid, unremarkable links are marked clean; anything questionable is flagged with the reason. Everything runs client-side.
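The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the pattern, the trailing-punctuation set, the `extractUrls` name, and the choice to prefix bare www. domains with https:// are all assumptions made for the example.

```javascript
// Hypothetical pattern: the tool's real regular expression is not published.
// Matches http/https URLs and bare "www." domains up to the next whitespace or quote.
const URL_PATTERN = /\b(?:https?:\/\/|www\.)[^\s<>"']+/gi;

// Characters that usually belong to the surrounding sentence rather than the URL.
// Limitation: this also strips a legitimate closing ")" such as in ".../Foo_(bar)".
const TRAILING_PUNCTUATION = /[.,;:!?)\]}>]+$/;

function extractUrls(text) {
  const seen = new Set();
  const urls = [];
  for (const match of text.matchAll(URL_PATTERN)) {
    let candidate = match[0].replace(TRAILING_PUNCTUATION, "");

    // Assumption: bare "www." domains get an https:// scheme so they can be parsed later.
    if (!/^https?:\/\//i.test(candidate)) {
      candidate = "https://" + candidate;
    }

    // Deduplicate on the normalized form when the URL parses; fall back to the raw string
    // so malformed links are still kept and can be flagged afterwards.
    let key;
    try {
      key = new URL(candidate).href;
    } catch {
      key = candidate;
    }
    if (!seen.has(key)) {
      seen.add(key);
      urls.push(candidate);
    }
  }
  return urls;
}
```

Given the text "See https://example.com/docs, and www.test.org. Also (https://example.com/docs).", this sketch returns two entries: https://example.com/docs and https://www.test.org.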

What the flags mean

Flag	What it means
Uncommon TLD	The top-level domain is outside the common list — may be real but worth checking
Placeholder domain	Matches `example.*` or similar template domains from training data
Loopback/localhost	Points to a local address — never a valid public citation
Non-ASCII host	Contains characters that can disguise homograph phishing domains
Malformed URL	The browser URL parser rejected it, so the link is broken whether or not the intended page exists
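As a rough illustration of how the five flags in the table could be produced, here is a minimal sketch built on the standard URL constructor. The `flagUrl` name, the short common-TLD list, and the exact placeholder and loopback patterns are assumptions for the example; the tool's real lists and rules may differ, and the long-hostname and opaque-path heuristics are left out.

```javascript
// Assumed, deliberately short allow-list; the tool's real common-TLD list is not published.
const COMMON_TLDS = new Set(["com", "org", "net", "edu", "gov", "io", "dev", "uk", "de", "fr", "jp"]);

// example.* (including subdomains) plus yoursite.com; the exact placeholder list is an assumption.
const PLACEHOLDER_HOST = /(?:^|\.)(?:example\.[a-z.]+|yoursite\.com)$/i;

// localhost, the 127.x.x.x range, and the IPv6 loopback (URL.hostname keeps the brackets).
const LOOPBACK_HOST = /^(?:localhost|127(?:\.\d{1,3}){3}|\[::1\])$/i;

function flagUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return ["Malformed URL"]; // the parser rejected it, so no other check applies
  }

  const flags = [];
  const host = parsed.hostname;

  if (LOOPBACK_HOST.test(host)) {
    flags.push("Loopback/localhost");
    return flags; // a TLD check is meaningless for local addresses
  }
  if (PLACEHOLDER_HOST.test(host)) {
    flags.push("Placeholder domain");
  }

  // The URL API converts Unicode hostnames to Punycode, so non-ASCII labels appear as "xn--".
  if (host.split(".").some((label) => label.startsWith("xn--"))) {
    flags.push("Non-ASCII host");
  }

  // A host without a dot has no TLD at all and is flagged as well.
  const tld = host.includes(".") ? host.split(".").pop() : "";
  if (!COMMON_TLDS.has(tld)) {
    flags.push("Uncommon TLD");
  }

  return flags; // an empty array means the link looks clean
}
```

An empty result only says that none of these heuristics fired. Checking the parsed hostname rather than the raw string keeps non-ASCII characters in the path, which are common in legitimate links, from triggering the homograph flag.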

Tips and notes

A green check means the URL is well-formed, not that the page exists — the tool cannot fetch links, so always open flagged ones (and ideally the clean ones too) before trusting a model’s citations. The uncommon-TLD flag is intentionally conservative and will sometimes flag legitimate but rare domains; treat it as a prompt to look closer. Use the copy button to pull a clean, deduplicated list of links into a checker or your notes. Because nothing leaves your browser, it is safe to run on confidential drafts.

Checking links in bulk

After extraction, the deduplicated list of clean URLs is ready to copy into an external link checker (Ahrefs Broken Link Checker, Screaming Frog, or a simple curl loop) to verify reachability. The extractor handles the tedious step of finding and deduplicating the URLs; reachability checking requires a network request the browser tool deliberately avoids for privacy and security reasons.
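For readers who prefer a script to an external checker, a reachability pass might look like the sketch below. It is not part of the browser tool and assumes Node 18 or later, where `fetch` is available globally. The `checkReachability` name and the injectable `fetchImpl` parameter are hypothetical, added so the function can be exercised without network access.

```javascript
// Sketch only: runs outside the browser tool, on Node 18+ with the global fetch.
// fetchImpl is injectable so a stub can stand in for the network during testing.
async function checkReachability(urls, fetchImpl = fetch) {
  const results = [];
  for (const url of urls) {
    try {
      // HEAD avoids downloading the body; some servers reject HEAD,
      // so a GET retry may be needed in practice.
      const response = await fetchImpl(url, { method: "HEAD", redirect: "follow" });
      results.push({ url, status: response.status, ok: response.ok });
    } catch (error) {
      // DNS failures, refused connections, and invalid URLs end up here.
      results.push({ url, status: null, ok: false, error: String(error) });
    }
  }
  return results;
}
```

Each result records the HTTP status, or null when the request never completed. A 404 on a well-formed link is the typical sign of a hallucinated citation. Requests are sent one at a time to stay gentle on the target servers.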