Which standard does this follow?

It follows RFC 9309, the Robots Exclusion Protocol. The most specific matching rule wins, an Allow beats a Disallow of equal length, and the wildcard star and end-anchor dollar sign are supported.

How are groups matched to a user-agent?

Each group has one or more User-agent lines. The tool picks the group whose agent token is the longest substring match of your user-agent, falling back to the star group if no specific name matches.
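
The selection rule above can be sketched in a few lines. This is an illustration rather than the validator's actual code: the function name select_group is hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Iterable, Optional


def select_group(tokens: Iterable[str], user_agent: str) -> Optional[str]:
    """Pick the agent token whose group applies to user_agent.

    Assumption: tokens are compared case-insensitively as substrings.
    """
    tokens = list(tokens)
    ua = user_agent.lower()
    # Every named token that occurs somewhere inside the user-agent string.
    matches = [t for t in tokens if t != "*" and t.lower() in ua]
    if matches:
        return max(matches, key=len)  # longest substring match wins
    # No named group matched: fall back to the star group if one exists.
    return "*" if "*" in tokens else None
```

With tokens "*", "Googlebot" and "Googlebot-Image", a user-agent of Googlebot-Image/1.0 selects the Googlebot-Image group because it is the longer match.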

How is most-specific matching decided?

Among all rules that match the path, the one with the longest pattern wins. If an Allow and a Disallow match with equal length, the Allow takes precedence, exactly as the standard specifies.

Does an empty Disallow block crawling?

No. An empty Disallow value means allow everything for that group. The validator treats it as a no-op so it never disallows a path on its own.

Is my robots.txt sent anywhere?

No. All parsing and URL testing run in your browser with JavaScript. Your file never leaves your device, so staging or internal robots files stay private.

What is the robots.txt Validator?

The robots.txt Validator parses robots.txt syntax per RFC 9309 and lets you test whether a specific URL would be allowed or disallowed for a given user-agent, applying most-specific-match and wildcard rules. It is free to use on Gera Tools and runs entirely in your browser, with nothing uploaded.

robots.txt Validator

Name: robots.txt Validator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

A robots.txt file tells crawlers which parts of your site they may fetch. Small mistakes, such as a misplaced wildcard, an Allow that loses to a more specific Disallow, or a rule before any User-agent, can silently block important pages or expose ones you meant to hide. This robots.txt validator parses your file according to RFC 9309 and lets you test any URL against any user-agent in your browser.

How the parser reads your file

The parser splits the file into groups. A group is one or more User-agent lines followed by Allow and Disallow rules; a blank line or a new User-agent after a rule starts a fresh group. Comments after # are stripped, and Sitemap lines are collected and checked for a fully-qualified URL.
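
A minimal sketch of the grouping step described above, assuming the behaviour stated in that paragraph (a blank line or a User-agent after a rule closes the group). The function name parse_robots and the returned structure are hypothetical, not the validator's real internals.

```python
from typing import Dict, List, Tuple


def parse_robots(text: str) -> Tuple[List[Dict], List[str], List[int]]:
    """Return (groups, sitemaps, orphan_line_numbers) for a robots.txt body."""
    groups: List[Dict] = []   # each group: {"agents": [...], "rules": [(directive, value)]}
    sitemaps: List[str] = []
    orphans: List[int] = []   # 1-based line numbers of rules outside any group
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            current = None                      # blank line ends the current group
            continue
        line = raw.split("#", 1)[0].strip()     # strip comments after #
        if not line or ":" not in line:
            continue                            # comment-only or unrecognised line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line after a rule (or with no open group) starts a new group.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif key in ("allow", "disallow"):
            if current is None:
                orphans.append(lineno)          # rule before any User-agent
            else:
                current["rules"].append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)
    return groups, sitemaps, orphans
```

Consecutive User-agent lines share one group, and a rule that appears with no open group is reported by line number instead of being attached to a neighbouring group.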

When you test a URL, the tool first selects the group whose agent best matches your user-agent (longest substring wins, with * as the fallback). It then evaluates every rule in that group against the path. Patterns support two special characters: * matches any sequence of characters, including none, and a trailing $ anchors the match to the end of the path. A pattern without a trailing $ matches as a prefix.
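
One common way to implement the * and $ handling is to translate each pattern into a regular expression. The sketch below takes that approach; the helper names compile_pattern and pattern_matches are hypothetical, and the validator may do this differently.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then join the pieces with ".*" for every * wildcard.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    # \Z anchors at the very end of the path when the pattern ended with $.
    return re.compile(regex + (r"\Z" if anchored else ""))


def pattern_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so an unanchored pattern acts as a prefix match.
    return compile_pattern(pattern).match(path) is not None
```

For example, /*.pdf$ matches /docs/a.pdf but not /docs/a.pdf?x=1, while /*?sort= matches /shop/list?sort=price.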

Most-specific match wins

When several rules match, the one with the longest pattern decides the outcome. If an Allow and a Disallow match with equal pattern length, the Allow wins. This is why Allow: /admin/public/ can override a broader Disallow: /admin/.
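
The decision rule can be sketched as follows. To stay short, this version uses plain prefix matching and ignores wildcards; the function name decide is hypothetical.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # ("allow" or "disallow", pattern)


def decide(rules: List[Rule], path: str) -> Tuple[bool, Optional[Rule]]:
    """Return (allowed, winning_rule) using longest-match with Allow as tie-break."""
    best: Optional[Rule] = None
    for directive, pattern in rules:
        if not pattern:
            continue                      # an empty value is a no-op
        if not path.startswith(pattern):
            continue                      # simplification: prefix match only
        longer = best is None or len(pattern) > len(best[1])
        tie_for_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and directive == "allow"
        )
        if longer or tie_for_allow:
            best = (directive, pattern)
    if best is None:
        return True, None                 # no rule matched: allowed by default
    return best[0] == "allow", best
```

With the rules Disallow: /admin/ and Allow: /admin/public/, the path /admin/settings is disallowed and /admin/public/logo.png is allowed, because the Allow pattern is the longer match.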

Common bugs this validator catches

Rule before any User-agent. A Disallow line that appears before the first User-agent line is invalid. Parsers may silently ignore it or attach it to the wrong group. The validator flags such orphaned rules.

Wildcard on the User-agent line. Writing User-agent: Google* does not match Googlebot, because the agent line is not a glob pattern. Per RFC 9309, the value must be either * alone (meaning all crawlers) or a product token made up of letters, hyphens, and underscores.

Empty Disallow meaning allow. A line Disallow: with no path means “allow all crawling for this group”. It is not an error, but developers sometimes write it by mistake when they mean Disallow: /. The validator shows this as a no-op.

Sitemap without a full URL. A Sitemap: /sitemap.xml relative path is not valid; it must be an absolute URL such as https://example.com/sitemap.xml. The validator checks for the protocol and domain.
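
A check for the protocol and domain might look like the sketch below. The function name is hypothetical, and restricting the scheme to http and https is an assumption rather than something the validator documents.

```python
from urllib.parse import urlsplit


def is_absolute_sitemap_url(value: str) -> bool:
    """True when the Sitemap value has both a scheme and a host."""
    parts = urlsplit(value.strip())
    # Assumption: only http and https are treated as valid schemes.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Under these assumptions, https://example.com/sitemap.xml passes, while /sitemap.xml and example.com/sitemap.xml are both rejected.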

Case sensitivity in paths. RFC 9309 specifies that path matching is case-sensitive, so Disallow: /Admin/ does not block /admin/. This holds regardless of how the server itself treats letter case. The validator applies case-sensitive matching and reports the exact path that was or was not matched.

Example test scenario

Suppose your file contains:

User-agent: *
Disallow: /private/
Allow: /private/public/

Testing the URL /private/data.html against Googlebot should return DISALLOWED (matched by /private/), while /private/public/index.html should return ALLOWED (the more specific Allow wins). The validator shows which rule matched and why.

Notes

The validator models standard behaviour, but real crawlers occasionally differ on edge cases such as percent-encoding and case sensitivity. Use it to catch the common errors and to confirm your intent before publishing. Everything runs locally, so testing a staging or internal robots file is private.

What robots.txt does — and does not — control

A frequent and costly misunderstanding: Disallow controls crawling, not indexing. A page blocked in robots.txt can still appear in search results (typically without a snippet) if other pages link to it, because the crawler is told not to fetch it but nothing stops the URL being listed. To keep a page out of the index you must let it be crawled and serve a noindex meta tag or X-Robots-Tag header — the exact opposite of a Disallow. Two more limits worth remembering:

robots.txt is advisory. Well-behaved crawlers obey it; malicious scrapers ignore it. Never use it to hide sensitive URLs — anyone can read the file at /robots.txt.
The 500 KiB fetch limit. Google only processes the first 500 KiB of a robots.txt file; rules beyond that are ignored. Keep the file small and rule-dense.
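
To see how much of a large file would fall beyond the 500 KiB limit, a rough size check is enough. This is a sketch with hypothetical names; it only splits the bytes and makes no claim about how a crawler treats a line that straddles the boundary.

```python
from typing import Tuple

LIMIT_BYTES = 500 * 1024  # 500 KiB


def split_at_limit(data: bytes, limit: int = LIMIT_BYTES) -> Tuple[bytes, bytes]:
    """Return (processed, ignored) portions of a robots.txt body."""
    return data[:limit], data[limit:]


# A synthetic file of 50,000 identical 13-byte rules (650,000 bytes in total).
processed, ignored = split_at_limit(b"Disallow: /x\n" * 50_000)
```

Here the first 512,000 bytes would be processed and the remaining 138,000 bytes, roughly the last fifth of the rules, would be ignored.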

Directive support at a glance

Directive	Standardised in RFC 9309	Notes
`User-agent`	Yes	Group selector; `*` is the catch-all.
`Disallow` / `Allow`	Yes	Longest-match wins; `Allow` breaks ties.
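`Sitemap`	No	Defined by the sitemaps.org protocol; RFC 9309 mentions it only as an example of a record crawlers may also interpret. Must be an absolute URL.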
`Crawl-delay`	No	Honoured by some engines (for example Bing), ignored by Google.
`Host`	No	Non-standard; effectively obsolete.

A pre-publish checklist

Before deploying a robots.txt change to production, run through these five tests in the validator — they cover the mistakes that cause real traffic incidents:

Test your homepage (/) against Googlebot — it should be ALLOWED unless you genuinely intend to block the whole site.
Test one money page (a product, category or landing URL) against each crawler group you define — a broad Disallow: /p style prefix rule can swallow far more than intended, because rules match by prefix, not by path segment.
Test a URL with query parameters if you use wildcard rules like Disallow: /*?sort= — confirm the wildcard matches where you expect and nowhere else.
Confirm the Sitemap line survived the edit and is still an absolute URL.
If you maintain separate rules for AI crawlers (GPTBot, ClaudeBot, PerplexityBot and similar), test the same URL against each — substring group-matching means a typo in the agent token silently drops the crawler into your * group instead.
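
The point in the checklist about prefix matching is easy to demonstrate. The sketch below uses a hypothetical helper and made-up paths to show how much more a Disallow: /p rule covers than Disallow: /p/.

```python
from typing import List


def blocked(pattern: str, paths: List[str]) -> List[str]:
    """Paths that a wildcard-free Disallow pattern would match (prefix match)."""
    return [p for p in paths if p.startswith(pattern)]


# Illustrative paths only, not taken from any real site.
paths = ["/p", "/p/123", "/pricing", "/products/shoes", "/blog/post"]

broad = blocked("/p", paths)    # matches every path that begins with "/p"
narrow = blocked("/p/", paths)  # matches only paths under the /p/ directory
```

The broad rule catches /p, /p/123, /pricing and /products/shoes, while the narrow rule catches only /p/123.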

Sources

IETF — RFC 9309: Robots Exclusion Protocol — the standard this validator implements.
Google Search Central — How Google interprets the robots.txt specification — the 500 KiB limit and crawl-vs-index distinction.