A robots.txt file tells crawlers which paths they may visit, and in 2026 it is also where you state your stance on AI training crawlers. It is easy to misconfigure: forgetting AI bots entirely, accidentally allowing admin paths, or, most importantly, assuming the file hides anything when it is fully public. This auditor parses your robots.txt in the browser, groups the rules, and reports privacy and syntax findings with fixes.
How it works
The tool parses the file line by line, building groups keyed by User-agent, then runs four checks:
- AI crawler coverage: it checks whether any rules address known AI bots (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot). If none are mentioned, it flags that your AI-training stance is undefined.
- Path exposure: Disallow rules naming sensitive paths (/admin, /api, /private, /.git) are noted, because listing them publicly is a disclosure, not protection.
- Permissive Allow: Allow rules that open up admin or API paths are flagged.
- Syntax: directives appearing before any User-agent line, empty Allow/Disallow values, and unknown fields are reported with line numbers.
Each finding comes with a short remediation note.
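The grouping and syntax checks described above can be sketched roughly as follows. This is an illustration in Python, not the auditor's own code (which runs in the browser). The function name parse_robots, the KNOWN_FIELDS set, and the finding messages are all assumptions made for the example.

```python
# Assumed set of recognised fields; the real tool's list may differ.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def parse_robots(text):
    """Return (groups, findings) for a robots.txt string.

    groups maps a lowercased User-agent token to a list of
    (field, value, line_no) rules; findings is a list of
    (line_no, message) syntax notes.
    """
    groups = {}
    findings = []
    current_agents = []      # agents that the next rules apply to
    last_was_agent = False   # consecutive User-agent lines share one group

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append((line_no, "missing 'field: value' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
            last_was_agent = True
            continue
        last_was_agent = False

        if field not in KNOWN_FIELDS:
            findings.append((line_no, f"unknown field '{field}'"))
        elif field in ("allow", "disallow"):
            if not current_agents:
                findings.append(
                    (line_no, f"'{field}' appears before any User-agent line")
                )
            elif not value:
                findings.append((line_no, f"empty '{field}' value"))
            for agent in current_agents:
                groups[agent].append((field, value, line_no))
    return groups, findings


if __name__ == "__main__":
    sample = "Disallow: /tmp/\nUser-agent: *\nDisallow: /admin/\nAllow:\nFoo: bar\n"
    groups, findings = parse_robots(sample)
    for line_no, message in findings:
        print(f"line {line_no}: {message}")
```

Keeping the line number on every rule and finding is what lets each report point back to the exact line to fix.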
Tips and example
Consider:
User-agent: *
Disallow: /admin/
Allow: /api/public/
The auditor notes that /admin/ is exposed by being named and that no AI crawler directive exists, and it confirms that the Allow is scoped to a public sub-path. The privacy-correct approach is to protect /admin with real authentication (not robots.txt), add explicit User-agent: GPTBot rules if you want to opt out of AI training, and never rely on Disallow to keep anything secret, because the file itself is world-readable.
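To make the three privacy checks concrete, here is a minimal Python sketch applied to the example above and to one possible revision. It is not the auditor's implementation. The audit function, the REVISED file, and the matching rules are assumptions: Disallow is flagged when its path starts with a sensitive prefix, and Allow is flagged only when it opens a sensitive root itself rather than a sub-path.

```python
AI_BOTS = {"gptbot", "claudebot", "ccbot", "google-extended", "perplexitybot"}
SENSITIVE = ("/admin", "/api", "/private", "/.git")

ORIGINAL = """\
User-agent: *
Disallow: /admin/
Allow: /api/public/
"""

# Hypothetical revision: /admin is left to real authentication and is no
# longer named, and GPTBot is addressed explicitly.
REVISED = """\
User-agent: *
Allow: /api/public/

User-agent: GPTBot
Disallow: /
"""


def audit(text):
    """Return a list of privacy finding strings for a robots.txt string."""
    agents = set()
    findings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            agents.add(value.lower())
        elif field == "disallow" and value.startswith(SENSITIVE):
            # Naming the path publicly is itself a disclosure.
            findings.append(f"path exposure: {value} is named publicly")
        elif field == "allow" and value.rstrip("/") in SENSITIVE:
            # Assumption: only an Allow on the sensitive root is flagged.
            findings.append(f"permissive allow: {value} is opened to crawlers")
    if not agents & AI_BOTS:
        findings.append("ai coverage: no AI crawler is addressed")
    return findings


if __name__ == "__main__":
    for name, text in (("original", ORIGINAL), ("revised", REVISED)):
        print(name, audit(text))
```

Under these assumptions the original file yields a path-exposure finding and an AI-coverage finding, while the revised file yields none.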