A sitemap URL counter built for SEOs, developers and site auditors who need to quickly understand the scale and metadata quality of any XML sitemap. Paste the raw XML and within milliseconds you have a total URL count, duplicate detection, metadata coverage statistics, a changefreq distribution, a priority breakdown and a path-depth analysis — all computed locally in your browser without uploading a single byte.
Why URL count matters
Search engines allocate a finite crawl budget to every site. Knowing exactly how many URLs are in your sitemap, and whether they carry complete metadata, helps you judge how efficiently Googlebot, Bingbot and others can discover your content. A sitemap with 10,000 entries in which only 30% carry lastmod values gives crawlers far weaker freshness signals than one where every URL carries an accurate date. The tool surfaces these gaps at a glance.
How it works
The tool uses the browser’s built-in DOMParser to parse the pasted XML as application/xml, which gives native, spec-compliant XML parsing with no external dependencies. It then queries all <url> elements (for <urlset> sitemaps) or all <sitemap> elements (for <sitemapindex> files) and reads the child elements <loc>, <lastmod>, <priority> and <changefreq> from each.
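The parsing step described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual source: the function name readSitemap and the shape of the object it returns are assumptions made for the example. It relies only on DOMParser, querySelector and querySelectorAll, so it needs to run in a browser.

```javascript
// Sketch only: readSitemap and its return shape are hypothetical names.
function readSitemap(xmlText) {
  const doc = new DOMParser().parseFromString(xmlText, 'application/xml');

  // On malformed input DOMParser does not throw; it returns a document
  // that contains a <parsererror> element instead.
  if (doc.querySelector('parsererror')) {
    throw new Error('Input is not well-formed XML');
  }

  // <urlset> files hold <url> entries, <sitemapindex> files hold <sitemap> entries.
  const root = doc.documentElement.localName;
  const entryTag = root === 'sitemapindex' ? 'sitemap' : 'url';

  // Text of the first matching descendant, or null when absent or empty.
  const childText = (parent, tag) => {
    const node = parent.querySelector(tag);
    return node ? node.textContent.trim() || null : null;
  };

  const entries = Array.from(doc.querySelectorAll(entryTag), (el) => ({
    loc: childText(el, 'loc'),
    lastmod: childText(el, 'lastmod'),
    priority: childText(el, 'priority'),
    changefreq: childText(el, 'changefreq'),
  }));

  return { root, entries };
}
```

Missing fields come back as null in this sketch, which keeps the later coverage counts simple: a field is present when its value is not null.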
Duplicate detection works by adding every <loc> string to a Set: the difference between the total count and the set size is the duplicate count. Hostname extraction uses the browser’s URL constructor, so even non-standard ports and subdomains are parsed correctly. Path depth is calculated by splitting pathname on / and counting non-empty segments.
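As a rough sketch of the three calculations just described, the helpers below take plain <loc> strings. The function names are hypothetical and chosen only for this example; Set and the URL constructor are the standard built-ins.

```javascript
// Sketch only: these helper names are illustrative, not the tool's API.

// Duplicates = total entries minus distinct entries.
function countDuplicates(locs) {
  return locs.length - new Set(locs).size;
}

// The URL constructor separates the hostname from the port.
function hostnameOf(loc) {
  return new URL(loc).hostname;
}

// Non-empty path segments; filter(Boolean) drops the empty strings
// produced by leading and trailing slashes.
function pathDepth(loc) {
  return new URL(loc).pathname.split('/').filter(Boolean).length;
}

const locs = [
  'https://blog.example.com:8443/news/launch/',
  'https://blog.example.com:8443/news/',
  'https://blog.example.com:8443/news/',
  'https://blog.example.com:8443/',
];

console.log(countDuplicates(locs)); // 1
console.log(hostnameOf(locs[0]));   // "blog.example.com"
console.log(locs.map(pathDepth));   // [ 2, 1, 1, 0 ]
```

A Set compares strings exactly, so in this sketch two URLs that differ only by a trailing slash or letter case count as distinct entries.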
Worked example
A sitemap for a medium-sized blog might look like this after parsing:
| Metric | Value |
|---|---|
| Total URLs | 847 |
| Unique URLs | 847 |
| Duplicates | 0 |
| Have lastmod | 612 (72%) |
| Have priority | 847 (100%) |
| Have changefreq | 209 (25%) |
The changefreq breakdown might reveal that 180 of the 209 entries use monthly, 20 use weekly (including the homepage and category pages) and 9 use yearly (legal pages). The path-depth table shows 1 root URL at depth 0, 6 category URLs at depth 1 and 840 posts at depth 2, a clean, shallow architecture that crawlers handle efficiently. The 72% lastmod coverage is a clear action item: the 235 URLs with no lastmod should be updated to include the date they were last substantively edited.
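A distribution like the changefreq breakdown above can be produced with a simple tally. The sketch below is one way to do it, assuming missing values are represented as null; the name tallyValues is hypothetical, and the input array is synthesized from the counts in this worked example rather than read from a real sitemap.

```javascript
// Sketch only: tallyValues is an illustrative helper name.
// Counts how often each non-null value occurs, in first-seen order.
function tallyValues(values) {
  const counts = new Map();
  for (const value of values) {
    if (value == null) continue; // skip entries without the field
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return counts;
}

// 847 entries: 209 with a changefreq, 638 without.
const changefreqs = [
  ...Array(180).fill('monthly'),
  ...Array(20).fill('weekly'),
  ...Array(9).fill('yearly'),
  ...Array(638).fill(null),
];

console.log(Object.fromEntries(tallyValues(changefreqs)));
// { monthly: 180, weekly: 20, yearly: 9 }
```

The same helper would work for the priority breakdown or for path depths, since it only groups identical values.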
Formula note
The total itself needs no formula: it is a direct DOM node count, document.querySelectorAll('url').length, where document is the parsed XML document. The percentage figures for metadata coverage use (count_with_field / total_urls) * 100, rounded to one decimal place for display. Path depth uses pathname.split('/').filter(Boolean).length, which correctly handles trailing slashes by filtering empty segments.
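The coverage percentage can be sketched as below. The function name coveragePercent is hypothetical, and returning 0 for an empty sitemap is an assumption made here to avoid dividing by zero; the source does not say how the tool handles that case.

```javascript
// Sketch only: coveragePercent is an illustrative helper name.
// (countWithField / totalUrls) * 100, rounded to one decimal place.
function coveragePercent(countWithField, totalUrls) {
  if (totalUrls === 0) return 0; // assumption: empty sitemap reports 0%
  return Math.round((countWithField / totalUrls) * 1000) / 10;
}

// Figures from the worked example (847 URLs in total).
console.log(coveragePercent(612, 847)); // 72.3 (lastmod)
console.log(coveragePercent(847, 847)); // 100  (priority)
console.log(coveragePercent(209, 847)); // 24.7 (changefreq)
```

Multiplying by 1000 before rounding and dividing by 10 afterwards keeps the result a number instead of the string that toFixed(1) would return.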
Sitemap index vs urlset
The Sitemaps protocol allows two root elements. A <urlset> file is the standard format: it contains <url> entries, each with a <loc> and optional metadata. A <sitemapindex> file is a parent document that lists child sitemap files using <sitemap><loc> entries. The protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, so larger sites split their URLs across multiple <urlset> files and reference all of them from a single index. The tool detects the root element automatically and switches between the two counting modes.