XML Sitemaps

An XML sitemap is a file that lists the URLs of a site, giving search engines an explicit catalogue of pages to crawl and (where supported) metadata about each. Sitemaps don’t guarantee indexing, but they speed up discovery and help large sites get crawled efficiently.

What a sitemap looks like

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-one/</loc>
    <lastmod>2026-04-25</lastmod>
  </url>
  <url>
    <loc>https://example.com/page-two/</loc>
    <lastmod>2026-04-20</lastmod>
  </url>
</urlset>

The only required field is <loc> (the URL). Optional fields are <lastmod> (last modification date, in W3C Datetime format such as 2026-04-25), <changefreq> (how often the page changes), and <priority> (relative importance).
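As a rough illustration of the structure above, here is a minimal sketch that builds a sitemap with Python’s standard library. The function name build_sitemap and the (url, lastmod) input shape are assumptions made for this example, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as bytes for an iterable of (url, lastmod) pairs.

    lastmod may be None, in which case the optional element is omitted.
    build_sitemap is a hypothetical helper, not a library function.
    """
    # Serialise the sitemap namespace as the default xmlns, as in the sample.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod is not None:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/page-one/", "2026-04-25"),
        ("https://example.com/page-two/", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

The sketch leaves out <changefreq> and <priority> on purpose, since they are optional.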

Google has stated repeatedly that it ignores <changefreq> and <priority>. <lastmod> is used as a hint for crawl scheduling, but only when it is accurate. Sitemaps with bogus or auto-incremented <lastmod> values lose the signal entirely.

When sitemaps matter

Large sites. Sites with thousands or millions of URLs benefit substantially from sitemaps because they help Google prioritise crawling across the whole inventory.

New sites. Sites with weak external link profiles benefit because sitemaps are an alternative discovery path while inbound links accumulate.

Sites with poor internal linking. Sites where some pages aren’t well-linked internally rely on sitemaps for discovery. Better to fix the internal linking, but sitemaps are a fallback.

Sites with content not easily discovered through standard navigation. Image collections, video archives, news content, and similar collections benefit from dedicated sitemaps (image sitemaps, video sitemaps, news sitemaps).

When sitemaps matter less

Small sites with strong internal linking. A 50-page site where every page is reachable within two clicks of the homepage doesn’t materially benefit from a sitemap. The site should still have one (it costs nothing), but the indexing impact is minimal.

Sites where quality or authority is the constraint. If pages are crawled but not indexed because of quality or authority problems, adding them to a sitemap doesn’t change the underlying problem.

Sitemap best practices

Include only canonical, indexable URLs. Sitemaps should contain the URLs you want indexed. Excluding URLs that are noindexed, redirected, or canonicalised elsewhere keeps the signal clean. A sitemap padded with non-indexable URLs sends conflicting signals and may lead search engines to trust it less.
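One way to apply this rule is to filter page records before writing the sitemap. The sketch below assumes you already have crawl or CMS data with these fields. The Page record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record; the field names are assumptions."""
    url: str
    status: int      # HTTP status the URL returns
    noindex: bool    # True if a robots meta tag or header says noindex
    canonical: str   # URL declared by rel="canonical"


def sitemap_eligible(page):
    """True only for URLs that return 200, are indexable, and are self-canonical."""
    return (
        page.status == 200
        and not page.noindex
        and page.canonical == page.url
    )


def eligible_urls(pages):
    """Return the URLs that belong in the sitemap, in input order."""
    return [p.url for p in pages if sitemap_eligible(p)]


if __name__ == "__main__":
    pages = [
        Page("https://example.com/a/", 200, False, "https://example.com/a/"),
        Page("https://example.com/old/", 301, False, "https://example.com/a/"),
        Page("https://example.com/draft/", 200, True, "https://example.com/draft/"),
    ]
    print(eligible_urls(pages))
```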

Keep <lastmod> accurate. Set it to the actual date the content meaningfully changed. Auto-incrementing it on every build (a common framework default) destroys the signal because every URL appears equally and constantly fresh.
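A build step can avoid auto-incrementing by changing the date only when the content itself changes. The following is a sketch under assumptions: update_lastmod and the state dictionary are hypothetical, and hashing treats any change as meaningful. In practice you would probably hash the main content rather than the full rendered HTML, so that template tweaks don’t bump the date.

```python
import hashlib


def update_lastmod(url, content, today, state):
    """Return the lastmod date for url, updating it only when content changes.

    state is a dict mapping url -> (content_hash, lastmod) that the build
    persists between runs (for example as a JSON file). Both the helper and
    the state layout are assumptions for illustration.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = state.get(url)
    if previous is not None and previous[0] == digest:
        # Content is unchanged: keep the earlier date.
        return previous[1]
    # New or changed content: record today's date.
    state[url] = (digest, today)
    return today


if __name__ == "__main__":
    state = {}
    print(update_lastmod("https://example.com/a/", "v1", "2026-04-20", state))
    print(update_lastmod("https://example.com/a/", "v1", "2026-04-25", state))
    print(update_lastmod("https://example.com/a/", "v2", "2026-04-26", state))
```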

Split large sitemaps. A single sitemap can contain up to 50,000 URLs and 50MB uncompressed. Beyond that, use a sitemap index file referencing multiple sitemaps. Even within those limits, splitting by content type or section makes troubleshooting easier.
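Splitting by URL count is simple to sketch. The helper name chunk_urls is an assumption. The example enforces only the 50,000-URL limit, so the 50MB size limit still needs a separate check on the serialised file.

```python
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split urls into consecutive lists of at most `size` entries each."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    all_urls = [f"https://example.com/item-{i}/" for i in range(120_000)]
    for number, chunk in enumerate(chunk_urls(all_urls), start=1):
        # Each chunk would be written to its own file, e.g. sitemap-1.xml.
        print(f"sitemap-{number}.xml: {len(chunk)} URLs")
```

Each chunk then becomes its own sitemap file, and a sitemap index file lists all of them.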

Submit via Search Console. Add the sitemap URL in Search Console’s Sitemaps section. This tells Google about the sitemap directly and gives you fetch and parsing status for it. Listing the sitemap in robots.txt with a Sitemap: line is also recommended, since other crawlers read it too, but only Search Console reports whether Google processed the file.
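To confirm that robots.txt actually declares the sitemap, the Sitemap: lines can be read with Python’s standard urllib.robotparser module (site_maps() needs Python 3.8 or later). This is a minimal sketch: the helper name sitemaps_in_robots is an assumption, and the robots.txt content is passed in as text rather than fetched.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared by Sitemap: lines in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots_txt = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(robots_txt))
```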

Update on publish. Whenever new content is published, the sitemap should reflect it. Static-site generators and CMSs typically handle this automatically.

Sitemap index files

For sites with many sitemaps, a sitemap index file lists them in one place:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-articles.xml</loc>
    <lastmod>2026-04-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-20</lastmod>
  </sitemap>
</sitemapindex>

Submit the index URL in Search Console; Google reads it and follows the references.
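When auditing, it can help to read an index the same way a crawler would and list the sitemaps it references. A minimal sketch using the standard library follows. The helper name child_sitemaps is an assumption, and the index is passed in as bytes rather than fetched over HTTP.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the <loc> values of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


if __name__ == "__main__":
    index_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-articles.xml</loc>
    <lastmod>2026-04-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-20</lastmod>
  </sitemap>
</sitemapindex>"""
    print(child_sitemaps(index_xml))
```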

Specialised sitemap types

Image sitemaps. Extend standard sitemaps with image entries, each carrying an image:loc. Google deprecated the image:title and image:caption tags in 2022, so image:loc is the tag that matters. Useful for sites where image search is a meaningful traffic source.

Video sitemaps. Extend with video-specific metadata (video:thumbnail_loc, video:duration, video:publication_date). Important for video-heavy sites.

News sitemaps. A separate sitemap format for news publishers that should list only articles published within the last 48 hours. It is not required for inclusion in Google News surfaces, but it helps Google find new articles quickly.

Hreflang in sitemaps. Multi-language sites can declare hreflang relationships in sitemaps as an alternative to in-page tags, often easier to maintain at scale.
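For the hreflang case, each <url> entry carries one xhtml:link element per language version, including the page itself. The sketch below generates that structure with the standard library. The function name build_hreflang_sitemap and the input shape (one dict of language code to URL per group of translations) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups):
    """Return sitemap XML (bytes) with hreflang alternates.

    groups is a list of dicts, each mapping an hreflang code to the URL of
    that language version of one page (a hypothetical input shape).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for alternates in groups:
        # Every language version gets its own <url> entry...
        for url in alternates.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            # ...and each entry lists all versions, itself included.
            for lang, href in alternates.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    groups = [{
        "en": "https://example.com/en/page/",
        "de": "https://example.com/de/seite/",
    }]
    print(build_hreflang_sitemap(groups).decode("utf-8"))
```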

Common sitemap mistakes

Mistake | Effect
Including noindexed URLs | Conflicting signals; wasted crawl
Including redirect URLs | Crawlers follow the redirect; better to include the destination
Auto-incrementing <lastmod> on every build | Destroys the freshness signal
Sitemaps over 50,000 URLs | Sitemap rejected; split required
Sitemap not referenced in robots.txt or submitted to Search Console | Discovery delayed
Multiple competing sitemap definitions | Confusion; only one should be authoritative

Verifying sitemap health

Search Console > Sitemaps. Shows submission status, discovered URL count, and any parsing errors. Investigate any “couldn’t fetch” or partial-discovery reports.

Search Console > Page Indexing. Shows the gap between sitemap-submitted URLs and indexed URLs. A large gap warrants investigation; a small gap is normal.

Crawler audits (Screaming Frog, Sitebulb). Compare your live URLs to your sitemap. Surface URLs missing from the sitemap and URLs in the sitemap that don’t exist.
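The comparison a crawler audit performs boils down to two set differences. A minimal sketch follows, assuming both URL lists have already been collected and normalised the same way (trailing slashes, protocol, case). The helper name diff_urls is hypothetical.

```python
def diff_urls(live_urls, sitemap_urls):
    """Compare crawled URLs with sitemap URLs and report both kinds of gap."""
    live, listed = set(live_urls), set(sitemap_urls)
    return {
        # Pages the crawl found that the sitemap does not list.
        "missing_from_sitemap": sorted(live - listed),
        # Sitemap entries the crawl did not find on the live site.
        "not_live": sorted(listed - live),
    }


if __name__ == "__main__":
    live = ["https://example.com/a/", "https://example.com/b/"]
    listed = ["https://example.com/b/", "https://example.com/gone/"]
    print(diff_urls(live, listed))
```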

Frequently asked questions

Do I need a sitemap if my site is small? Yes, it’s recommended, even though the marginal benefit is small. The cost of having one is nearly zero.

Should every URL be in the sitemap? Only canonical, indexable URLs that you actively want in Google’s index. Skip noindexed pages, redirects, paginated archives (in most cases), and admin URLs.

Does Google guarantee to crawl URLs in a sitemap? No. Sitemaps are a discovery aid, not a crawl guarantee. URLs in sitemaps still compete for crawl budget against everything else Google has discovered.