Meta Robots Tags and Crawl Directives

Last updated 18 July 2026

The meta robots tag and x-robots-tag HTTP header give you per-page control over how search engines handle individual URLs: whether to index them, follow their links, display a text snippet, or show a cached copy. They work at the page level, unlike robots.txt, which works at the crawl-access level.

What is the meta robots tag?

The meta robots tag sits in the <head> of an HTML document:

<meta name="robots" content="noindex, nofollow">

The name attribute can target a specific crawler (googlebot, bingbot) or all robots (robots). The content attribute is a comma-separated list of directives.

Multiple directives combine: content="noindex, nofollow" tells crawlers not to index the page and not to follow its links.
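To check what a given crawler would read from a page, the tag can be parsed with Python's standard library. This is a minimal sketch, not a reproduction of any search engine's parser: the class and function names are hypothetical, and it assumes that directives from the generic robots tag and the crawler-specific tag are simply pooled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="..."> tags aimed at one crawler."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        # A crawler listens to the generic "robots" name plus its own token.
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() in self.names:
            # Naive split: a date value containing a comma would be cut in two.
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html, crawler="googlebot"):
    """Return the set of directives that apply to `crawler` in `html`."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return parser.directives
```

Calling robots_directives(page_source) on the tag shown above yields the two directives noindex and nofollow; a tag addressed to bingbot is ignored unless crawler="bingbot" is passed.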

Available directives

Index and crawl control

Directive	Effect
`index`	Default. Google may index this page.
`noindex`	Do not include this page in search results.
`follow`	Default. Follow links on this page.
`nofollow`	Do not follow links on this page (does not pass PageRank).
`none`	Equivalent to `noindex, nofollow`.
`all`	Equivalent to `index, follow`. Rarely needed as it is the default.
`indexifembedded`	Allow indexing of this page’s content when it is embedded in another page via an iframe or similar, even if it also carries `noindex` (used together with `noindex`).
`unavailable_after: [date]`	Stop showing the page in results after the specified date and time.
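The shorthand directives and defaults in the table can be reduced to two yes/no answers. The sketch below uses a hypothetical helper name and assumes that, when both the positive and negative form of a directive are present, the restrictive one wins.

```python
def effective_index_follow(directives):
    """Return (may_index, may_follow) for an iterable of directive strings."""
    expanded = set()
    for directive in directives:
        directive = directive.strip().lower()
        if directive == "none":
            # Shorthand for noindex, nofollow.
            expanded.update({"noindex", "nofollow"})
        elif directive == "all":
            # Shorthand for index, follow (the defaults).
            expanded.update({"index", "follow"})
        elif directive:
            expanded.add(directive)
    # index and follow are defaults, so only the negative forms change the result.
    # Assumption: a restrictive directive overrides a permissive one.
    return ("noindex" not in expanded, "nofollow" not in expanded)
```

An empty list returns (True, True), which reflects that a page with no robots directives is indexable and its links are followed.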

Snippet and display control

Directive	Effect
`nosnippet`	Do not show a text snippet or video preview in results.
`max-snippet: [n]`	Allow a snippet of up to n characters.
`noarchive`	No longer used by Google Search. Google retired the cached link feature in early 2024 and moved this directive to historical reference.¹ Other search engines may still honour it.
`noimageindex`	Do not index images on this page.
`max-image-preview: [setting]`	Control image preview size: `none`, `standard`, or `large`.
`max-video-preview: [n]`	Limit video preview to n seconds.
`notranslate`	Do not offer a translation of this page in results.

nosnippet and max-snippet: 0 also control whether Google can use a page’s text in AI Overviews and AI Mode: text that Google is not permitted to show as a snippet is also withheld from those AI features.¹ To suppress snippet use for one part of a page rather than the whole page, mark that section with the data-nosnippet HTML attribute (<span data-nosnippet>…</span>), which Google reads alongside these directives.
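Because nosnippet and max-snippet: 0 have the same page-level effect, an audit script can treat them as one condition. A small sketch with a hypothetical function name, covering only page-level directives and not data-nosnippet:

```python
def text_snippet_allowed(directives):
    """Return False when the directives forbid any text snippet for the page."""
    for raw in directives:
        name, _, value = raw.strip().lower().partition(":")
        name, value = name.strip(), value.strip()
        if name == "nosnippet":
            return False
        if name == "max-snippet" and value == "0":
            # A zero-length limit leaves nothing to show.
            return False
    return True
```

A page for which this returns False is, per the paragraph above, also withheld from the AI features mentioned there.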

The x-robots-tag HTTP header

The x-robots-tag header delivers the same directives as the meta robots tag, but via an HTTP response header rather than HTML. This makes it the only option for file types without an HTML <head>, such as PDFs, images, and other binary files.

Example server configuration (Apache):

Header set X-Robots-Tag "noindex, noarchive"

The x-robots-tag supports all the same directives as the meta robots tag and can also target specific crawlers:

X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noindex, nofollow
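Header values in this form can be split into an optional crawler name and a directive list. The following is a sketch under stated assumptions: the function names are hypothetical, and a leading token before the first colon is treated as a crawler name unless it is one of the directives that take a value.

```python
# Directives that carry their own "name: value" pair, so a colon after them
# does not mean a crawler prefix.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def parse_x_robots_tag(value):
    """Split one X-Robots-Tag header value into (user_agent, directives)."""
    agent = None
    head, sep, rest = value.partition(":")
    if sep and "," not in head and head.strip().lower() not in VALUED_DIRECTIVES:
        agent, value = head.strip().lower(), rest
    directives = [d.strip().lower() for d in value.split(",") if d.strip()]
    return agent, directives


def directives_for(header_values, crawler):
    """Pool directives addressed to everyone or to `crawler` specifically."""
    pooled = set()
    for value in header_values:
        agent, directives = parse_x_robots_tag(value)
        if agent in (None, crawler.lower()):
            pooled.update(directives)
    return pooled
```

Given the two example headers above, directives_for(values, "bingbot") pools noindex and nofollow, while googlebot receives only noindex.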

For HTML pages, either approach works. The HTTP header is set in the server response rather than in the document itself, so it can be applied programmatically to large groups of URLs.
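As one illustration of setting the header programmatically, a Python WSGI application can be wrapped so that every response for a matching path carries it. This is a sketch only: the wrapper name, the .pdf suffix rule, and the default value are assumptions chosen for the example.

```python
def add_x_robots_tag(app, suffixes=(".pdf",), value="noindex"):
    """Wrap a WSGI app so responses for matching paths get X-Robots-Tag."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def patched_start(status, headers, exc_info=None):
            if path.endswith(tuple(suffixes)):
                # Copy the list so the wrapped app's own headers are not mutated.
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped
```

Wrapping an existing application as app = add_x_robots_tag(app) leaves HTML responses untouched and adds the header to PDF responses only.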

How this differs from robots.txt

Robots.txt and meta robots are frequently confused because both appear to “hide” pages from search engines. They do different things at different stages of the crawl-index pipeline.

Robots.txt controls whether Googlebot requests a URL at all. A Disallow rule tells the crawler not to visit the URL. It does not prevent indexing: Google can index a disallowed URL if it discovers it via links, though it will have no content to display in the snippet.

Meta robots controls what Google does with a page once it has crawled and read it. Noindex, nosnippet, and the other directives only take effect after Googlebot has successfully downloaded and parsed the page. If Googlebot cannot access the page (because robots.txt blocks it), it cannot read any meta robots instructions.

This creates a practical problem: adding noindex to a page blocked by robots.txt achieves nothing. Googlebot never reads the noindex because it cannot visit the URL.
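This conflict can be detected before it causes trouble by testing each noindexed URL against the site's robots.txt. The sketch below uses Python's urllib.robotparser; the function name is hypothetical, and the standard-library matcher is simpler than Google's (it does not implement wildcard patterns), so treat the result as a first-pass check.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_readable(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots.txt lets the crawler fetch `url`,
    which is the precondition for it ever seeing a noindex on that page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)
```

A False result for a URL that carries noindex means the directive is unreachable: either lift the Disallow rule or accept that the noindex has no effect.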

When to use robots.txt: To reduce crawl load on URLs that do not need to be crawled (URL parameters, internal search results, admin paths). Not as the primary mechanism for excluding pages from search results.

When to use noindex: To exclude specific pages from search results while keeping them crawlable. Thank-you pages, gated content, duplicate versions of content, and staging pages are common candidates.

Common mistakes

Noindex on a disallowed URL. Googlebot cannot read the noindex if it cannot crawl the page. If you want a page excluded from results, allow crawling and use noindex. If you want to block crawling for resource reasons, use robots.txt and accept that the URL may still appear in results as a content-free stub.

Noindexed URLs in the sitemap. An XML sitemap signals that a URL should be discovered and indexed. Including a noindexed URL in the sitemap sends two conflicting instructions. Google honours the noindex, but continues crawling the page on each sitemap pass to re-check the directive. In Google Search Console, these appear under “Submitted URL marked noindex” in the Page Indexing report. They are not a ranking problem, but they consume crawl budget on URLs that can never be indexed. Fix: remove noindexed URLs from the sitemap.
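A sitemap can be cross-checked against a list of known noindexed URLs. This sketch assumes a standard urlset sitemap and that the set of noindexed URLs has already been gathered elsewhere (for example from a crawl); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def noindexed_urls_in_sitemap(sitemap_xml, noindexed):
    """Return sitemap URLs, in document order, that appear in `noindexed`."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [url for url in locs if url in noindexed]
```

Every URL this returns is a candidate for removal from the sitemap.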

Noindex on a canonical target. If page A carries a canonical tag pointing to page B, and page B has a noindex directive, the two signals conflict and Google may end up indexing neither page. Googlebot reads page A’s canonical, follows it to page B, reads the noindex, and can exclude both. This is most common after a site reorganises its indexing strategy and noindexes a URL that other pages still reference canonically.

Assuming noindex also stops link equity flowing. Noindex and nofollow are independent directives. A noindexed page can still pass PageRank through its outbound links unless nofollow is also declared. Use content="noindex, nofollow" if you want both effects.

Using noindex for privacy. Noindex is not a security measure. It removes the page from search results, but the URL remains publicly accessible to anyone who knows or guesses it. It is also visible in source code and may be discovered through links. Use authentication or server-level access control for any content that should not be publicly accessible.

Forgetting to remove noindex after launch. Development and staging sites should carry a site-wide noindex directive, typically set at the CMS level. The most common migration mistake is going live without removing it. The entire site drops out of search results, and the cause may not be obvious until the Page Indexing report fills with “URL marked ‘noindex’” entries.

¹ Robots meta tag, data-nosnippet, and X-Robots-Tag, Google Search Central.

See also

robots.txt and crawlability

PageRank and Link Equity
