Meta Robots Tags and Crawl Directives
The meta robots tag and the x-robots-tag HTTP header give you per-page control over how search engines handle individual URLs: whether to index them, follow their links, display a text snippet, or show a cached copy. Because they operate at the page level, they differ from robots.txt, which controls crawl access.
What is the meta robots tag?
The meta robots tag sits in the <head> of an HTML document:
<meta name="robots" content="noindex, nofollow">
The name attribute can target a specific crawler (googlebot, bingbot) or all robots (robots). The content attribute is a comma-separated list of directives.
Multiple directives combine: content="noindex, nofollow" tells Google not to index the page and not to follow its links.
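To see how the name attribute and the comma-separated content list interact, here is a minimal sketch using Python's standard `html.parser`. The class `RobotsMetaParser` and the function `directives_for` are hypothetical names. The sketch collects directives from tags aimed at all robots plus those aimed at one named crawler. It does not check that the tags sit inside `<head>`, and it does not resolve conflicting directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from meta robots tags that apply to one crawler."""

    def __init__(self, crawler):
        super().__init__()
        # name="robots" applies to every crawler; name="<crawler>" to one only
        self.targets = {"robots", crawler.lower()}
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").strip().lower() not in self.targets:
            return
        # content is a comma-separated list; normalise whitespace and case
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def directives_for(html, crawler):
    """Return every directive in `html` that applies to `crawler`, in document order."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives


page = ('<head><meta name="robots" content="noindex, nofollow">'
        '<meta name="googlebot" content="nosnippet"></head>')
print(directives_for(page, "googlebot"))  # ['noindex', 'nofollow', 'nosnippet']
```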
Available directives
Index and crawl control
| Directive | Effect |
|---|---|
| index | Default. Google may index this page. |
| noindex | Do not include this page in search results. |
| follow | Default. Follow links on this page. |
| nofollow | Do not follow links on this page (does not pass PageRank). |
| none | Equivalent to noindex, nofollow. |
| all | Equivalent to index, follow. Rarely needed as it is the default. |
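The directives in this table reduce to two yes/no questions: may the page be indexed, and may its links be followed? The sketch below expands the `none` and `all` shorthands and answers both questions. `resolve_index_follow` is a hypothetical helper. It assumes Google's documented rule that the more restrictive directive applies when directives conflict, so the negative forms always win.

```python
def resolve_index_follow(directives):
    """Reduce a directive list to (indexable, follow_links).

    Assumes the more restrictive directive wins when index/noindex or
    follow/nofollow both appear.
    """
    expanded = set()
    for directive in directives:
        d = directive.strip().lower()
        if d == "none":
            expanded.update({"noindex", "nofollow"})
        elif d == "all":
            expanded.update({"index", "follow"})
        else:
            expanded.add(d)
    # index and follow are the defaults, so only the negative forms change the outcome
    return ("noindex" not in expanded, "nofollow" not in expanded)
```

For example, `resolve_index_follow(["all", "nofollow"])` gives `(True, False)`: the page may be indexed, but its links are not followed.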
Snippet and display control
| Directive | Effect |
|---|---|
| nosnippet | Do not show a text snippet or video preview in results. |
| max-snippet: [n] | Allow a text snippet of up to n characters. 0 is equivalent to nosnippet; -1 means no limit. |
| noarchive | Deprecated by Google (October 2024). Google removed cached pages from Search in 2024 and now lists this directive for historical reference only.[1] Other search engines may still honour it. |
| noimageindex | Do not index images on this page. |
| max-image-preview: [setting] | Control image preview size: none, standard, or large. |
| max-video-preview: [n] | Limit video preview to n seconds. 0 allows at most a static image; -1 means no limit. |
| notranslate | Do not offer a translation of this page in results. |
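The value-bearing directives in this table can be read into a small settings dictionary. The following is a sketch under stated assumptions. `snippet_settings` and its helpers are hypothetical names. The sketch treats -1 as "no limit" (returned as `None`) and `nosnippet` as `max-snippet:0`. It assumes the more restrictive value wins when the same control appears more than once, and it silently skips malformed numbers.

```python
IMAGE_PREVIEW_SIZES = ("none", "standard", "large")  # most to least restrictive


def _limit(value):
    """Parse a numeric limit; -1 means "no limit" and is returned as None."""
    n = int(value)  # raises ValueError for malformed values
    return None if n == -1 else n


def _stricter(current, new):
    """Return the more restrictive of two limits, where None means unlimited."""
    if current is None:
        return new
    if new is None:
        return current
    return min(current, new)


def snippet_settings(directives):
    """Summarise snippet and preview controls from a list of directives."""
    settings = {}
    for directive in directives:
        key, _, value = directive.partition(":")
        key, value = key.strip().lower(), value.strip().lower()
        if key == "nosnippet":
            settings["max-snippet"] = 0  # same effect as max-snippet:0
        elif key in ("max-snippet", "max-video-preview"):
            try:
                limit = _limit(value)
            except ValueError:
                continue  # skip malformed values rather than guess
            settings[key] = _stricter(settings.get(key), limit)
        elif key == "max-image-preview" and value in IMAGE_PREVIEW_SIZES:
            current = settings.get(key, "large")
            settings[key] = min(current, value, key=IMAGE_PREVIEW_SIZES.index)
    return settings
```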
The x-robots-tag HTTP header
The x-robots-tag header delivers the same directives as the meta robots tag, but via an HTTP response header rather than HTML. This makes it the only option for file types without an HTML <head>, such as PDFs, images, and other binary files.
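Example server configuration (Apache, using mod_headers). At server or directory level this line adds the header to every response in that scope, so wrap it in a <Files> or <FilesMatch> block when only certain file types should be excluded: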
Header set X-Robots-Tag "noindex, noarchive"
The x-robots-tag supports all the same directives as the meta robots tag and can also target specific crawlers:
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noindex, nofollow
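A response can carry several X-Robots-Tag lines, and each line may start with a crawler name. The sketch below groups the values by target crawler. `parse_x_robots_tag` is a hypothetical helper, and it assumes the header values arrive as a list of strings. It recognises only the value-bearing directives from the tables above. Any other directive that takes a value after a colon would need adding to `VALUE_DIRECTIVES`, or the parser would mistake it for a crawler name.

```python
# Directives from the tables above that carry a value after a colon. Any other
# token before the first colon is treated as a crawler name (user agent).
VALUE_DIRECTIVES = {"max-snippet", "max-image-preview", "max-video-preview"}


def parse_x_robots_tag(header_values):
    """Group X-Robots-Tag header values by crawler; "*" means all crawlers."""
    rules = {}
    for raw in header_values:
        target = "*"
        head, sep, rest = raw.partition(":")
        head = head.strip().lower()
        # a crawler prefix is a single token, so it contains no comma
        if sep and "," not in head and head not in VALUE_DIRECTIVES:
            target, raw = head, rest
        for part in raw.split(","):
            if part.strip():
                rules.setdefault(target, []).append(part.strip().lower())
    return rules
```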
For HTML pages, either approach works. Because the header is set in the server response rather than in the page markup, it can be applied programmatically to large groups of URLs, for example every file of one type or every file under a directory.
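As one illustration of setting the header programmatically, here is a sketch of WSGI middleware in Python. The name `add_x_robots_tag`, its defaults, and the choice to match on file extension are assumptions made for this example. A web-server rule such as the Apache line above does the same job without touching application code.

```python
def add_x_robots_tag(app, directives="noindex", suffixes=(".pdf",)):
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag.

    `suffixes` must be lowercase; paths are lowercased before matching.
    """

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def start_with_header(status, headers, exc_info=None):
            if path.endswith(suffixes):
                # replace any existing value rather than sending two headers
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", directives))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

Wrapping an existing application as `add_x_robots_tag(app, directives="noindex, nofollow")` would add that header to every response whose path ends in `.pdf` and leave all other responses unchanged.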
How this differs from robots.txt
Robots.txt and meta robots are frequently confused because both appear to “hide” pages from search engines. They do different things at different stages of the crawl-index pipeline.
Robots.txt controls whether Googlebot requests a URL at all. A Disallow rule tells the crawler not to visit the URL. It does not prevent indexing: Google can index a disallowed URL if it discovers it via links, though it will have no content to display in the snippet.
Meta robots controls what Google does with a page once it has crawled and read it. Noindex, nosnippet, and the other directives only take effect after Googlebot has successfully downloaded and parsed the page. If Googlebot cannot access the page (because robots.txt blocks it), it cannot read any meta robots instructions.
This creates a practical problem: adding noindex to a page blocked by robots.txt achieves nothing. Googlebot never reads the noindex because it cannot visit the URL.
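This check is easy to automate. Python's standard `urllib.robotparser` can model the first stage of the pipeline. The sketch below flags URLs that carry noindex but are disallowed, which means the noindex can never be read. `unreadable_noindex` is a hypothetical helper, and it assumes you already have each URL's directives, for example from a crawl. Be aware that the standard-library parser applies the first matching rule in file order and does not support wildcards inside paths, whereas Google uses the most specific (longest) matching rule. Treat the results as approximate for files that mix Allow and Disallow.

```python
from urllib.robotparser import RobotFileParser


def unreadable_noindex(robots_txt, directives_by_url, user_agent="Googlebot"):
    """List URLs that carry noindex but are disallowed, so the noindex is never read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    flagged = []
    for url, directives in directives_by_url.items():
        blocks_index = {"noindex", "none"} & {d.strip().lower() for d in directives}
        if blocks_index and not parser.can_fetch(user_agent, url):
            flagged.append(url)
    return flagged
```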
When to use robots.txt: To reduce crawl load on URLs that do not need to be crawled (URL parameters, internal search results, admin paths). Do not rely on it as the primary mechanism for excluding pages from search results.
When to use noindex: To exclude specific pages from search results while keeping them crawlable. Thank-you pages, gated content, duplicate versions of content, and staging pages are common candidates.
Common mistakes
Noindex on a disallowed URL. Googlebot cannot read the noindex if it cannot crawl the page. If you want a page excluded from results, allow crawling and use noindex. If you want to block crawling for resource reasons, use robots.txt and accept that the URL may still appear in results as a content-free stub.
Noindexed URLs in the sitemap. An XML sitemap signals that a URL should be discovered and indexed. Including a noindexed URL in the sitemap sends two conflicting instructions. Google honours the noindex, but the sitemap entry may keep prompting it to recrawl the page to re-check the directive. In Google Search Console, these appear under “Submitted URL marked noindex” in the Page Indexing report. They are not a ranking problem, but they consume crawl budget on URLs that cannot be indexed while the directive remains. Fix: remove noindexed URLs from the sitemap.
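A sitemap can be cross-checked against the directives found on each URL before it is submitted. The sketch below uses the standard `xml.etree.ElementTree` module and the standard sitemaps.org 0.9 namespace. `noindexed_sitemap_urls` is a hypothetical name, and `directives_by_url` is assumed to come from your own crawl. The sketch reads a single `<urlset>` file and does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def noindexed_sitemap_urls(sitemap_xml, directives_by_url):
    """Return sitemap URLs whose known directives include noindex (or none)."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        directives = {d.strip().lower() for d in directives_by_url.get(url, [])}
        if directives & {"noindex", "none"}:
            flagged.append(url)
    return flagged
```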
Noindex on a canonical target. If page A carries a canonical tag pointing to page B, and page B has a noindex directive, the signals conflict: the canonical asks Google to consolidate A into B, while the noindex asks Google to keep B out of results. Because rel=canonical is a hint rather than a directive, Google may pick A as the canonical instead, or may end up indexing neither page. This is most common after a site reorganises its indexing strategy and noindexes a URL that other pages still reference canonically.
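Crawl data can be checked for this conflict directly. The sketch assumes two hypothetical mappings: `canonicals` (each page URL to the canonical URL it declares) and `directives_by_url` (each URL to its robots directives). `canonical_conflicts` is likewise a hypothetical name. The sketch reports every page whose canonical target is marked noindex.

```python
def canonical_conflicts(canonicals, directives_by_url):
    """Find pages whose declared canonical target is itself marked noindex.

    canonicals: page URL -> canonical URL it declares
    directives_by_url: URL -> list of robots directives found on that URL
    """
    conflicts = []
    for page, target in canonicals.items():
        if page == target:
            continue  # a self-referencing canonical cannot create this conflict
        target_directives = {d.strip().lower() for d in directives_by_url.get(target, [])}
        if target_directives & {"noindex", "none"}:
            conflicts.append((page, target))
    return conflicts
```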
Assuming noindex also stops link equity from flowing. Noindex and nofollow are independent directives. A noindexed page can still pass PageRank through its outbound links unless nofollow is also declared. Use content="noindex, nofollow" if you want both effects.
Using noindex for privacy. Noindex is not a security measure. It removes the page from search results, but the URL remains publicly accessible to anyone who knows or guesses it. The directive itself is visible in the page source, and the URL can still be discovered through links. Use authentication or server-level access control for any content that should not be publicly accessible.
Forgetting to remove noindex after launch. Development and staging sites should carry a site-wide noindex directive, typically set at the CMS level. A common migration mistake is going live without removing it: the entire site drops out of search results, and the cause may not be obvious until the Page Indexing report fills with “Excluded by noindex tag” entries.