When to Use Noindex

The noindex directive tells search engines not to include a page in their results. It comes in two forms: a meta tag in the HTML <head>, or an HTTP response header.

<meta name="robots" content="noindex">
X-Robots-Tag: noindex

The HTTP header works for any file type, including PDFs and images, which have no HTML <head>. Both forms do the same thing: once a search engine crawls the page and reads the directive, it removes the page from its index, or never adds it in the first place.
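
As an illustration, here is a minimal Flask sketch (the route and file path are hypothetical) showing how the header form of the directive can be attached to a PDF response:

from flask import Flask, send_file

app = Flask(__name__)

@app.route("/reports/annual.pdf")
def annual_report():
    # Hypothetical path; send_file returns a Response whose headers we can set.
    response = send_file("static/annual.pdf")
    # Header form of noindex: works for file types with no HTML <head>.
    response.headers["X-Robots-Tag"] = "noindex"
    return response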

The page remains publicly accessible. The URL is not deleted. Anyone who knows the address can still visit it. Noindex only affects whether the page appears in search results.

When noindex is the right call

Thank-you and confirmation pages

A post-purchase or post-submission thank-you page has no independent search value. No one searches for “thank you for your order.” Indexing it creates a page with thin content, no traffic potential, and no purpose beyond a single transactional moment.

Noindex is correct here. The page serves visitors who have already converted; it should not be discoverable through search.

Gated and members-only content

Content behind a login or paywall cannot be served to search engines in full. If you noindex the gated version and maintain an indexable landing page or preview, you control exactly what appears in results. Leaving a gated page crawlable and indexable either presents search engines with a login wall or risks surfacing content you intended to restrict.

Internal search results

A site’s own search results pages (e.g., /search?q=trainers) are typically thin, parameterised, and near-duplicate. Indexing them produces a large volume of low-quality URLs that compete with your actual category and product pages. Noindex keeps crawlers focused on content with real ranking potential.

Faceted navigation and filter URLs

Sites with product filters generate large numbers of URL variants by size, colour, price range, and sort order. Most of these pages are thin, carry no distinct search demand, and duplicate the base category page. Noindex is appropriate for filter combinations that serve no search intent. Some combinations (a specific size and colour of a popular product, for example) may have genuine demand and should be left indexable. Canonical tags are an alternative when the goal is to consolidate signals rather than remove pages outright.
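
For illustration, a filter URL can consolidate signals to its base category with a canonical tag instead of a noindex (URLs hypothetical):

<!-- On /trainers?colour=red: consolidate signals to the base category -->
<link rel="canonical" href="https://www.example.com/trainers">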

Staging and development environments

A staging site should never be indexed. It typically duplicates production content, may include placeholder copy, and can dilute signals for the live site. Apply a site-wide noindex at CMS level, ideally combined with password protection. Robots.txt disallow alone is not sufficient: it blocks crawling, not indexing, so any URL Google discovers via links can still end up in results.
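
One way to enforce this outside the CMS, sketched here with Flask and a hypothetical STAGING environment flag, is to attach the header to every response:

import os
from flask import Flask

app = Flask(__name__)

@app.after_request
def staging_noindex(response):
    # Hypothetical flag: set STAGING=1 in the staging environment only.
    if os.environ.get("STAGING") == "1":
        response.headers["X-Robots-Tag"] = "noindex"
    return response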

Utility pages

Login pages, account dashboards, checkout flows, and cart pages have no search value. They exist for logged-in or transactional users. Noindex keeps them out of results without affecting their function.

The three interactions that cause real damage

Noindex combined with robots.txt disallow

This is the most common and most damaging mistake. It looks like belt-and-braces caution but does the opposite of what is intended.

Noindex only works if Google can crawl the page and read the directive. Robots.txt disallow blocks crawling entirely. When both are applied to the same URL, Google respects the disallow, never visits the page, and therefore never reads the noindex. If the URL has inbound links, Google can still index it as a content-free stub: a URL entry with no title and no snippet.

The rule is simple: to keep a page out of results, allow crawling and use noindex. If you want to reduce crawl load for resource reasons, use robots.txt and accept that the URL may still appear in results as a bare stub.
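
For concreteness, the self-defeating combination looks like this (paths hypothetical); the robots.txt rule prevents compliant crawlers from ever reading the meta tag below it:

# robots.txt: blocks crawling of /private/, so the directive below is never read
User-agent: *
Disallow: /private/

<!-- On /private/page.html: unreachable to compliant crawlers -->
<meta name="robots" content="noindex">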

Noindexed URLs in your sitemap

An XML sitemap is a list of URLs you are asking Google to discover and index. A noindexed URL in that list sends two instructions simultaneously: “please index this” (sitemap) and “do not index this” (noindex directive).

Google resolves the conflict by honouring the noindex. But it still crawls the page repeatedly to check the directive, because the sitemap keeps flagging it as a URL to process. In Google Search Console, these pages appear under “Submitted URL marked noindex” in the Page Indexing report. They are not hurting rankings directly, but they consume crawl budget on URLs that can never be indexed. On large sites with misconfigured sitemaps, this measurably reduces crawl frequency for pages that actually matter.

Fix: remove noindexed URLs from your sitemap. Only include canonical, indexable URLs returning a 200 response.
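
A rough audit script, sketched in Python under the assumption of a standard sitemap at a hypothetical URL, flags sitemap entries that carry a noindex directive or return a non-200 status:

import re
import requests
import xml.etree.ElementTree as ET

SITEMAP = "https://www.example.com/sitemap.xml"  # hypothetical
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP, timeout=10).content)
for loc in root.findall(".//sm:loc", NS):
    url = loc.text.strip()
    r = requests.get(url, timeout=10)
    header_noindex = "noindex" in r.headers.get("X-Robots-Tag", "").lower()
    # Rough check: a robots meta tag whose markup includes "noindex".
    meta_noindex = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*noindex', r.text, re.I)
    if header_noindex or meta_noindex or r.status_code != 200:
        print(f"Remove from sitemap: {url} (status {r.status_code})")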

Noindex on a canonical target

If page A has a canonical tag pointing to page B, and page B carries a noindex directive, neither page will be indexed. The noindex on the canonical target wins. Googlebot reads page A’s canonical, follows it to page B, reads the noindex, and excludes both.
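
Sketched with hypothetical URLs, the failing pattern looks like this:

<!-- Page A (/old-guide): canonicalises to page B -->
<link rel="canonical" href="https://www.example.com/new-guide">

<!-- Page B (/new-guide): carries noindex, so both pages drop out -->
<meta name="robots" content="noindex">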

This pattern appears most often after a site reorganises its indexing strategy and noindexes a URL that other pages still point to canonically. Audit with a crawl tool to surface canonical chains that terminate at noindexed pages.

Noindex and nofollow are independent

A common assumption: noindexing a page stops PageRank flowing through links on that page. This is not correct.

Noindex tells Google not to include the page in results. Nofollow tells Google not to follow links on that page. They are separate directives and each must be declared explicitly.

<meta name="robots" content="noindex, nofollow">

For most noindex use cases (thank-you pages, utility pages), adding nofollow is sensible. For pages you noindex for content quality reasons while still wanting internal links followed, such as some faceted navigation scenarios, use noindex alone.

Noindex and AI crawlers

Standard noindex directives using name="robots" apply to all crawlers that respect the robots meta tag. You can also target specific crawlers:

<meta name="googlebot" content="noindex">
<meta name="GPTBot" content="noindex">
<meta name="ClaudeBot" content="noindex">

The picture for AI crawlers is less settled than for search engines. Behaviour varies between crawlers used for search retrieval and those used for training data collection, and not all AI crawlers are consistently compliant. If restricting AI crawlers from specific pages matters, combine meta directives with robots.txt disallow rules for those specific bots.
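
As a sketch, robots.txt rules targeting the two AI crawlers named above might look like this (the /research/ path is hypothetical):

User-agent: GPTBot
Disallow: /research/

User-agent: ClaudeBot
Disallow: /research/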

How to audit for accidental noindex

Accidental noindex is one of the most common findings in a technical SEO audit. It causes complete loss of visibility, which is worse than most ranking problems, and it can persist unnoticed for months.

Google Search Console Page Indexing report. Go to Indexing > Pages and look at the “Excluded by noindex tag” group. Any URL listed there that should be ranking is a priority fix.

CMS settings. WordPress has a site-wide “Discourage search engines” toggle in Settings > Reading. It applies noindex across the entire site and occasionally gets enabled accidentally during development or staging. Shopify, Squarespace, and other platforms have equivalent settings. Check these first on any site that has recently had development work or gone through a platform migration.

Site crawl. Screaming Frog, Sitebulb, and similar tools export a full list of noindexed pages found during a crawl. Cross-reference this against pages that should be indexable.

View source. On any individual page, search the page source for “noindex”. Check both the <head> section (meta tag) and the HTTP response headers (x-robots-tag); a short script after the last item below automates this check for a single URL.

Post-migration check. Staging environments almost always carry a site-wide noindex. The single most common migration mistake is launching without removing it. Rankings can drop before the error surfaces clearly in GSC.
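
The view-source check can be scripted. A minimal Python sketch for one hypothetical URL, reporting noindex in both the response headers and any robots meta tags:

import re
import requests

url = "https://www.example.com/some-page"  # hypothetical
r = requests.get(url, timeout=10)

# Header form.
header = r.headers.get("X-Robots-Tag", "")
if "noindex" in header.lower():
    print(f"Header noindex: X-Robots-Tag: {header}")

# Meta tag form: print any meta tag whose markup mentions noindex.
for tag in re.findall(r"<meta[^>]+>", r.text, re.I):
    if "noindex" in tag.lower():
        print(f"Meta noindex: {tag}")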

Decision guide

Page type: Recommendation

Thank-you / confirmation: Noindex
Gated or members-only: Noindex
Internal search results: Noindex
Login / account / checkout: Noindex
Staging environment: Noindex + password protection
Filter / faceted navigation: Noindex if no distinct search demand; leave indexable if a specific combination has search volume
Paginated pages beyond page 1: Noindex if thin; leave indexable if pages contain content with real search demand
Duplicate parameter URLs: Canonical to the clean URL; noindex if no canonical target makes sense
Thin or low-quality content: Fix the content or consolidate with a redirect; noindex is not a substitute
Any page you want to rank: Do not noindex