Indexing
Indexing is the process by which Google adds a crawled page to its search database, making it eligible to appear in results. Crawling and indexing are separate steps, and the gap between them is one of the most misunderstood areas of technical SEO.
A page can be crawled consistently and never indexed. The crawl is access. Indexing is an editorial decision.
See How Google crawls, renders, and indexes pages for a full description of the pipeline that precedes this decision.
What Google evaluates at the indexing stage
After crawling and rendering a page, Google assesses whether it belongs in the index. Several factors influence that decision.
Content uniqueness and standalone value
Google’s index contains hundreds of billions of pages. A new page needs to offer something the index does not already have, or to offer it better. Pages that closely mirror existing indexed content, add no distinct perspective, or answer a query no differently from a dozen already-indexed pages are candidates for exclusion.
This is not about originality in the creative sense. A factual page covering a well-worn topic can still earn indexing if it is accurate, well-structured, and serves the query better than what exists. The question Google is effectively asking: does this page help a user in a way that justifies its place in the index?
Thin content
Thin content is content with insufficient unique substance relative to what already exists in the index. It is not purely a word-count issue. A 2,000-word page built from padding is thin. A 400-word page that directly answers a specific question may not be.
Common sources of thin content: template-generated pages with minimal variation, category or tag pages with no editorial content, product pages with only manufacturer descriptions, and location pages using the same copy for every city.
noindex directive
A <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header explicitly instructs Google not to index the page. The page must be crawlable for Google to read this directive; a page blocked in robots.txt cannot be noindexed this way, because Google never sees the tag if it cannot crawl the page.
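The two forms of the directive can be checked programmatically. The sketch below (illustrative helper names, Python standard library only) detects noindex in either the X-Robots-Tag header or a robots meta tag; a real crawler would also normalise header casing and handle bot-specific variants like googlebot meta tags.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in a.get("content", "").split(","))


def is_noindex(headers, html):
    """True if the response headers or the page HTML carry a noindex
    directive. `headers` is a plain dict; real HTTP headers are
    case-insensitive, which this sketch does not normalise."""
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Note that both checks require fetching the page, which mirrors the constraint above: the directive is only visible to a crawler that is allowed to crawl.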
Canonical signals
If a page declares a canonical tag pointing to a different URL, Google consolidates signals on that URL and typically does not index the source page independently. See canonical tags for how this works and the failure patterns to avoid.
HTTP status codes
Google indexes only pages that return a 200 status code. Pages returning 404 or 410 are not indexed. Pages returning 503 are treated as temporarily unavailable and retried later. A page that returns 200 but serves empty or error content (a soft 404) may also be excluded.
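These rules can be summarised as a simple classifier. This is a sketch of the behaviour described above, not Google's actual logic; the function name and the bare-bones soft-404 heuristic (an empty body) are illustrative.

```python
def indexing_eligibility(status_code, body_text=""):
    """Rough mapping from HTTP status code to indexing outcome,
    following the rules described in the text above."""
    if status_code == 200:
        # A 200 response with no real content risks soft-404 treatment.
        return "soft-404 risk" if not body_text.strip() else "indexable"
    if 300 <= status_code < 400:
        return "redirect, target evaluated instead"
    if status_code == 503:
        return "retried later"
    # 404, 410, and other error codes: the page is not indexed.
    return "not indexed"
```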
Authority as a soft indexing input
Google’s indexing is not uniform across all sites. Pages on sites with stronger authority signals tend to be indexed faster and more reliably. A new page on a high-authority domain may appear in the index within hours of being crawled. The same page on a low-authority domain may wait days or weeks, or not be indexed at all.
This is a site-wide effect, not just a page-level one. If a site has a large proportion of low-quality pages, Google may apply conservative indexing thresholds across the whole domain. Improving the overall quality of a site’s content tends to improve indexing rates for new pages.
Google’s Page Indexing report
The Page Indexing report in Google Search Console (Indexing > Pages) shows every URL Google has discovered on a site, split between indexed and not indexed. Non-indexed URLs are grouped by reason code.
The most common reason codes and what they indicate:
| Reason code | What it means |
|---|---|
| Crawled, currently not indexed | Google fetched the page but chose not to index it. Content quality is the usual cause. |
| Discovered, currently not indexed | Google found the URL but has not yet crawled it. Often a prioritisation or crawl budget issue. |
| Duplicate without user-selected canonical | Google found duplicates and chose a canonical itself. Add explicit canonicals. |
| Duplicate, Google chose different canonical | Your declared canonical was overridden. Investigate conflicting signals. |
| Excluded by noindex tag | The page has a noindex directive. Intentional or accidental. |
| Alternate page with proper canonical tag | The page canonicals to a different URL. Indexing consolidated there. |
| Blocked by robots.txt | Googlebot could not crawl the page to assess it. |
| Not found (404) | The page does not exist or returns a 404. |
For a full diagnostic walkthrough of each reason code and how to fix it, see the “why isn’t my page indexed” guide.
Accelerating indexing
Google does not provide a general-purpose mechanism for forcing indexing. The most reliable signals are indirect.
Internal linking. Pages with strong internal links from already-indexed pages are discovered and prioritised for crawling faster. A new page with no internal links may sit undiscovered for weeks. Add links from topically relevant, well-trafficked pages.
XML sitemap. Include new pages in the sitemap and keep it current. The sitemap signals to Google that you consider these URLs important. Submit it via Search Console under Indexing > Sitemaps.
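Keeping the sitemap current is easy to automate. The sketch below builds a minimal sitemap conforming to the sitemaps.org protocol using only the Python standard library; the function name is illustrative, and a production generator would also handle sitemap index files once a site exceeds the 50,000-URL-per-file limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a minimal XML sitemap from (loc, lastmod) pairs.
    lastmod is optional under the sitemap protocol; pass None to omit it."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting file is then referenced from robots.txt (Sitemap: line) or submitted in Search Console as described above.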
External backlinks. An external link from a high-authority domain is one of the fastest ways to get a new page indexed. When Googlebot crawls the linking page and finds the link, it adds the linked URL to its crawl queue.
URL Inspection request. In Search Console, the URL Inspection tool has a “Request indexing” button. This queues the URL for crawling and is useful for individual high-priority pages after a significant change. It does not guarantee indexing and is rate-limited; it is not a substitute for the signals above.
Indexing API
Google’s Indexing API is a programmatic way to notify Google of URL updates. It is restricted to two content types: JobPosting structured data and BroadcastEvent (livestream) structured data embedded in VideoObject. General-purpose web content cannot be submitted through this API. Attempting to use it for unsupported content types has no effect.
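For the supported content types, a notification is a small JSON body POSTed to the urlNotifications:publish endpoint. The sketch below builds that body; authentication (an OAuth 2.0 service-account token in the Authorization header) and the HTTP call itself are omitted, and the helper name is illustrative.

```python
import json

# Endpoint from Google's Indexing API documentation.
PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_url_notification(url, deleted=False):
    """JSON body for one Indexing API notification. Google only acts on
    these for job-posting and livestream (BroadcastEvent) pages."""
    return json.dumps({
        "url": url,
        "type": "URL_DELETED" if deleted else "URL_UPDATED",
    })
```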
IndexNow
IndexNow is a protocol developed by Microsoft that lets sites notify search engines of URL additions, updates, and deletions in real time, without waiting for the next crawl. It works for any content type and is supported by Bing, Yandex, Naver, Seznam.cz, and Yep.
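A batch submission is a JSON POST containing the host, the site's verification key, and the URL list; the key must also be served as a plain-text file on the host so engines can verify ownership. The sketch below builds that payload per the published IndexNow protocol (the shared api.indexnow.org endpoint forwards to participating engines); the helper name is illustrative and the HTTP call is omitted.

```python
# Shared endpoint; participating engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host, key, urls, key_location=None):
    """Build the JSON body for an IndexNow batch submission.
    By default the key is expected at https://<host>/<key>.txt;
    keyLocation overrides that."""
    payload = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return payload
```

The payload is sent with a Content-Type: application/json header; a 200 or 202 response indicates the submission was accepted.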
Google does not currently participate in IndexNow. For sites optimising for Bing alongside Google, IndexNow can accelerate Bing discovery significantly. For Google specifically, the indirect signals above remain the most reliable approach.
Frequently asked questions
Why is my new page not getting indexed? The most common reasons are insufficient internal links (Google has not prioritised crawling it) and content quality concerns (Google crawled it but decided it did not belong in the index). Check the Page Indexing report in Search Console to see the specific reason code for the URL.
How long does indexing take? It varies substantially. High-authority sites with strong internal linking can see new pages indexed within hours. Lower-authority sites with weak internal linking may wait days to weeks. Pages that are crawled but not indexed due to quality signals may never be indexed without editorial changes to the content.
Can I force Google to index a page? No. The URL Inspection request in Search Console queues a page for crawling; it does not guarantee indexing. The underlying signals (content quality, internal links, and authority) determine whether the page is indexed after it is crawled.