How Google Crawls, Renders, and Indexes Pages
Before a page can rank, Google must complete four sequential steps: discover the URL, crawl the page, render it, and index it. These stages happen in order — failure at any one means the next does not occur. Understanding the pipeline helps diagnose exactly where a technical SEO problem is occurring.
Stage 1: Discovery
Google learns about URLs through:
- Internal links from already-crawled pages — the primary discovery mechanism
- External backlinks from other sites
- XML sitemaps submitted via Search Console
- The URL Inspection tool — manual submission for individual, urgent pages
Pages with no internal links pointing to them are rarely discovered or crawled consistently, even if they appear in a sitemap. Sitemaps indicate priority; internal links provide discovery. Both matter.
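As a quick way to see what a sitemap is actually exposing for discovery, the sketch below fetches one and lists its URLs. It assumes a plain `<urlset>` sitemap at /sitemap.xml on a hypothetical domain; a sitemap index file would need one extra level of parsing.

```python
# Minimal sketch: list the URLs a sitemap exposes for discovery.
# Assumes a plain <urlset> sitemap at /sitemap.xml; SITE is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

SITE = "https://www.example.com"  # hypothetical domain
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(f"{SITE}/sitemap.xml") as resp:
    root = ET.fromstring(resp.read())

# Each <url><loc> entry is a URL you are asking Google to consider.
# Listing it here is a hint, not a substitute for internal links.
for loc in root.findall("sm:url/sm:loc", NS):
    print(loc.text.strip())
```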
Stage 2: Crawling
Once a URL is discovered, Googlebot fetches it — requesting the page’s HTML from the server, subject to:
- robots.txt — if the URL is disallowed, Googlebot will not fetch it (but may still index the URL if it has inbound links)
- Crawl budget — the number of URLs Google will crawl on your site in a given period; a constraint primarily for large sites (100k+ pages)
- Server response — 5xx errors, timeouts, and slow response times reduce crawl frequency
Crawling only retrieves HTML. It does not execute JavaScript. At this stage, Google sees the raw HTML source — what you’d see with curl or “View Source,” not what renders in a browser.
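To approximate what Googlebot works with at this stage, the sketch below checks robots.txt and then fetches the raw HTML without executing any JavaScript. The URL, user agent string, and test phrase are placeholders, and this is only a rough approximation of Googlebot's behavior, not a reproduction of it.

```python
# Rough sketch of crawl-time visibility: the robots.txt check, then the
# raw HTML only — no JavaScript execution. URL, UA, and PHRASE are placeholders.
import urllib.request
import urllib.robotparser

URL = "https://www.example.com/some-page"  # hypothetical page
PHRASE = "Add to cart"                     # content you expect Google to index

rp = urllib.robotparser.RobotFileParser("https://www.example.com/robots.txt")
rp.read()

if not rp.can_fetch("Googlebot", URL):
    print("Disallowed in robots.txt — the page will not be fetched at all.")
else:
    req = urllib.request.Request(
        URL, headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"}
    )
    raw_html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")
    # If the phrase is missing here but visible in the browser, it is being
    # injected by client-side JavaScript and will only surface after rendering.
    print("in crawl-time HTML" if PHRASE in raw_html else "only in the rendered DOM")
```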
See Crawlability and robots.txt for the full detail on how crawl access works.
Stage 3: Rendering
After crawling, Google adds the page to a rendering queue. Googlebot renders pages using a headless version of Chrome — executing JavaScript, applying CSS, and building the full DOM the way a browser would.
Key points:
- Rendering is deferred. It does not happen immediately after crawling. There is typically a delay of seconds to days, depending on crawl priority.
- JavaScript-dependent content is invisible at crawl time. If your page renders content via client-side JavaScript that is not in the initial HTML, that content will not appear in the crawled version and will only be available after rendering.
- Server-side rendering (SSR) and static generation avoid this delay. If content is in the HTML at request time, Google sees it immediately on crawl — no rendering queue required.
- Blocked resources affect rendering quality. If Googlebot cannot load your CSS or JavaScript files (blocked via robots.txt or server rules), the rendered page will differ from what users see, affecting mobile usability and content evaluation.
The practical consequence: if your content or navigation depends on JavaScript, Google will eventually see it, but possibly hours or days after crawling, and possibly inconsistently. For content that matters to rankings, server-side rendering is safer.
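One way to test whether a page depends on the rendering queue is to compare the raw HTML with the DOM after a headless Chromium render. The sketch below assumes the optional playwright package is installed (pip install playwright, then playwright install chromium); the URL and phrase are placeholders.

```python
# Sketch: does this content exist at crawl time, or only after rendering?
# Requires the optional `playwright` package; URL and PHRASE are placeholders.
import urllib.request
from playwright.sync_api import sync_playwright

URL = "https://www.example.com/some-page"  # hypothetical page
PHRASE = "Add to cart"                     # content that matters for rankings

raw_html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    rendered_html = page.content()  # full DOM after JavaScript has run
    browser.close()

print("in raw HTML:    ", PHRASE in raw_html)
print("in rendered DOM:", PHRASE in rendered_html)
# False then True is the classic symptom of client-side rendering: Google
# only sees the content after the page leaves the rendering queue.
```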
See JavaScript SEO for the full detail on rendering and its implications.
Stage 4: Indexing
After rendering, Google evaluates whether to add the page to its index. Indexing is not guaranteed even for crawled, rendered pages. The decision depends on directives and quality signals, including:
- noindex directive — a `<meta name="robots" content="noindex">` tag or `X-Robots-Tag: noindex` HTTP header prevents indexing. The page must be crawlable for Google to see this directive — a page blocked in robots.txt cannot be noindexed this way.
- Canonical tags — if the page has a canonical pointing to a different URL, Google will consolidate signals to the canonical and may not index this version
- Content quality — thin, duplicate, or low-quality content may be crawled and rendered but excluded from the index at Google’s discretion
- HTTP status codes — 404 and 410 pages are not indexed; 503 pages are treated as temporary and retried
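The first item in that list is easy to audit: look for noindex in the response header and in the robots meta tag. The sketch below is a rough heuristic (the regex assumes the name attribute comes before content), and the URL is a placeholder; a fuller check would also follow the canonical chain.

```python
# Sketch: spot noindex signals on a crawlable page via the X-Robots-Tag
# header and the robots meta tag. URL is a placeholder; the regex is a
# rough heuristic, not a full HTML parse.
import re
import urllib.request

URL = "https://www.example.com/some-page"  # hypothetical page

resp = urllib.request.urlopen(URL)
html = resp.read().decode("utf-8", errors="replace")

header = resp.headers.get("X-Robots-Tag", "")
meta = re.search(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
    html,
    re.IGNORECASE,
)

if "noindex" in header.lower() or (meta and "noindex" in meta.group(1).lower()):
    print("noindex found — the page can be crawled but will not be indexed")
else:
    print("no noindex directive in the header or meta tag")
```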
See Indexing and Canonical Tags for the full detail on how indexing decisions are made.
Diagnosing problems by stage
| Symptom | Likely stage | How to confirm |
|---|---|---|
| Page not discovered | Discovery | No URL in Search Console; no internal links |
| Page blocked from crawling | Crawling | robots.txt disallow; URL Inspection shows “blocked by robots.txt” |
| JS content missing | Rendering | View Source and rendered DOM differ; URL Inspection's rendered HTML is missing the content |
| Page crawled, not indexed | Indexing | URL Inspection shows “crawled — currently not indexed” |
| Page indexed but not ranking | Post-indexing | Ranking/quality signals, not a pipeline problem |
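The discovery and crawling rows can be partially automated; the rendering and indexing rows still require URL Inspection. Under that assumption, the sketch below is a rough triage helper for the checks that work from the outside, using a hypothetical URL.

```python
# Rough triage sketch for the stages you can check without Search Console.
# Rendering-queue and indexing decisions still require URL Inspection.
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

def triage(url: str) -> str:
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    if not rp.can_fetch("Googlebot", url):
        return "Crawling: blocked by robots.txt"
    try:
        resp = urllib.request.urlopen(url, timeout=10)
    except urllib.error.HTTPError as e:
        return f"Crawling: server returned {e.code}"
    except urllib.error.URLError as e:
        return f"Crawling: request failed ({e.reason})"
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return "Indexing: noindex via X-Robots-Tag"
    return "Fetchable — check rendering and indexing in URL Inspection"

print(triage("https://www.example.com/some-page"))  # hypothetical URL
```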
Common misconceptions
Crawling and indexing are not the same thing. A page can be crawled every day and never indexed. Crawling is access; indexing is a separate editorial decision.
Indexed does not mean ranked. The index contains billions of pages. Being indexed means you are eligible to rank; it does not guarantee any specific position.
robots.txt does not prevent indexing. A disallowed URL can appear in search results if it has inbound links. Google will show the URL without a description. To prevent indexing, use noindex — but the page must be crawlable for Google to read the tag.
Sitemaps do not override crawl signals. Including a URL in your sitemap does not force indexing. It signals that you consider the URL important; Google makes the indexing decision independently.