How Google Crawls, Renders, and Indexes Pages
Before a page can rank, Google must complete four sequential steps: discover the URL, crawl the page, render it, and index it. These stages happen in order — failure at any one means the next does not occur. Understanding the pipeline helps diagnose exactly where a technical SEO problem is occurring.
Stage 1: Discovery
Google learns about URLs through:
- Internal links from already-crawled pages — the primary discovery mechanism
- External backlinks from other sites
- XML sitemaps submitted via Search Console
- The URL Inspection tool — manual submission for individual, urgent pages
Pages with no internal links pointing to them are rarely discovered or crawled consistently, even if they appear in a sitemap. Sitemaps indicate priority; internal links provide discovery. Both matter.
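As a quick way to see what a sitemap is actually exposing for discovery, the sketch below fetches one and lists its URLs. It assumes a plain `<urlset>` sitemap at /sitemap.xml on a hypothetical domain; a sitemap index file would need one extra level of parsing.

```python
# Minimal sketch: list the URLs a sitemap exposes for discovery.
# Assumes a plain <urlset> sitemap at /sitemap.xml; SITE is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

SITE = "https://www.example.com"  # hypothetical domain
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(f"{SITE}/sitemap.xml") as resp:
    root = ET.fromstring(resp.read())

# Each <url><loc> entry is a URL you are asking Google to consider.
# Listing it here is a hint, not a substitute for internal links.
for loc in root.findall("sm:url/sm:loc", NS):
    print(loc.text.strip())
```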
Stage 2: Crawling
Once a URL is discovered, Googlebot fetches it — requesting the page’s HTML from the server, subject to:
- robots.txt — if the URL is disallowed, Googlebot will not fetch it (but may still index the URL if it has inbound links)
- Crawl budget — the number of URLs Google will crawl on your site in a given period; a constraint primarily for large sites (100k+ pages)
- Server response — 5xx errors, timeouts, and slow response times reduce crawl frequency
Crawling only retrieves HTML. It does not execute JavaScript. At this stage, Google sees the raw HTML source — what you’d see with curl or “View Source,” not what renders in a browser.
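To approximate what Googlebot works with at this stage, the sketch below checks robots.txt and then fetches the raw HTML without executing any JavaScript. The URL, user agent string, and test phrase are placeholders, and this is only a rough approximation of Googlebot's behavior, not a reproduction of it.

```python
# Rough sketch of crawl-time visibility: the robots.txt check, then the
# raw HTML only — no JavaScript execution. URL, UA, and PHRASE are placeholders.
import urllib.request
import urllib.robotparser

URL = "https://www.example.com/some-page"  # hypothetical page
PHRASE = "Add to cart"                     # content you expect Google to index

rp = urllib.robotparser.RobotFileParser("https://www.example.com/robots.txt")
rp.read()

if not rp.can_fetch("Googlebot", URL):
    print("Disallowed in robots.txt — the page will not be fetched at all.")
else:
    req = urllib.request.Request(
        URL, headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"}
    )
    raw_html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")
    # If the phrase is missing here but visible in the browser, it is being
    # injected by client-side JavaScript and will only surface after rendering.
    print("in crawl-time HTML" if PHRASE in raw_html else "only in the rendered DOM")
```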
See Crawlability and robots.txt for the full detail on how crawl access works.
Stage 3: Rendering
After crawling, Google adds the page to a rendering queue. Googlebot renders pages using a headless version of Chrome — executing JavaScript, applying CSS, and building the full DOM the way a browser would.
Key points:
- Rendering is deferred. It does not happen immediately after crawling. There is typically a delay of seconds to days, depending on crawl priority.
- JavaScript-dependent content is invisible at crawl time. If your page renders content via client-side JavaScript that is not in the initial HTML, that content will not appear in the crawled version and will only be available after rendering.
- Server-side rendering (SSR) and static generation avoid this delay. If content is in the HTML at request time, Google sees it immediately on crawl — no rendering queue required.
- Blocked resources affect rendering quality. If Googlebot cannot load your CSS or JavaScript files (blocked via robots.txt or server rules), the rendered page will differ from what users see, affecting mobile usability and content evaluation.
The practical consequence: if your content or navigation depends on JavaScript, Google will eventually see it, but possibly hours or days after crawling, and possibly inconsistently. For content that matters to rankings, server-side rendering is safer.
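One way to test whether a page depends on the rendering queue is to compare the raw HTML with the DOM after a headless Chromium render. The sketch below assumes the optional playwright package is installed (pip install playwright, then playwright install chromium); the URL and phrase are placeholders.

```python
# Sketch: does this content exist at crawl time, or only after rendering?
# Requires the optional `playwright` package; URL and PHRASE are placeholders.
import urllib.request
from playwright.sync_api import sync_playwright

URL = "https://www.example.com/some-page"  # hypothetical page
PHRASE = "Add to cart"                     # content that matters for rankings

raw_html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    rendered_html = page.content()  # full DOM after JavaScript has run
    browser.close()

print("in raw HTML:    ", PHRASE in raw_html)
print("in rendered DOM:", PHRASE in rendered_html)
# False then True is the classic symptom of client-side rendering: Google
# only sees the content after the page leaves the rendering queue.
```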
See JavaScript SEO for the full detail on rendering and its implications.
Stage 4: Indexing
After rendering, Google evaluates whether to add the page to its index. Indexing is not guaranteed even for crawled, rendered pages. The decision depends on directives and quality signals, including:
- noindex directive — a `<meta name="robots" content="noindex">` tag or `X-Robots-Tag: noindex` HTTP header prevents indexing. The page must be crawlable for Google to see this directive — a page blocked in robots.txt cannot be noindexed this way.
- Canonical tags — if the page has a canonical pointing to a different URL, Google will consolidate signals to the canonical and may not index this version
- Content quality — thin, duplicate, or low-quality content may be crawled and rendered but excluded from the index at Google’s discretion
- HTTP status codes — 404 and 410 pages are not indexed; 503 pages are treated as temporary and retried
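The first item in that list is easy to audit: look for noindex in the response header and in the robots meta tag. The sketch below is a rough heuristic (the regex assumes the name attribute comes before content), and the URL is a placeholder; a fuller check would also follow the canonical chain.

```python
# Sketch: spot noindex signals on a crawlable page via the X-Robots-Tag
# header and the robots meta tag. URL is a placeholder; the regex is a
# rough heuristic, not a full HTML parse.
import re
import urllib.request

URL = "https://www.example.com/some-page"  # hypothetical page

resp = urllib.request.urlopen(URL)
html = resp.read().decode("utf-8", errors="replace")

header = resp.headers.get("X-Robots-Tag", "")
meta = re.search(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
    html,
    re.IGNORECASE,
)

if "noindex" in header.lower() or (meta and "noindex" in meta.group(1).lower()):
    print("noindex found — the page can be crawled but will not be indexed")
else:
    print("no noindex directive in the header or meta tag")
```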
See Indexing and Canonical Tags for the full detail on how indexing decisions are made.
Diagnosing problems by stage
| Symptom | Likely stage | How to confirm |
|---|---|---|
| Page not discovered | Discovery | No URL in Search Console; no internal links |
| Page blocked from crawling | Crawling | robots.txt disallow; URL Inspection shows “blocked by robots.txt” |
| JS content missing | Rendering | View Source and rendered DOM differ; URL Inspection's rendered HTML is missing the content |
| Page crawled, not indexed | Indexing | URL Inspection shows “crawled — currently not indexed” |
| Page indexed but not ranking | Post-indexing | Ranking/quality signals, not a pipeline problem |
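The discovery and crawling rows can be partially automated; the rendering and indexing rows still require URL Inspection. Under that assumption, the sketch below is a rough triage helper for the checks that work from the outside, using a hypothetical URL.

```python
# Rough triage sketch for the stages you can check without Search Console.
# Rendering-queue and indexing decisions still require URL Inspection.
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

def triage(url: str) -> str:
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    if not rp.can_fetch("Googlebot", url):
        return "Crawling: blocked by robots.txt"
    try:
        resp = urllib.request.urlopen(url, timeout=10)
    except urllib.error.HTTPError as e:
        return f"Crawling: server returned {e.code}"
    except urllib.error.URLError as e:
        return f"Crawling: request failed ({e.reason})"
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return "Indexing: noindex via X-Robots-Tag"
    return "Fetchable — check rendering and indexing in URL Inspection"

print(triage("https://www.example.com/some-page"))  # hypothetical URL
```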
Common misconceptions
Crawling and indexing are not the same thing. A page can be crawled every day and never indexed. Crawling is access; indexing is a separate editorial decision.
Indexed does not mean ranked. The index contains billions of pages. Being indexed means you are eligible to rank; it does not guarantee any specific position.
robots.txt does not prevent indexing. A disallowed URL can appear in search results if it has inbound links. Google will show the URL without a description. To prevent indexing, use noindex — but the page must be crawlable for Google to read the tag.
Sitemaps do not override crawl signals. Including a URL in your sitemap does not force indexing. It signals that you consider the URL important; Google makes the indexing decision independently.