Indexing and Canonical Tags

Indexing is the process by which search engines store and organise the pages they crawl, making them eligible to appear in search results. A crawled page is not necessarily an indexed page; the gap between the two is one of the most consequential and least understood areas of technical SEO.

Crawled vs indexed

Crawled means a search engine has fetched the page. Indexed means the page has been added to the search engine’s database and is eligible to appear in results. Many pages are crawled but not indexed.

The reasons a page might not be indexed despite being crawled:

  • The page contains a noindex meta tag or HTTP header
  • The page is canonicalised to a different URL
  • The page’s content is duplicate or near-duplicate of an already-indexed page
  • The page is deemed low-quality by Google’s quality systems
  • The site has crawl budget constraints and the page hasn’t yet been processed
  • The page is blocked by robots.txt (strictly a crawl block rather than an indexing decision: the crawler cannot fetch the page, so it cannot read the content at all)

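Search Console’s Page Indexing report summarises the indexing status of the URLs Google has discovered and groups non-indexed URLs by the reason they were excluded. Each reason lists example URLs rather than a complete inventory, so use the URL Inspection tool to check an individual page.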
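
The first reason in the list above, a noindex directive, can be checked directly from a page’s HTML and response headers. The following is a minimal sketch, not a full audit: the names RobotsMetaParser and has_noindex are illustrative, the caller is assumed to have already fetched the HTML and headers, and user-agent-prefixed header values (such as "googlebot: noindex") and the "none" shorthand are deliberately not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # A content value such as "noindex, nofollow" holds several directives.
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """Return True if a robots meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    return "noindex" in parser.directives or "noindex" in header_directives
```

A page that returns True here is excluded by design, so there is no point investigating content quality or crawl budget for it until the directive is removed.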

Canonical tags

A canonical tag tells search engines which URL is the preferred version when multiple URLs serve substantially the same content.

<link rel="canonical" href="https://example.com/canonical-url/" />

The canonical tag goes in the <head> of every page. It can be:

  • Self-referential. Most pages should have a canonical pointing to themselves. This protects against accidental duplication from URL parameters.
  • Cross-referential. A duplicate page can canonicalise to the original, consolidating ranking signals on the original URL.
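
The two cases above can be told apart mechanically. The sketch below is illustrative only: CanonicalParser and classify_canonical are hypothetical names, the HTML is assumed to be already downloaded, and the parser does not verify that the tag sits inside <head>.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {k: (v or "") for k, v in attrs}
        # rel may contain several space-separated tokens.
        if "canonical" in attr.get("rel", "").lower().split():
            self.hrefs.append(attr.get("href", ""))


def classify_canonical(page_url, html):
    """Classify the page's canonical declaration relative to its own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.hrefs:
        return "missing"
    if len(parser.hrefs) > 1:
        return "multiple"
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, parser.hrefs[0])
    return "self-referential" if target == page_url else "cross-referential"
```

The comparison is an exact string match, so a trailing-slash or protocol mismatch between the page URL and the declared canonical is reported as cross-referential, which is usually worth flagging anyway.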

When to use canonicals

Self-referential on every indexable page. The default. Removes ambiguity and prevents tracking parameters from creating accidental duplicates.
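
One way to generate the self-referential value is to strip known tracking parameters from the requested URL before writing it into the tag. The sketch below assumes the tracking parameters are the utm_ family plus gclid and fbclid; that list, and the function name self_canonical, are illustrative and should be adjusted to the parameters a given site actually receives.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend to match your own analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}


def self_canonical(url):
    """Return the URL without tracking parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Parameters that change the content (filters, pagination) are kept.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this approach, https://example.com/shoes/?utm_source=news and https://example.com/shoes/ both declare the same canonical, so the tracked variant cannot become an accidental duplicate.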

Cross-referential for genuine duplicates. Print versions, mobile-optimised versions on different URLs, faceted filter combinations that produce identical inventory.

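Cross-referential for syndicated content. When the same article is published on multiple sites, the syndicated copies should declare a canonical pointing to the original. Most major publishers respect this convention; not all platforms allow you to set it. Because the tag is only a hint, a syndicated copy can still be selected over the original; where that matters, asking the partner to noindex its copy is the more dependable option.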

When NOT to use canonicals

To consolidate different content. Canonicals are for URL variants of the same content. When a canonical points between pages whose content genuinely differs, Google will often ignore it.

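To remove pages from the index. If you want a page removed, use noindex, not a canonical to a different page. A canonical is only a hint, so the page may stay indexed; noindex is honoured as long as the page remains crawlable.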

To resolve thin content problems. A canonical doesn’t fix thin content; it just hides one symptom. The underlying solution is to consolidate or improve the content itself.

How Google handles canonical signals

Google treats the rel="canonical" tag as a strong hint, not a directive. It also uses other signals to determine the canonical URL:

  • HTTP headers (Link: <https://example.com/canonical-url/>; rel="canonical"), useful for non-HTML files such as PDFs
  • 301 redirects
  • Internal linking patterns (which URL gets linked to most)
  • Sitemap inclusions
  • Hreflang relationships
  • HTTPS preference over HTTP
  • URL pattern preferences (shorter URLs, trailing slash conventions)

Google’s selected canonical may differ from your declared canonical when other signals contradict the tag. Search Console’s URL Inspection tool shows both: the user-declared canonical (your tag) and the Google-selected canonical (what Google actually chose).
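
Because contradictory signals are the usual cause of an overridden canonical, it can help to compare the signals you control before looking at Search Console. The following sketch parses a Link header and reports disagreement between declared signals. The function names and the shape of the declared dictionary are assumptions for illustration, and the header parser is simplified: it splits on commas, so it will mis-handle URLs that themselves contain commas.

```python
import re


def canonical_from_link_header(header_value):
    """Return the URL of the rel="canonical" entry in a Link header, or None."""
    for part in header_value.split(","):
        match = re.match(r"\s*<([^>]*)>\s*;(.*)", part)
        if match and re.search(r'rel\s*=\s*"?canonical"?', match.group(2), re.IGNORECASE):
            return match.group(1)
    return None


def conflicting_signals(declared):
    """Return the declared signals if they point at more than one URL.

    `declared` maps a signal name (for example "html_tag", "link_header",
    "sitemap") to the URL that signal points at, or None if it is absent.
    An empty dict means every signal that is present agrees.
    """
    present = {name: url for name, url in declared.items() if url}
    return present if len(set(present.values())) > 1 else {}
```

For example, a tag pointing at https://example.com/canonical-url/ combined with a sitemap entry for https://example.com/canonical-url (no trailing slash) is reported as a conflict, which is exactly the kind of mismatch that can lead Google to pick its own canonical.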

Common indexing problems

Discovered, currently not indexed. Google found the URL but hasn’t crawled it yet. Often a sign of crawl budget constraints or low perceived priority. Strengthen internal linking to the page and earn external links that point to it.

Crawled, currently not indexed. Google fetched the page but didn’t add it to the index. The most common reason is content quality concerns. Pages flagged this way are usually thin, duplicate, or low-value. Improve the content or remove the page.

Page with redirect. The URL redirects to another URL, so the destination is what gets indexed, not the redirecting URL. Usually expected.

Duplicate without user-selected canonical. Google found duplicates and chose a canonical itself because no canonical was declared. Add explicit canonicals.

Duplicate, Google chose different canonical than user. Your declared canonical was overridden. Investigate why; usually a signal that the alternative URL has stronger authority signals.

Excluded by ‘noindex’ tag. The page carries a noindex directive, either as a meta tag or as an X-Robots-Tag HTTP header. If that is intentional, no action is needed; if not, remove the directive.

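Blocked by robots.txt. Google can’t crawl the page to see its content. The URL itself can still be indexed if other pages link to it, in which case it appears in results without a description. A noindex directive on a blocked page has no effect, because Google never fetches the page to read it.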
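
A quick local check for the robots.txt case can be done with Python’s standard library. This is a rough sketch: blocked_by_robots is a hypothetical helper, the robots.txt text is assumed to be already downloaded, and urllib.robotparser does not replicate Google’s matching rules exactly (wildcard and longest-match behaviour can differ), so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows the URL for the agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Running this over the URLs in a sitemap is a simple way to catch pages that are submitted for indexing and blocked from crawling at the same time.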

Indexing API and IndexNow

Google offers an Indexing API, but it is restricted to pages with JobPosting structured data or BroadcastEvent structured data embedded in a VideoObject (livestreams). Most sites cannot use it.

IndexNow is a Microsoft-led protocol (used by Bing and Yandex) that lets sites notify search engines of URL changes proactively, accelerating discovery. It works for any content type. Google does not currently support IndexNow, but for Bing optimisation it is useful.
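
An IndexNow notification is a single JSON POST. The sketch below builds the request without sending it. The host, key, and URL values are placeholders, build_indexnow_request is an illustrative name, and the key is assumed to be hosted as a text file at the site root; check the current IndexNow documentation for key requirements and batch limits before relying on it.

```python
import json
import urllib.request

# Shared endpoint; participating search engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow POST request for a batch of URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file lives at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# To actually submit, pass the request to urllib.request.urlopen(request).
```

Keeping construction separate from sending makes it easy to log or inspect the payload first, which is useful because only URLs that have genuinely changed should be submitted.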

For Google, the most reliable ways to accelerate indexing remain:

  • Strong internal linking
  • External backlinks
  • Inclusion in a clean, current sitemap
  • Manual URL submission via Search Console (limited use; for individual high-priority URLs)
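
For the sitemap item above, "clean" generally means listing only canonical, indexable URLs, and "current" means accurate lastmod dates. A minimal generator might look like the following sketch; build_sitemap is an illustrative name, and the caller is assumed to supply only canonical URLs with dates in YYYY-MM-DD form.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (canonical_url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod should reflect the last meaningful content change.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

Listing redirected, noindexed, or non-canonical URLs in the sitemap sends a signal that contradicts the other declarations on the site, so filtering the input matters more than the XML itself.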

Frequently asked questions

Why is my new page not getting indexed? The most common reason is insufficient signals: weak internal links, no external links, not in the sitemap. Strengthen those signals. If the page has been live for weeks and is still not indexed, content quality concerns are the next thing to investigate.

Should I canonicalise my homepage to itself? Yes. Self-referential canonicals on every page including the homepage are the safe default.

Can I have multiple canonical tags on a page? No. When a page declares more than one rel="canonical", Google is likely to ignore all of them and fall back on its other signals. Always have exactly one.