Structured Data and Schema Markup

Last updated 18 July 2026

Structured data and schema markup are machine-readable metadata embedded in a webpage that explicitly declare the meaning of its content. They use the schema.org vocabulary, maintained jointly by Google, Microsoft, Yahoo, and Yandex, and are most commonly delivered as JSON-LD. Implemented correctly, schema markup makes pages eligible for rich results, helps search and retrieval systems parse content, and reinforces entity and topic associations.

What is the difference between structured data, schema, and semantic HTML?

These three terms get used interchangeably, but they refer to different things.

Structured data is the concept: organising information so machines can interpret it reliably. It is not a format or a tool: it is the goal.

Schema markup is one way to implement it: a standardised vocabulary from schema.org, delivered as JSON-LD, that explicitly labels what the content on a page is. That is what this article covers.

Semantic HTML is another way to implement it: native HTML elements (<article>, <section>, <h2>) that communicate meaning and structure through the document itself, without a separate vocabulary layer.

Both schema and semantic HTML serve machine interpretation, but they operate differently. Schema labels entities and properties explicitly. Semantic HTML defines the document structure that machines (including AI retrieval systems) use to identify and chunk content. Neither replaces the other.

What does structured data do?

The HTML on a page describes how content should be displayed. Structured data describes what the content actually means. A <p> tag containing “Dr. Sarah Wilson” tells a browser to render the text in paragraph style; a Person schema with name “Dr. Sarah Wilson” and a knowsAbout array tells search engines that this string refers to a specific real person with declared expertise.
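As a minimal sketch of the Person example above, the snippet below builds the object in Python and prints it as JSON-LD. The name comes from the paragraph; the @id URL and the knowsAbout topics are placeholders invented for illustration.

```python
import json

# The name is taken from the paragraph above. The @id and the topics
# are hypothetical placeholders, not real data.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/#sarah-wilson",
    "name": "Dr. Sarah Wilson",
    "knowsAbout": ["Cardiology", "Preventive medicine"],
}

print(json.dumps(person, indent=2))
```

The printed object is what would sit inside the page's JSON-LD script block, alongside the visible paragraph that names the same person.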

Schema also resolves ambiguity. A page about “Mercury” can declare itself as being about the planet (Place), the chemical element (ChemicalSubstance), or the band (MusicGroup), so parsers do not have to infer the meaning from surrounding text.

The benefits cascade through several systems:

Search engines use structured data to power rich results (star ratings, recipe cards, product information, event details, breadcrumbs).
AI retrieval systems (Bing/Copilot, Google AI Overviews, Perplexity) use it to extract metadata about authors, dates, publishers, and content type at index time.
Knowledge graphs use it to construct entity relationships across the web.
Voice assistants use it for spoken responses.

Implementation: JSON-LD as the standard

Three formats are valid for structured data: JSON-LD, Microdata, and RDFa. Use JSON-LD. It is Google’s recommended format, sits in a separate <script> block independent of HTML structure, and is significantly easier to maintain than the alternatives.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Structured Data and Schema Markup",
  "datePublished": "2026-04-25",
  "author": {
    "@type": "Person",
    "@id": "https://example.com/#author"
  }
}
</script>

JSON-LD blocks can be placed in <head> or <body>. <head> is conventional. Multiple JSON-LD blocks per page are permitted; a common pattern is one Article block, one BreadcrumbList block, plus any rich-result-specific blocks.

Component pattern. Build a reusable schema component (Astro, React, Vue) that takes a JSON object and emits it as a <script type="application/ld+json">. Each page builds its own schema object based on its content, keeping schema in sync with page data without manual updates.
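The component pattern can be sketched outside any particular framework. The version below is an illustration in Python, assuming a hypothetical page dictionary with title, published, and author_id keys; a real Astro, React, or Vue component would do the same work in its own syntax.

```python
import json


def render_json_ld(data):
    """Serialise a schema object into a JSON-LD <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "<" so a string value such as "</script>" cannot close the
    # element early. JSON parsers decode "\u003c" back to "<".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def article_schema(page):
    """Build the Article object from the same data that renders the page."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["title"],
        "datePublished": page["published"],
        "author": {"@type": "Person", "@id": page["author_id"]},
    }


# Hypothetical page data, as it might come from front matter or a CMS.
page = {
    "title": "Structured Data and Schema Markup",
    "published": "2026-04-25",
    "author_id": "https://example.com/#author",
}
markup = render_json_ld(article_schema(page))
print(markup)
```

Because the schema is derived from the page data at render time, a changed title or date flows into the markup without a separate edit.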

Which schema types should I implement?

The types below are grouped by site model. Where a type’s rich result eligibility has changed, that is noted: both FAQPage and HowTo may retain value for AI extraction despite losing their Google Search rich result formats.

Editorial and publication sites: Article (or NewsArticle, TechArticle), Person (for authors), Organization (for publisher), BreadcrumbList.

E-commerce: Product, Offer, AggregateRating, Review, BreadcrumbList, Organization. HowTo on product-related instruction pages.

Local businesses: LocalBusiness (or a subtype such as Restaurant, Dentist, Plumber), PostalAddress, OpeningHoursSpecification, AggregateRating.

SaaS and software: SoftwareApplication, Organization, AggregateRating, Review, BreadcrumbList.

Personal sites and portfolios: Person, CreativeWork, AboutPage, BreadcrumbList.

Recipe sites: Recipe (with recipeIngredient, cookingMethod, nutrition), AggregateRating, Review.

Event sites: Event (with location, performer, offers), Place.

Video-heavy sites: VideoObject (with thumbnailUrl, uploadDate, duration, description, and contentUrl or embedUrl). This is what makes videos eligible for Google’s video carousel and video rich results in Search. Without it, Google may index the video without showing the enhanced result, treat it as a supporting page element rather than indexing it independently, or fail to interpret it cleanly at all. Each VideoObject should sit on the page where the video is embedded, not on a central video index page.

Two optional additions extend what the video can earn. Key Moments (the timestamped chapter links shown below a video result) come from either Clip markup, where you specify exact start and end times and labels manually, or SeekToAction, which tells Google your player supports URL-based seeking so it can generate the moments automatically. A video sitemap (the <video:video> extension to your XML sitemap, carrying title, description, thumbnail, content URL, and publish date) serves a different job from the schema: the schema describes the video to Google, while the sitemap helps Google discover videos on pages that are not well linked. They are complementary, and large video libraries benefit from both; a handful of embedded videos on well-linked pages can rely on the schema alone.
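A rough sketch of VideoObject markup with manually labelled Clip entries follows. The video details, URLs, and chapter labels are invented for illustration, the name property is added alongside the properties listed above, and the ?t= seek parameter is an assumption about the player rather than something every player supports.

```python
import json


def iso_duration(seconds):
    """Format a length in seconds as an ISO 8601 duration such as PT5M30S."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (secs, "S"))
        if value
    )
    return "PT" + (parts or "0S")


def video_object(page_url, video, chapters):
    """Build VideoObject markup for the page that embeds the video."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": video["title"],
        "description": video["description"],
        "thumbnailUrl": video["thumbnail"],
        "uploadDate": video["uploaded"],
        "duration": iso_duration(video["seconds"]),
        "embedUrl": video["embed"],
        # Clip markup: one entry per manually labelled chapter.
        "hasPart": [
            {
                "@type": "Clip",
                "name": label,
                "startOffset": start,
                "endOffset": end,
                # Assumes the player seeks when given ?t=<seconds>.
                "url": f"{page_url}?t={start}",
            }
            for label, start, end in chapters
        ],
    }


# Hypothetical video and chapters.
schema = video_object(
    "https://example.com/videos/intro",
    {
        "title": "Intro to JSON-LD",
        "description": "A short walkthrough of adding JSON-LD to a page.",
        "thumbnail": "https://example.com/thumbs/intro.jpg",
        "uploaded": "2026-04-25T09:00:00+00:00",
        "seconds": 330,
        "embed": "https://example.com/embed/intro",
    },
    [("What JSON-LD is", 0, 45), ("Adding it to a page", 45, 330)],
)
print(json.dumps(schema, indent=2))
```

Clip entries suit a small library where someone can label chapters by hand; SeekToAction is the alternative when the timestamps should be generated automatically.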

FAQPage. Apply to pages with a genuine FAQ section. Google removed FAQ rich results from Search in May 2026¹ (restrictions had already limited eligibility to government and health sites since 2023²). FAQPage schema no longer produces a visible result in Google Search but may retain value for AI extraction (see Schema and AI search below for the caveats).
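One way to keep FAQPage markup tied to what users actually see is to generate it from the same question and answer pairs that render the visible section. The sketch below assumes those pairs are available as a list of tuples; the two entries are taken from this article's own FAQ.

```python
import json


def faq_page(pairs):
    """Build FAQPage markup from the question/answer pairs shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# The same list should drive the visible FAQ section of the page.
visible_faq = [
    (
        "Can structured data be added retrospectively to old content?",
        "Yes. Adding schema to existing pages is a frequent quick-win SEO project.",
    ),
    (
        "How much schema is too much?",
        "Schema should describe the page accurately and completely.",
    ),
]
schema = faq_page(visible_faq)
print(json.dumps(schema, indent=2))
```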

HowTo. For pages with sequential instructional content. HowTo rich results were deprecated in September 2023 and no longer appear in Google Search.² The schema retains value for AI extraction.

WebSite (with SearchAction). Google’s Sitelinks Search Box (the visual search input shown in branded SERP results) has been deprecated and no longer appears in search results.³ The SearchAction property on WebSite schema retains some relevance for agentic and AI-powered search systems that use it to understand site search capabilities, but it produces no visual rich result in Google Search.

The @id graph pattern

For sites with multiple schema entities (Person, Organization, WebSite), the most powerful pattern is to give each entity a stable @id and reference them across pages.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://example.com/#person",
      "name": "Author Name"
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "author": { "@id": "https://example.com/#person" }
    },
    {
      "@type": "Article",
      "@id": "https://example.com/article/#article",
      "author": { "@id": "https://example.com/#person" },
      "isPartOf": { "@id": "https://example.com/#website" }
    }
  ]
}

Cross-referencing the same @id across every article an author writes gives knowledge graph systems an explicit signal linking content to a person, regardless of whether that relationship is stated in prose. This is especially valuable for personal brands and organisations that have not yet established Wikipedia or Wikidata presence.
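The cross-referencing described above can be sketched as a helper that builds each page's graph from shared constants, so the author and site identifiers cannot drift between articles. The site URL, paths, and headlines are placeholders.

```python
import json

# Hypothetical site. The two shared identifiers are defined once.
SITE = "https://example.com"
PERSON_ID = f"{SITE}/#person"
WEBSITE_ID = f"{SITE}/#website"


def article_graph(path, headline):
    """Per-page graph that always points at the same author and site nodes."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Person", "@id": PERSON_ID, "name": "Author Name"},
            {
                "@type": "WebSite",
                "@id": WEBSITE_ID,
                "author": {"@id": PERSON_ID},
            },
            {
                "@type": "Article",
                "@id": f"{SITE}{path}#article",
                "headline": headline,
                "author": {"@id": PERSON_ID},
                "isPartOf": {"@id": WEBSITE_ID},
            },
        ],
    }


first = article_graph("/schema-basics/", "Schema Basics")
second = article_graph("/video-markup/", "Video Markup")
print(json.dumps(first, indent=2))
```

Both pages emit a different Article @id but an identical author reference, which is the signal the pattern relies on.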

Link schema types together with @id to build an entity graph

Rather than writing standalone schema blocks, use the @id property to connect related types: an Article references its WebPage, which is authored by a Person, who belongs to an Organization, which owns a WebSite. This creates a coherent entity graph that lets Google understand the relationships between your content, its authors, and your business. Linked schema is more interpretable than isolated blocks and gives your structured data a better chance of populating Knowledge Panel entries and rich results accurately.
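A linked graph only helps if every reference points at a node that exists. The check below is a simplified sketch: it treats a dictionary containing only @id as a reference and only top-level nodes as definitions. The property names chosen to connect the types (mainEntityOfPage, author, isPartOf, worksFor, publisher) are one reasonable mapping of the chain described above, not the only one.

```python
def dangling_references(graph):
    """Return every referenced @id that no top-level node in the graph defines."""
    defined = {node["@id"] for node in graph if "@id" in node}
    referenced = set()

    def walk(value, top_level=False):
        if isinstance(value, dict):
            # A nested dict holding only "@id" is a pointer to another node.
            if not top_level and set(value) == {"@id"}:
                referenced.add(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    for node in graph:
        walk(node, top_level=True)
    return referenced - defined


# Hypothetical site and page URLs.
SITE = "https://example.com"
graph = [
    {"@type": "Article", "@id": f"{SITE}/post/#article",
     "mainEntityOfPage": {"@id": f"{SITE}/post/"}},
    {"@type": "WebPage", "@id": f"{SITE}/post/",
     "author": {"@id": f"{SITE}/#person"},
     "isPartOf": {"@id": f"{SITE}/#website"}},
    {"@type": "Person", "@id": f"{SITE}/#person",
     "worksFor": {"@id": f"{SITE}/#organization"}},
    {"@type": "Organization", "@id": f"{SITE}/#organization"},
    {"@type": "WebSite", "@id": f"{SITE}/#website",
     "publisher": {"@id": f"{SITE}/#organization"}},
]
print(dangling_references(graph))  # empty set when every reference resolves
```

Dropping the Organization node from the list would make the function report its @id, which is the kind of silent break a template edit can introduce.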

How do you validate structured data?

Two tools to use before shipping any schema change:

Schema Markup Validator. Validates against the schema.org specification. Catches structural errors.
Google Rich Results Test. Validates against Google’s specific requirements for rich result eligibility.

Both should pass. The Schema Markup Validator alone is not sufficient because Google has additional requirements (image sizes, required properties for specific rich results) that go beyond the base schema.org spec.
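Before reaching for either tool, a local sanity check can catch markup that is not even valid JSON. The sketch below uses only the Python standard library to pull JSON-LD blocks out of rendered HTML and parse them. It does not check schema.org types or Google's requirements, so it complements the two validators rather than replacing them.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def check_json_ld(html):
    """Return (parsed_blocks, errors) for every JSON-LD script in the HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    parsed, errors = [], []
    for index, raw in enumerate(extractor.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: {exc.msg} (line {exc.lineno})")
    return parsed, errors


# Hypothetical page: the second block has a trailing comma.
page = """
<head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}</script>
<script type="application/ld+json">{"@type": "BreadcrumbList",}</script>
</head>
"""
parsed, errors = check_json_ld(page)
print(len(parsed), "valid block(s)")
for error in errors:
    print(error)
```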

Schema and AI search

The picture is more contested than most writing on the topic suggests.

For retrieval-augmented systems (Bing/Copilot, Google AI Overviews, Perplexity) there is platform-level confirmation that structured data is used. Fabrice Canel of Microsoft confirmed in March 2025 that schema helps Bing’s systems understand content.⁴ Google’s AI Overviews are built on years of structured data investment. In these pipelines, schema can influence how content is classified and retrieved.

For pure LLM inference, the evidence is weaker. The only empirical study to date (Search Atlas, December 2025) found no correlation between schema coverage and citation rates across OpenAI, Gemini, and Perplexity.⁵ One reason: JSON-LD lives in <script> tags, which preprocessing pipelines typically strip before LLM training, meaning schema may never enter the model’s weights at all.

What this means in practice: schema is worth implementing for the retrieval-augmented use cases where its benefit is confirmed, and for entity reinforcement via @id. Treating it as a direct LLM citation lever, without evidence for a specific platform, overstates what is known.
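The script-stripping step mentioned above can be illustrated with a toy text extractor. This is not any specific pipeline, only an example of the common approach of keeping visible text and discarding script and style content. The page content is invented.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Keep text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the expertise is declared only in the JSON-LD.
page = (
    "<article><h1>Structured Data</h1>"
    "<p>Written by Dr. Sarah Wilson.</p></article>"
    '<script type="application/ld+json">'
    '{"@type": "Person", "name": "Dr. Sarah Wilson", "knowsAbout": ["Cardiology"]}'
    "</script>"
)
text = visible_text(page)
print(text)
```

The author's name survives because it also appears in the prose; the knowsAbout value, declared only in the script block, does not.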

Can LLMs read schema?

Information declared only in your JSON-LD may not reach LLMs the way you intend. LLMs read <script> blocks as plain text, not as validated metadata, so a model can ingest whatever a block says without checking it: a block with fake types or invalid properties reads to the model exactly like a correct one. If your author name, expertise, or key claims matter, state them in the prose too. Schema is worth implementing for rich results, Bing/Copilot, and entity recognition, but it should reinforce what the page already says, not substitute for it.

Common technical mistakes

Schema describing content not visible on the page. Adding FAQ schema for questions not actually shown to users is a guidelines violation that has resulted in manual actions. The general rule: if the user cannot see it, do not mark it up.

Conflicting @type values across blocks. A page with two competing Article schemas confuses parsers. Have one Article block per page.

Missing required properties. Each schema type has required properties (Article needs headline, Recipe needs name, Product needs name and image). If a required property is missing, the markup is ineligible for the corresponding rich result and Google ignores it for that purpose.

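Using string IDs that aren’t URLs. @id should be an absolute URL. A bare string such as "@id": "author-1" is treated as a relative reference and resolved against the page’s own URL, so the same entity ends up with a different identifier on every page and cross-page graph relationships never connect.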

Schema only on some pages. Inconsistent application across the site fragments the entity graph. Apply schema systematically.

Inaccurate dateModified. Updating dateModified on every build (a common mistake with static-site generators) makes every page look freshly edited and destroys the freshness signal. Set it from the actual content modification date.
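Two of the mistakes above, non-URL @id values and competing Article nodes, are mechanical enough to lint for. The function below is a minimal sketch that takes a page's already parsed JSON-LD blocks; the warning wording and the rule of at most one Article per page follow this article's advice rather than any official validator.

```python
from urllib.parse import urlparse


def is_absolute_url(value):
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def iter_nodes(value):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def lint_page(blocks):
    """Return warnings for a page's parsed JSON-LD blocks."""
    warnings = []
    article_count = 0
    for node in iter_nodes(blocks):
        node_id = node.get("@id")
        if isinstance(node_id, str) and not is_absolute_url(node_id):
            warnings.append(f'@id "{node_id}" is not an absolute URL')
        declared = node.get("@type", [])
        types = [declared] if isinstance(declared, str) else declared
        # A bare {"@id": ...} reference has no @type, so it is not counted.
        if "Article" in types:
            article_count += 1
    if article_count > 1:
        warnings.append(f"found {article_count} Article nodes, expected one")
    return warnings


# Hypothetical page with both problems.
blocks = [
    {"@context": "https://schema.org", "@type": "Article", "headline": "A",
     "author": {"@type": "Person", "@id": "author-1"}},
    {"@context": "https://schema.org", "@type": "Article", "headline": "B"},
]
for warning in lint_page(blocks):
    print(warning)
```

A check like this fits naturally into a build step or crawl audit, where it can run against every generated page.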

Maintenance and monitoring

Schema implementations drift. Changes to the properties of schema types, framework upgrades that alter rendering, and CMS template edits can all silently break schema. Monitoring approaches:

Search Console > Enhancements. Reports rich result eligibility and errors per schema type.
Periodic spot-checks. Run the Rich Results Test on a sample of important URLs quarterly.
Crawl-based audits. Screaming Frog and Sitebulb extract structured data from every URL during a crawl, surfacing missing or invalid schema across the site.

Frequently asked questions

Does structured data improve rankings directly?
Not directly. Structured data is not itself a ranking signal; its indirect effects (rich results, better CTR, clearer topic signals) are what can improve search performance.

How do I know which schema types my site is eligible for?
Google’s search gallery lists current rich result types and their requirements. Not all schema types produce rich results; those that don’t still serve indexing and AI purposes.

Can structured data be added retrospectively to old content?
Yes. Adding schema to existing pages is a frequent quick-win SEO project. Pages that previously had no schema often gain rich result eligibility within weeks of implementation.

How much schema is too much?
Schema should describe the page accurately and completely. There is no penalty for detailed markup, provided every schema block reflects real on-page content.

See also

Rich Results

Entity SEO

Semantic HTML
