NLWeb

NLWeb is a server-side complement to the browser-level agent protocols described in WebMCP. Where WebMCP exposes callable tools in the browser at the moment a user’s agent takes action, NLWeb creates a queryable API layer on a site using structured data the site already publishes. Microsoft released it as open source in May 2025. It was developed by R.V. Guha, who previously created RSS, RDF, and Schema.org, and it has since been integrated into Cloudflare’s AI Search product.[1][2]

What does NLWeb do?

NLWeb takes the structured data a site already publishes (Schema.org JSON-LD, RSS, other JSONL formats) and makes it queryable by AI agents through two endpoints:

  • /ask: a REST endpoint that accepts natural language queries and returns structured answers in Schema.org format
  • /mcp: an MCP server endpoint that makes the site discoverable and callable by any MCP-compatible agent

Every NLWeb instance is an MCP server. The same agents that connect to tools from Anthropic, OpenAI, and Google can connect directly to a site running NLWeb, without having to crawl individual pages.

The processing pipeline: a natural language query arrives at /ask, the server retrieves relevant schema.org entities from a vector database, an LLM ranks and filters the results, and a structured Schema.org response is returned. The agent never reads individual HTML pages.
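The pipeline above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not NLWeb’s implementation: the catalogue is made up, a bag-of-words cosine similarity stands in for the vector database, a caller-supplied judge function stands in for the LLM, and the ItemList envelope is just one plausible Schema.org shape rather than NLWeb’s documented response format.

```python
import math
from collections import Counter

# Hypothetical catalogue of Schema.org-style entities, for illustration only.
CATALOGUE = [
    {"@type": "Recipe", "name": "Quick tomato pasta",
     "description": "pasta dish ready in 20 minutes"},
    {"@type": "Recipe", "name": "Slow beef ragu",
     "description": "pasta sauce simmered for four hours"},
    {"@type": "Recipe", "name": "Lemon sorbet",
     "description": "frozen dessert with lemon"},
]


def embed(text):
    """Stand-in for an embedding model: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Stand-in for the vector database: top-k entities with non-zero similarity."""
    q = embed(query)
    scored = [(cosine(q, embed(e["name"] + " " + e["description"])), e)
              for e in CATALOGUE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entity for score, entity in scored[:k] if score > 0]


def ask(query, judge):
    """Retrieve candidates, let the judge (the LLM stand-in) filter them, wrap the result."""
    kept = [e for e in retrieve(query) if judge(query, e)]
    return {"@context": "https://schema.org", "@type": "ItemList",
            "itemListElement": kept}


def mentions_minutes(query, entity):
    """Trivial judge: keep entities whose description states a time in minutes."""
    return "minutes" in entity["description"]


answer = ask("pasta minutes", mentions_minutes)
print([e["name"] for e in answer["itemListElement"]])
```

In this toy run, retrieval returns both pasta recipes and the judge keeps only the one whose description mentions minutes. The caller receives structured entities throughout and never parses HTML.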

What this requires in practice: a vector database (Qdrant, Snowflake, Milvus, and Azure AI Search are all tested), LLM connectivity (the protocol works with most major models, including DeepSeek, Gemini, and Claude), and the NLWeb server itself (open source, available on GitHub).[3] Cloudflare’s AI Search integration removes the vector database requirement for sites already on Cloudflare.[2]

How does NLWeb differ from WebMCP?

The two protocols address the same question (how can AI agents interact with a website?) from different positions.

|  | NLWeb | WebMCP |
| --- | --- | --- |
| Layer | Server-side API | Browser-native |
| Initiated by | Site publishing an endpoint | Chrome’s navigator.modelContext API |
| What it enables | Natural language queries against site content | Agent invocation of declared JS/form actions |
| Data source | Existing schema.org / RSS / JSON-LD | Declarative HTML forms and JavaScript APIs |
| Governed by | Microsoft open source | W3C Community Group (Google + Microsoft) |
| Status | Live, open source, 4,800+ GitHub stars | Draft W3C spec, Chrome 146+ flag |

Both produce an MCP server endpoint; neither replaces the other. A site running NLWeb exposes a queryable catalogue. A site implementing WebMCP exposes callable actions (add to cart, check availability, book a service). The two are complementary layers in the same agentic stack.

What does an NLWeb query look like?

A user’s AI agent sends a POST request to a travel site’s /ask endpoint:

{ "query": "beach resorts in Thailand with kids activities under £200" }

The server’s vector database retrieves relevant LodgingBusiness schema.org entities. The LLM ranks and filters them. The endpoint returns a Schema.org-formatted JSON response the agent can present directly. The agent did not crawl individual hotel pages; it queried a structured catalogue.
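As a rough sketch, an agent could assemble that request with Python’s standard library. The host name travel.example and the helper build_ask_request are hypothetical, and the JSON body simply mirrors the example above. A real deployment may expect different parameters, so check the server’s own documentation before relying on this shape.

```python
import json
import urllib.request


def build_ask_request(base_url, query):
    """Build (but do not send) a POST request to a site's /ask endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ask",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


# Hypothetical host; the query text is the one from the example above.
request = build_ask_request(
    "https://travel.example",
    "beach resorts in Thailand with kids activities under £200",
)
print(request.get_method(), request.full_url)
```

Passing the request object to urllib.request.urlopen would send it. The sketch stops short of that because the endpoint here is invented.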

The same pattern applies to recipe sites (query: “pasta dishes under 30 minutes”), media archives (query: “long reads about climate policy published in 2024”), and documentation (query: “how to configure webhook retries”).

Which sites should implement NLWeb?

Value scales with the depth of the content catalogue and the degree to which users (or agents) need to browse or filter it rather than retrieve individual pages.

Good candidates:

  • Commerce sites. Product catalogue queries are the primary use case: searchProducts, checkAvailability, price filtering. TripAdvisor and Shopify are named early adopters.[4]
  • Media and publisher sites with large archives. Deep catalogues of recipes, reviews, or long-form content benefit from queryable interfaces. Named early adopters include DDM (Allrecipes, Serious Eats), O’Reilly Media, and Hearst (Delish).[4]
  • Documentation and knowledge bases. AI coding tools benefit from queryable structured responses when working with API references and guides, the same use case that motivated Google’s Markdown pages on developers.google.com.
  • Events and ticketing. Eventbrite is a named adopter; date and location queries against a live events catalogue are a natural fit.[4]

Not a good fit:

  • General editorial and informational sites. If the site’s primary value is individual article retrieval rather than catalogue browsing, agentic SEO content optimisation is the more relevant investment. NLWeb infrastructure does not improve how individual articles are found or cited in AI-generated answers.
  • Sites with thin or unreliable schema.org markup. NLWeb’s responses are only as good as the structured data underneath. Poor schema produces inaccurate or unhelpful NLWeb responses. Audit and strengthen schema first.
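A first-pass schema audit can be as simple as checking each entity for the fields that agent queries are likely to depend on. The sketch below assumes the site’s JSON-LD has already been collected as text. The EXPECTED_FIELDS checklist is a hypothetical house rule, not something Schema.org or NLWeb mandates, and the sketch does not handle @graph containers.

```python
import json

# Hypothetical per-type checklist; Schema.org itself does not require any property.
EXPECTED_FIELDS = {
    "Product": ["name", "description", "offers"],
    "Recipe": ["name", "recipeIngredient", "totalTime"],
    "Event": ["name", "startDate", "location"],
}


def audit_entity(entity):
    """Return the expected fields that are missing or empty on one entity."""
    entity_type = entity.get("@type")
    if isinstance(entity_type, list):  # JSON-LD allows several types per entity
        entity_type = entity_type[0] if entity_type else None
    expected = EXPECTED_FIELDS.get(entity_type, ["name"])
    return [field for field in expected if not entity.get(field)]


def audit_jsonld(jsonld_text):
    """Map each incomplete entity (by name or position) to its missing fields."""
    data = json.loads(jsonld_text)
    entities = data if isinstance(data, list) else [data]
    report = {}
    for index, entity in enumerate(entities):
        missing = audit_entity(entity)
        if missing:
            report[entity.get("name") or f"entity #{index}"] = missing
    return report


sample = json.dumps([
    {"@type": "Recipe", "name": "Quick tomato pasta", "recipeIngredient": ["pasta"]},
    {"@type": "Product", "name": "Kettle", "description": "1.7 litre",
     "offers": {"price": "25"}},
])
print(audit_jsonld(sample))
```

Here the recipe is flagged for a missing totalTime, which is exactly the field a query such as “pasta dishes under 30 minutes” would need.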

Does NLWeb affect search rankings?

No confirmed mechanism. NLWeb exposes a queryable API layer; it does not affect how search engines crawl, index, or rank the site. The value is direct agent access to site content, not improved search visibility. Do not implement NLWeb expecting SEO or AI citation benefits.

How effective is NLWeb vs. standard HTML crawling?

An academic comparison published on arXiv in November 2025 tested MCP-based retrieval, NLWeb, RAG, and raw HTML against the same query set.[5] NLWeb and MCP retrieval achieved F1 scores of 0.75–0.77 versus 0.67 for raw HTML, while reducing token usage from roughly 241,000 tokens (HTML) to 47,000–140,000 tokens. The caveat: this is a controlled benchmark, not a production deployment study. Production results vary with schema quality and query type.
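To put those figures in relative terms, the arithmetic can be worked through directly. The snippet uses only the numbers quoted above. The function names are illustrative, and because the HTML token count is itself approximate, the percentages are approximate too.

```python
HTML_TOKENS = 241_000
STRUCTURED_TOKENS = (47_000, 140_000)  # reported range for NLWeb and MCP retrieval
HTML_F1 = 0.67
STRUCTURED_F1 = (0.75, 0.77)           # reported range for NLWeb and MCP retrieval


def percent_reduction(baseline, value):
    """How much smaller value is than baseline, as a percentage."""
    return (1 - value / baseline) * 100


def percent_gain(baseline, value):
    """How much larger value is than baseline, as a percentage."""
    return (value - baseline) / baseline * 100


token_savings = [round(percent_reduction(HTML_TOKENS, t), 1) for t in STRUCTURED_TOKENS]
f1_gains = [round(percent_gain(HTML_F1, f), 1) for f in STRUCTURED_F1]

print(token_savings)  # [80.5, 41.9]
print(f1_gains)       # [11.9, 14.9]
```

So the structured approaches used roughly 42% to 80% fewer tokens than raw HTML while scoring about 12% to 15% higher on F1. The summary above does not say which end of each range belongs to NLWeb and which to MCP retrieval, so the two ranges should not be paired up.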

Current state

NLWeb was announced at Microsoft Build in May 2025 and has been in active development since.[1] As of May 2026:

  • GitHub: 4,800+ stars, actively maintained, with R.V. Guha among the direct contributors [3]
  • Cloudflare integration: Cloudflare’s AI Search product supports NLWeb deployment, removing the need for separate vector database infrastructure for sites already on Cloudflare [2]
  • Named production deployments: TripAdvisor, Shopify, Eventbrite, O’Reilly Media, DDM (Allrecipes/Serious Eats), Hearst (Delish), Chicago Public Media, Common Sense Media [4]
  • Protocol compatibility: works with most major LLMs and vector databases; every NLWeb instance exposes an MCP endpoint

The protocol is open-source and has no single controlling vendor beyond Microsoft’s initial development and hosting of the repository.

What to do now

Commerce and media sites: assess whether your Schema.org coverage is solid enough to serve as the data source. A schema audit should come before any NLWeb implementation; the quality of your structured data determines the quality of NLWeb responses. Sites already on Cloudflare can review the Cloudflare AI Search + NLWeb documentation as the lowest-friction path to deployment.

Documentation sites: NLWeb is worth evaluating if your audience includes AI coding tools that read documentation to generate code. The structured query interface fits how these tools consume reference material.

Editorial and informational sites: no implementation needed. Monitor as platform-level integrations mature (Cloudflare’s integration is already a sign that NLWeb may become a hosting-layer default rather than a DIY implementation). The relevant investment remains content quality and agentic SEO optimisation.

Frequently asked questions

Does NLWeb require rebuilding the site?
No. NLWeb reads existing structured data: Schema.org JSON-LD, RSS, JSONL. The quality of your schema determines the quality of NLWeb responses. The implementation work is infrastructure (vector database, LLM connectivity, NLWeb server), not content restructuring.
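To illustrate why no content restructuring is involved, the sketch below pulls the JSON-LD blocks a page already carries and writes them out as JSONL, one entity per line. It is a hypothetical helper built on Python’s standard library, not NLWeb’s own loader, and the class and function names are invented for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect entities from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.entities = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            parsed = json.loads("".join(self._buffer))
            # A block may hold one entity or a list of them.
            self.entities.extend(parsed if isinstance(parsed, list) else [parsed])


def to_jsonl(html):
    """Return the page's JSON-LD entities as JSONL text (one JSON object per line)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(json.dumps(e, sort_keys=True) for e in parser.entities)


page = ('<html><head><script type="application/ld+json">'
        '{"@type": "Event", "name": "Jazz night"}'
        '</script></head><body><p>Hi</p></body></html>')
print(to_jsonl(page))
```

The page markup is left untouched. Only the structured data it already publishes is read.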

Is NLWeb the same as MCP?
Related but distinct. Every NLWeb instance is an MCP server: it exposes an /mcp endpoint that MCP-compatible agents can connect to. NLWeb adds the /ask REST interface and the schema.org retrieval pipeline on top. An MCP server is not necessarily an NLWeb instance; an NLWeb instance is always also an MCP server.
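The difference shows up in the messages themselves. MCP clients talk JSON-RPC 2.0 and use methods such as tools/list and tools/call, whereas /ask takes a plain query. The sketch below builds the two MCP messages an agent might send to an /mcp endpoint. The tool name "ask" and its "query" argument are assumptions made for illustration, and a real session would begin with an initialize handshake that is omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def jsonrpc(method, params=None):
    """Build a JSON-RPC 2.0 request object as used by MCP."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it offers.
list_tools = jsonrpc("tools/list")

# Call one tool; the name "ask" and its arguments are hypothetical.
call_tool = jsonrpc("tools/call", {
    "name": "ask",
    "arguments": {"query": "pasta dishes under 30 minutes"},
})

print(json.dumps(list_tools))
print(json.dumps(call_tool))
```

An agent that only speaks MCP needs nothing beyond messages of this kind. A client calling /ask directly skips the JSON-RPC envelope altogether.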

Does Google support NLWeb?
Microsoft developed and maintains NLWeb; Google has not announced adoption. NLWeb and WebMCP are complementary (both eventually produce MCP endpoints) but have separate governance and different implementation paths.

Is NLWeb a standard or a product?
It is an open-source protocol. Microsoft published the code and the specification on GitHub under an open-source licence. Any organisation can implement NLWeb without licensing fees or vendor lock-in. The protocol’s design builds on established web standards (Schema.org, RSS, MCP) rather than proprietary formats.

How does NLWeb relate to llms.txt?
They address different layers. llms.txt is a discovery hint: a Markdown index of what a site contains. NLWeb is an action layer: a queryable API that returns structured answers. A site could publish both; they do not conflict. See llms.txt for the index convention, and note that Google’s AI optimisation guide confirms llms.txt has no effect on search visibility.

Footnotes

  1. Introducing NLWeb: bringing conversational interfaces directly to the web (Microsoft News)

  2. Get started with NLWeb (Cloudflare AI Search documentation)

  3. NLWeb (GitHub)

  4. NLWeb pioneers: success stories and use cases (Microsoft Tech Community)

  5. MCP vs RAG vs NLWeb vs HTML: a comparison (arXiv)