llms.txt

llms.txt is a proposed convention for a Markdown file at the root of a domain that provides a clean, AI-readable index of a site’s most important content. It was first proposed by Jeremy Howard in September 2024 and has since been adopted by a growing list of documentation sites, SaaS platforms, and content publishers.

What llms.txt is for

The problem llms.txt addresses is straightforward. Large language models reading a website at inference time face the same challenges a human would: navigation menus, footer noise, pop-ups, advertising, and HTML scaffolding all dilute the signal. Sites are built primarily for browser rendering, not for clean text extraction.

llms.txt provides a curated, Markdown-formatted entry point that says: here is the canonical structure of this site, here are the most important pages, here is what they contain. It is a hint to AI systems about what matters, in a format they can ingest cleanly.

How it differs from robots.txt and sitemap.xml

File         Purpose                          Audience
robots.txt   What can be crawled              All crawlers
sitemap.xml  What URLs exist                  Search engine crawlers
llms.txt     What matters and what it means   LLM-based retrieval and reasoning systems

The three are complementary, not competing. A site can (and ideally should) have all three.

llms.txt syntax

The format is intentionally simple Markdown. A minimal example:

# Site Name

> One-line description of what the site is about and who maintains it.

Optional longer paragraph providing context about the site, its purpose, and any guidance for how AI systems should use the content (attribution, citation preferences, etc.).

## Section heading

- [Page title](https://example.com/page/): Short description of what this page contains.
- [Another page](https://example.com/another/): Description.

## Another section

- [Page](https://example.com/page2/): Description.

Headings group related content. Bullet links describe individual pages. Descriptions are optional but recommended; they give the AI system a hint about what each linked page covers.
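The structure above is regular enough to parse mechanically. Here is a minimal sketch in Python; the parse_llms_txt helper and its return shape are illustrative assumptions, not part of any published llms.txt tooling:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt document into (title, summary, sections).

    Rough sketch of the conventions described above: an H1 title, an
    optional blockquote summary, and H2 sections of bullet links with
    optional descriptions after a colon.
    """
    title, summary, current = None, None, None
    sections = {}
    # Matches: - [Page title](https://example.com/page/): Description.
    link_re = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?")
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif (m := link_re.match(line)) and current:
            sections[current].append((m["title"], m["url"], m["desc"] or ""))
    return title, summary, sections
```

A retriever could feed the parsed sections into a ranking step, or simply fetch every linked URL in a chosen section.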

llms-full.txt

Some sites also publish llms-full.txt, a longer document that contains the full content of the site (or a curated subset of it) in plain Markdown. The intent is to give AI systems the option of consuming the entire site content in a single fetch rather than crawling each page individually.

This is most useful for documentation sites, where the goal is to enable an LLM to answer detailed questions about a product without partial-context errors caused by retrieving only one page at a time.

When llms.txt makes sense

  • Documentation sites. The clearest use case. A clean Markdown index of API references, guides, and tutorials gives LLMs a much better foundation for accurate answers about your product.
  • Knowledge bases and reference content. Sites whose value lies in providing answers benefit from being easy to retrieve from.
  • Personal and professional sites where authorship matters. Helps LLMs cite the correct author and the correct page when summarising your work.
  • News and editorial sites. Surface curated indexes of high-quality, well-edited content that you want cited rather than the noisy aggregate of every URL.

When it matters less

  • E-commerce product catalogues. Structured data and a clean sitemap are usually more useful for transactional surfaces than llms.txt.
  • High-volume content farms. llms.txt does not solve the underlying quality problem; it just makes content easier to retrieve, which is not always desirable for the publisher.

Adoption status

As of early 2026, llms.txt is not a formal standard. It is a community convention. There is no public documentation from OpenAI, Anthropic, Google, or Perplexity confirming that their crawlers and retrievers actively prefer or weight llms.txt files. Adoption is being driven by publishers who want to make their content easier to use, on the bet that AI systems will increasingly look for and use these files.

The downside of publishing one is minimal. The upside, if and when major retrievers begin treating llms.txt as a preferred surface, is meaningful.

Frequently asked questions

Where should llms.txt live? At the root of the domain, https://example.com/llms.txt, by the same convention that places robots.txt there.

Does publishing llms.txt help with traditional SEO? No direct effect. It is an AI-targeted surface, not something Googlebot uses for indexing or ranking.

Can I use llms.txt to opt out of AI scraping? No. llms.txt is an opt-in surface for guiding AI systems toward your most important content. Opting out of AI scraping is done via robots.txt directives targeting specific user agents (GPTBot, ClaudeBot, Google-Extended, etc.).
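For reference, an opt-out lives in robots.txt, not llms.txt. A sketch using the user-agent names the vendors have published (verify current names against each vendor's documentation before relying on this):

```
# robots.txt — block known AI crawlers from the whole site
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```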

Is there a tool that generates llms.txt automatically? Several open-source generators exist for common static site generators (Astro, Next.js, Jekyll, Hugo). For larger sites, generating llms.txt from your sitemap and content collection is straightforward. The harder work is curating which pages to include and writing useful descriptions.
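As a starting point for that generation step, here is a sketch that drafts an llms.txt skeleton from a sitemap.xml string; the llms_txt_skeleton helper and its slug-based link titles are assumptions for illustration, and the output still needs hand-written descriptions and pruning:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the Sitemaps protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def llms_txt_skeleton(site_name, description, sitemap_xml):
    """Generate a first-draft llms.txt from a sitemap.xml document.

    Each <loc> URL becomes a bullet link titled with its last path
    segment and a TODO placeholder description.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]
    lines = [f"# {site_name}", "", f"> {description}", "", "## Pages", ""]
    for url in urls:
        slug = url.rstrip("/").rsplit("/", 1)[-1] or url
        lines.append(f"- [{slug}]({url}): TODO")
    return "\n".join(lines) + "\n"
```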