The llms.txt file

What the llms.txt convention is, what it does and does not do, and how to decide whether publishing one is worth the effort for your brand.

By the AI Native team · Updated 2026-06-11

The llms.txt file is a plain-text convention for telling AI systems what a site contains. It is placed at yourdomain.com/llms.txt and lists pages or content in a format designed to be read by a language model rather than a traditional web crawler. The idea is that a site can offer a curated, context-rich summary of itself so that an AI agent fetching the file gets a cleaner picture than it would from crawling HTML pages one at a time.

This guide covers what the file does, where it helps, where it does not, and how to decide whether publishing one belongs on your near-term list.

What the convention says

An llms.txt file contains a short markdown-formatted description of the site, followed by a list of the pages most relevant for a language model to read. It can also link to an llms-full.txt variant that contains the full plain-text content of selected pages, stripped of navigation, ads, and other HTML scaffolding that LLMs do not need.

The structure is intentionally simple: a brief statement of what the site is, then a set of markdown sections with links grouped by type (documentation, guides, API reference, and so on). The format was proposed by Jeremy Howard in late 2024 and has been adopted by a growing set of developer and documentation sites.
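To make the structure concrete, here is a minimal sketch of how a reader of the file might turn it into a site map. It assumes the layout most published files follow: a top-level heading for the site name, a blockquote summary, then second-level headings that each hold a list of links. The sample file, its URLs, and the function name parse_llms_txt are invented for illustration and are not part of the convention.

```python
import re

# Hypothetical example file; the site name, URLs, and descriptions are invented.
SAMPLE = """\
# Example Docs

> Example Docs is the documentation site for a fictional payments API.

## Guides

- [Quickstart](https://example.com/quickstart.md): Install the SDK and make a first call
- [Webhooks](https://example.com/webhooks.md): Receive and verify event notifications

## API reference

- [Charges](https://example.com/api/charges.md): Create, capture, and refund charges
"""

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Return (title, summary, sections) from llms.txt-style markdown."""
    title, summary, sections, current = None, None, {}, None
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return title, summary, sections


title, summary, sections = parse_llms_txt(SAMPLE)
for heading, links in sections.items():
    print(heading, "->", [link_title for link_title, _, _ in links])
```

The result is a small dictionary keyed by section name, which is roughly the "which pages are worth reading" view an agent wants before it fetches anything else.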

What it helps with

The file is most useful in two scenarios.

The first is agentic use. When a software agent or AI assistant is actively fetching pages to answer a question or complete a task, a well-formed llms.txt file tells the agent which pages are worth reading rather than making it discover them through a sitemap or crawl. This matters most for developer tools, documentation, and any brand where AI agents might interact programmatically with the site.

The second is content clarity. The act of writing the file forces a site to produce a clean, prose-based summary of what it is and what it contains. If the file is later indexed or retrieved, that summary may also improve how accurately an AI engine describes the site, although this effect is indirect and slow.

What it does not do

The file does not directly improve how a conversational AI answers a buyer's question. ChatGPT, Gemini, and Perplexity do not fetch llms.txt for every search query. They retrieve from indexed web pages using their search layer, not from a declared manifest file. So publishing llms.txt does not substitute for having well-written, clearly structured pages that rank and get retrieved.

It also does not fix parametric memory directly. The file might be indexed and become part of a future training corpus, but that is speculative and slow. The faster path to parametric improvement is consistent, accurate description of your brand across trusted third-party sources.

For most marketing teams, the practical takeaway is that an llms.txt file is less likely to influence the conversational answer engines buyers use day to day than the agentic and developer tools that explicitly support the convention.

Who benefits most right now

Developer-facing products and documentation sites are the highest-value case today. Frameworks, APIs, and developer tools are already being consumed by AI coding assistants and agentic tools that look for llms.txt to understand the product. If your brand has a developer segment or a technical documentation layer, publishing the file is straightforward and has a real audience.

Brands whose AI-visibility gap sits in conversational answer engines used by general consumers get less near-term return from the file alone. The work there is on page quality, structured data, and off-page mentions, which are described in the related guides.

How to create one

The file is a plain-text markdown file at yourdomain.com/llms.txt. A minimal version contains:

  • A short paragraph (two to four sentences) describing the site and its purpose.
  • A section for the most important pages, grouped logically, with a short description of each link.
  • Optionally, a pointer to llms-full.txt if you are also publishing full page content in stripped form.

The main discipline is keeping the descriptions precise. Vague summaries produce vague understanding. Precise descriptions of what each page covers, who it is for, and what question it answers give an AI reader a cleaner map of your site.
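The minimal version described above can be generated from a small data structure, which keeps the file in sync with the pages it lists. The sketch below is one way to do that, not a reference implementation: the function name build_llms_txt, the "Full content" section heading, and every URL and description are illustrative assumptions.

```python
def build_llms_txt(name, summary, sections, full_url=None):
    """Render llms.txt-style markdown from plain Python data.

    sections maps a heading to a list of (title, url, description) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, description in pages:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    if full_url:
        # Optional pointer to the stripped full-content variant.
        lines += [
            "## Full content",
            "",
            f"- [llms-full.txt]({full_url}): Full plain-text content of the key pages",
            "",
        ]
    return "\n".join(lines)


# Hypothetical site data, invented for this example.
text = build_llms_txt(
    "Example Docs",
    "Documentation for a fictional payments API, written for developers integrating it.",
    {
        "Guides": [
            (
                "Quickstart",
                "https://example.com/quickstart.md",
                "Install the SDK and make a first API call",
            ),
        ],
    },
    full_url="https://example.com/llms-full.txt",
)
print(text)
```

Note that each description says what the page covers and what a reader can do with it, which is the precision the paragraph above asks for.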

After publishing, verify that the file is accessible and not blocked by your robots.txt. Rules written to block AI crawlers can also inadvertently block retrieval of llms.txt.
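The robots.txt check can be done offline with Python's standard library. This is a rough sketch: the rules below are a made-up example, GPTBot is used only as one well-known crawler token, and ExampleAgent is a hypothetical name. It tells you what a rule-abiding client would conclude, not whether any particular agent actually consults robots.txt before fetching the file.

```python
from urllib.robotparser import RobotFileParser


def llms_txt_allowed(robots_txt, user_agents, path="/llms.txt"):
    """Map each user agent to whether robots_txt lets it fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in user_agents}


# Made-up rules: one crawler is blocked site-wide, everyone else is allowed.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(llms_txt_allowed(ROBOTS, ["GPTBot", "ExampleAgent"]))
# {'GPTBot': False, 'ExampleAgent': True}
```

In this example the site-wide Disallow for one crawler also covers /llms.txt, which is the inadvertent blocking to look for in your own rules.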

Relation to the wider AI visibility picture

Publishing llms.txt is a low-cost signal with a modest, targeted impact. It belongs on the list of AI-readiness actions but is not the first one for most brands. The sequence that tends to matter more is: fix page quality and answer directness, add structured data, earn citations on trusted sources, then add llms.txt and an llms-full.txt as the developer and agentic layer on top.

If your measurement shows that AI agents are interacting with your site but producing inaccurate answers about it, llms.txt is a plausible early action because you can publish it in an afternoon. If your gap is in how conversational AI answers questions for general buyers, the page-quality and off-page levers are higher priority.

See the guide on how to get cited by AI engines for the content and structural moves that affect the retrieval stage.

Questions

What is llms.txt and where does it live?

It is a plain-text markdown file placed at the root of a domain, at yourdomain.com/llms.txt. It describes the site and lists the pages most useful for an AI system to read, in a format that is cleaner than raw HTML for a language model to parse.

Does publishing llms.txt improve how ChatGPT or Gemini answers questions about my brand?

Not directly, for conversational queries. Answer engines retrieve from search-indexed pages at query time, not from a manifest file. The file is more useful for agentic tools and AI coding assistants that explicitly look for it when exploring a site programmatically.

Does the file need to be submitted anywhere?

No formal submission is required. The file should be accessible at the root of the domain and not blocked by robots.txt. AI agents and tools that support the convention look for it at that well-known path. There is no registry or index to register with.
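Discovery at a well-known path is simple enough to sketch. The example below shows how a tool might derive the file's location from any page on a site and try to fetch it. The function names and the example URL are assumptions, and real agents may differ in their timeouts, headers, and error handling.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(page_url):
    """Return the root-level llms.txt URL for the site that serves page_url."""
    return urljoin(page_url, "/llms.txt")


def fetch_llms_txt(page_url, timeout=10):
    """Return the file's text, or None if it is missing or unreachable."""
    try:
        with urlopen(llms_txt_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # OSError covers URLError, HTTPError (such as a 404), and socket timeouts.
        return None


print(llms_txt_url("https://example.com/docs/page?x=1"))
# https://example.com/llms.txt
```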

Is there a size limit for llms.txt?

There is no enforced limit, but the convention is to keep the main file concise: a short description and a curated list of important pages. An overly long llms.txt is harder for an agent to use as a quick overview. Longer content belongs in llms-full.txt or in the individual pages themselves.

Does blocking AI crawlers in robots.txt affect llms.txt?

Yes. If your robots.txt blocks the crawlers used by AI agents, agents that honor those rules will not read your llms.txt either. Check that your robots rules do not inadvertently block the file if you intend it to be read.

Who first proposed llms.txt?

The convention was proposed by Jeremy Howard in late 2024 as a pragmatic standard for helping AI systems understand website structure and content. It is not an official standard adopted by a standards body; it is a community convention with growing adoption among developer and documentation sites.

Should I also publish llms-full.txt?

An llms-full.txt file contains the full plain-text content of your key pages, stripped of HTML scaffolding. It is useful if you want to give AI agents the deepest possible view of your content without requiring them to parse individual HTML pages. For most brands, it is a secondary step after the main llms.txt is published and the key pages are in good shape.
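As a rough illustration of what "stripped of HTML scaffolding" means, the sketch below uses Python's standard-library HTML parser to keep visible text and drop navigation, scripts, and similar elements. The list of skipped tags and the sample markup are assumptions, and a production pipeline would usually convert to markdown and preserve headings and links rather than flatten everything to lines of text.

```python
from html.parser import HTMLParser

# Elements treated as scaffolding in this sketch; adjust for your templates.
SKIP = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            # Collapse runs of whitespace inside each text chunk.
            self.chunks.append(" ".join(data.split()))


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented sample page.
PAGE = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<h1>Pricing</h1><p>Plans start  at\n $10.</p>"
    "<script>var x = 1;</script><footer>Example Inc.</footer></body></html>"
)
print(html_to_text(PAGE))
```

One limitation of this approach is that text split by inline tags, such as a bold phrase inside a paragraph, ends up on separate lines.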
