Optimizing Your Site for AI Agents and LLMs
Your site has human visitors and AI visitors. Here is how to serve both, with llms.txt, inline LLM instructions, structured data, and machine-readable feeds.
Originally published at agnelnieves.com.

Your site has two audiences now. Humans, obviously. But also AI agents, the LLMs that crawl, summarize, cite, and recommend content to millions of people. If your site is not optimized for both, you are leaving visibility on the table.
I just finished optimizing my own site for AI consumption, and the process surfaced something worth naming up front: most of what makes a site good for AI also makes it better for humans. Clear structure, machine-readable content, and explicit metadata benefit everyone.
Here is what I did, in the order I would do it again, and why each piece matters.
What AI Agents Are Actually Doing With Your Site
When someone asks ChatGPT, Claude, Perplexity, or Google's AI Overviews a question, those systems do not just generate answers from training data. Increasingly, they fetch and cite live web content. Your site might get:
- Crawled for training data by bots like GPTBot and ClaudeBot (Google exposes the equivalent control through the Google-Extended robots.txt token rather than a separate crawler)
- Fetched at query time by Perplexity, ChatGPT browsing, and similar agents
- Cited as a source in AI-generated responses
- Summarized in featured snippets and AI overviews
- Navigated by autonomous agents that interact with your APIs
Each of those has different needs. They all benefit from the same foundation: structured, discoverable, machine-readable content.
The llms.txt Standard
The llms.txt proposal plays a role similar to robots.txt, but for AI agents. Where robots.txt tells crawlers what they can access, llms.txt tells them what your site is. It is a structured markdown index, served at your domain root, written for a reader that is happier with markdown than with HTML.
The format is simple:
    # Your Name or Site

    > A one-line summary of what this site is.

    A longer description paragraph.

    ## Section Name

    - [Link Title](https://url): Description of what is at this link
I implemented two variants:
- /llms.txt is the index. A table of contents with links to all pages, blog posts, profiles, and feeds. Think of it as a menu for AI agents to browse selectively.
- /llms-full.txt is the full dump. Every blog post's complete markdown, every project description, biographical context. For agents that want everything in context at once.
Both are served as text/plain with markdown formatting. Both are generated dynamically from the same data sources that power the site, so they never go stale.
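As a rough illustration of generating the index from site data, here is a minimal Python sketch. The `Link` type, the `render_llms_txt` function, and the example.com entries are all hypothetical; the actual site may be built with a different stack and data model.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt index: H1 title, blockquote summary, then H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content; in practice this would come from the same source as the site.
sections = {
    "Blog": [Link("Hello World", "https://example.com/blog/hello", "First post")],
    "Feeds": [Link("JSON Feed", "https://example.com/feed.json", "All posts as JSON")],
}
llms_txt = render_llms_txt("Example Site", "A personal site about web development.", sections)
print(llms_txt)
```

Because the file is rendered from the same records that drive the pages, a new post shows up in the index without a separate editing step. The full-content variant could reuse the same loop and append each post's markdown body instead of a one-line description.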
Inline LLM Instructions in HTML
This one comes from a Vercel proposal and it is clever: embed AI-readable instructions directly in the page <head> using a script tag that browsers ignore.
    <script type="text/llms.txt">
    # Your Site Name
    This is the personal site of [name], a [role] based in [location].

    ## Site Structure
    - /: Home, brief description
    - /blog: Blog, brief description
    - /about: About, brief description

    ## Key Facts
    - Name: Your Name
    - Role: Your Role
    - Specialties: Thing 1, Thing 2, Thing 3
    </script>
Browsers do not execute <script> elements whose type they do not recognize; the contents stay in the page as inert text. An LLM that reads the raw HTML can still see them. It is a near zero-cost way to give every page on your site a machine-readable context block. I added one to my root layout that describes who I am, the site structure, and where to find the machine-readable content.
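If the block is generated rather than hand-written, one detail is worth guarding against: a literal closing script tag inside the markdown would end the element early. The sketch below is one way to handle that in Python. The function name `render_inline_llms_block` is hypothetical, and the escaping strategy is an assumption of this example, not part of the proposal.

```python
import re

# Matches the start of a closing script tag in any letter case.
_CLOSING_TAG = re.compile(r"</(script)", re.IGNORECASE)


def render_inline_llms_block(markdown: str) -> str:
    """Wrap markdown in a <script type="text/llms.txt"> element for the page <head>."""
    # Break up any "</script" in the body so it cannot close the element early.
    safe = _CLOSING_TAG.sub(r"<\\/\1", markdown.strip())
    return f'<script type="text/llms.txt">\n{safe}\n</script>'


block = render_inline_llms_block("# Example Site\n\n- /blog: Blog posts\n")
print(block)
```

The escaping slightly alters the text an agent reads in the rare case the markdown mentions a script tag, which seems a fair trade for keeping the page well-formed.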
Structured Data That AI Engines Actually Use
JSON-LD structured data has long been important for Google. It is now equally important for AI engines. When an LLM encounters schema.org markup, it can read the semantics of your content. Not just the text, but what the text represents.
I already had structured data for my blog posts (BlogPosting schema with breadcrumbs). What I added was CreativeWork schema for my project pages, giving each project a machine-readable identity:
    {
      "@context": "https://schema.org",
      "@type": "CreativeWork",
      "name": "Project Name",
      "description": "What this project is",
      "url": "https://project-url.com",
      "creator": {
        "@type": "Person",
        "name": "Your Name"
      }
    }
The more schema types you cover, the more AI engines can understand and cite your work with proper attribution.
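To show how the CreativeWork object above might be emitted per project, here is a small Python sketch. The helper name `creative_work_jsonld` and the example.com URL are hypothetical, and wrapping the JSON in a `<script type="application/ld+json">` tag is the usual embedding, assumed here rather than taken from the site's code.

```python
import json


def creative_work_jsonld(name: str, description: str, url: str, creator: str) -> str:
    """Return a <script type="application/ld+json"> tag describing one project."""
    data = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator},
    }
    # Escape "<" so a stray "</script>" in a field cannot close the tag early.
    # JSON parsers read \u003c back as "<", so the data is unchanged.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = creative_work_jsonld(
    "Project Name", "What this project is", "https://example.com/project", "Your Name"
)
print(tag)
```

Building the object as a dictionary and serializing it, instead of templating a JSON string by hand, avoids quoting bugs when a project description contains quotes or angle brackets.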
Machine-Readable Feeds
RSS is great, but it is XML, which is not the most natural format for AI agents to parse. I added a JSON Feed endpoint alongside my existing RSS feed:
- /feed.xml: RSS 2.0 for traditional feed readers
- /feed.json: JSON Feed 1.1 for programmatic consumption
JSON Feed is cleaner for AI agents to parse and reference. Both are registered in the site's metadata so they are auto-discoverable.
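A JSON Feed 1.1 document is small enough to build by hand. The Python sketch below assumes each post is a dictionary with `title`, `url`, `body`, and a timezone-aware `published` datetime; those field names, the `build_json_feed` function, and the example.com data are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_json_feed(title: str, site_url: str, posts: list[dict]) -> str:
    """Serialize posts as a JSON Feed 1.1 document."""
    feed = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": site_url,
        "feed_url": f"{site_url}/feed.json",
        "items": [
            {
                "id": post["url"],  # the permalink doubles as a stable id
                "url": post["url"],
                "title": post["title"],
                "content_text": post["body"],
                # isoformat() on an aware datetime yields an RFC 3339 timestamp
                "date_published": post["published"].isoformat(),
            }
            for post in posts
        ],
    }
    return json.dumps(feed, indent=2)


posts = [
    {
        "title": "Hello World",
        "url": "https://example.com/blog/hello",
        "body": "First post.",
        "published": datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    }
]
feed_json = build_json_feed("Example Site", "https://example.com", posts)
print(feed_json)
```

For discovery, the page head would typically carry one `<link rel="alternate">` per feed, with `type="application/rss+xml"` for the RSS feed and `type="application/feed+json"` for the JSON Feed.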
Making robots.txt AI-Aware
Most sites already have a robots.txt. The key addition is explicitly allowing AI crawlers. I also note where llms.txt lives, in comments, because robots.txt has no directive for it and crawlers ignore comment lines:
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    Sitemap: https://yoursite.com/sitemap.xml

    # AI/LLM Content
    # llms.txt: https://yoursite.com/llms.txt
    # llms-full.txt: https://yoursite.com/llms-full.txt
Many sites block AI crawlers by default. If you want your content cited and discovered by AI, explicitly allow the major bots: GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, anthropic-ai, PerplexityBot, Applebot-Extended, Bytespider, and cohere-ai.
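With nine user agents to list, generating the file and then checking it with a parser is less error-prone than editing it by hand. The Python sketch below uses the standard library's `urllib.robotparser`; the `render_robots_txt` function is hypothetical, and the bot list is copied from the paragraph above.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Applebot-Extended", "Bytespider", "cohere-ai",
]


def render_robots_txt(bots: list[str], site_url: str) -> str:
    """Emit one Allow group per bot, then the sitemap and llms.txt pointers."""
    groups = [f"User-agent: {bot}\nAllow: /\n" for bot in bots]
    footer = (
        f"Sitemap: {site_url}/sitemap.xml\n"
        "# AI/LLM Content\n"
        f"# llms.txt: {site_url}/llms.txt\n"
        f"# llms-full.txt: {site_url}/llms-full.txt\n"
    )
    # Joining with "\n" leaves a blank line between groups.
    return "\n".join(groups) + "\n" + footer


robots_txt = render_robots_txt(AI_BOTS, "https://yoursite.com")

# Parse the generated file and confirm each bot may fetch a sample URL.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in AI_BOTS:
    print(bot, parser.can_fetch(bot, "https://yoursite.com/blog/"))
```

The same parser check is useful against an existing robots.txt, since a leftover `Disallow: /` for one of these agents is easy to miss by eye.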
Why This Matters
I have spent the last few years watching organic SEO traffic dry up while consulting with marketing teams who keep seeing the same dashboards stay green for one more quarter. SEO has evolved from keyword stuffing to semantic web to AI-native citation. We are at an inflection point. The sites that get cited by AI are not necessarily the ones with the best domain authority. They are the ones with the clearest, most structured, most machine-readable content.
This matters for small publications and personal sites in particular. When someone asks an AI "what is a good article on AEO?" or "who writes useful stuff about prompts?", you want your site to be in the answer. That requires more than good content. It requires content that AI can find, understand, and attribute.
The Full Stack of AI Optimization
Here is the complete checklist of what I now have in place:
| Layer | What | Why |
|---|---|---|
| robots.txt | Explicitly allow AI bots | Let them crawl |
| sitemap.xml | Dynamic sitemap with all content | Let them discover |
| llms.txt | Markdown index of the site | Let them understand structure |
| llms-full.txt | Full content in one file | Let them ingest everything |
| Inline <script> | Page-level LLM instructions | Let them understand context |
| JSON-LD | Structured data on every page | Let them understand semantics |
| RSS + JSON Feed | Machine-readable content feeds | Let them subscribe |
| Meta tags | OpenGraph, Twitter, canonical | Let them cite accurately |
None of these changes affect how the site looks or feels for human visitors. They are invisible additions that make the site dramatically more useful for AI.
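Several rows of the checklist leave a trace in the page's HTML, so a quick audit can show which ones a page still lacks. The Python sketch below is an assumption-heavy illustration: the `LayerAudit` class and the layer labels are hypothetical, and it only covers the layers visible in HTML, not robots.txt, sitemap.xml, or the llms.txt files, which are separate URLs.

```python
from html.parser import HTMLParser

# Checklist layers that can be detected inside a page (labels are hypothetical).
EXPECTED = {"canonical", "opengraph", "rss", "json-feed", "json-ld", "inline-llms"}

SCRIPT_TYPES = {"application/ld+json": "json-ld", "text/llms.txt": "inline-llms"}
FEED_TYPES = {"application/rss+xml": "rss", "application/feed+json": "json-feed"}


class LayerAudit(HTMLParser):
    """Record which machine-readable layers appear in a page's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.found: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") in SCRIPT_TYPES:
            self.found.add(SCRIPT_TYPES[attr["type"]])
        elif tag == "link" and attr.get("rel") == "canonical":
            self.found.add("canonical")
        elif tag == "link" and attr.get("rel") == "alternate" and attr.get("type") in FEED_TYPES:
            self.found.add(FEED_TYPES[attr["type"]])
        elif tag == "meta" and (attr.get("property") or "").startswith("og:"):
            self.found.add("opengraph")


# Hypothetical page head that has everything except the inline LLM block.
page = """<head>
<link rel="canonical" href="https://example.com/">
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="alternate" type="application/feed+json" href="/feed.json">
<meta property="og:title" content="Example Site">
<script type="application/ld+json">{"@type": "Person"}</script>
</head>"""

audit = LayerAudit()
audit.feed(page)
missing = EXPECTED - audit.found
print(sorted(missing))
```

Running this against a rendered page gives a short list of layers to ship next.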
What To Do Next
The AI web is evolving fast. Standards like llms.txt are still emerging, and new patterns will appear. The fundamentals will not change. Structure your content clearly. Make it discoverable. Give machines the metadata they need to understand it.
Pick one of the eight rows in the checklist above. Ship it this week. The next one will be easier than the last.