AEO & Visibility·April 14, 2026·6 min read

Optimizing Your Site for AI Agents and LLMs

Your site has human visitors and AI visitors. Here is how to serve both, with llms.txt, inline LLM instructions, structured data, and machine-readable feeds.

Agnel Nieves · April 14, 2026

Originally published at agnelnieves.com.

Engraved illustration of an open book on a stone pedestal with golden keys rising from its pages toward a glowing terminal prompt symbol, framed by botanicals and an hourglass against a deep blue background. — Hero illustration and animation generated with Grok.

Your site has two audiences now. Humans, obviously. But also AI agents, the LLMs that crawl, summarize, cite, and recommend content to millions of people. If your site is not optimized for both, you are leaving visibility on the table.

I just finished optimizing my own site for AI consumption, and the process surfaced something worth naming up front: most of what makes a site good for AI also makes it better for humans. Clear structure, machine-readable content, and explicit metadata benefit everyone.

Here is what I did, in the order I would do it again, and why each piece matters.

What AI Agents Are Actually Doing With Your Site

When someone asks ChatGPT, Claude, Perplexity, or Google's AI Overviews a question, those systems do not just generate answers from training data. Increasingly, they fetch and cite live web content. Your site might get:

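Crawled for training data by bots like GPTBot and ClaudeBot (Google-Extended is not a separate crawler but a robots.txt token that controls whether content Google crawls can be used for its AI models)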
Fetched at query time by Perplexity, ChatGPT browsing, and similar agents
Cited as a source in AI-generated responses
Summarized in featured snippets and AI overviews
Navigated by autonomous agents that interact with your APIs

Each of those has different needs. They all benefit from the same foundation: structured, discoverable, machine-readable content.

The llms.txt Standard

The llms.txt proposal plays a role similar to robots.txt, but for AI agents. Where robots.txt tells crawlers what they can access, llms.txt tells them what your site is. It is a structured markdown index, served at your domain root, written for a reader that is happier with markdown than with HTML.

The format is simple:

# Your Name or Site

> A one-line summary of what this site is.

A longer description paragraph.

## Section Name

- [Link Title](https://url): Description of what is at this link

I implemented two variants:

/llms.txt is the index. A table of contents with links to all pages, blog posts, profiles, and feeds. Think of it as a menu for AI agents to browse selectively.
/llms-full.txt is the full dump. Every blog post's complete markdown, every project description, biographical context. For agents that want everything in context at once.

Both are served as text/plain with markdown formatting. Both are generated dynamically from the same data sources that power the site, so they never go stale.
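The "generated from the same data sources" step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this site: the `Link` dataclass, the `build_llms_txt` function, and the yoursite.com URLs are hypothetical placeholders, and a real site would load the entries from its CMS or file system.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt index: H1 title, blockquote summary, one H2 per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content, hard-coded here only to keep the sketch self-contained.
sections = {
    "Blog": [
        Link(
            "Optimizing Your Site for AI Agents",
            "https://yoursite.com/blog/ai-agents",
            "How to serve AI visitors",
        ),
    ],
    "Feeds": [
        Link("JSON Feed", "https://yoursite.com/feed.json", "All posts as JSON Feed 1.1"),
    ],
}

llms_txt = build_llms_txt("Your Name", "A one-line summary of what this site is.", sections)
print(llms_txt)
```

Serving the returned string with a `text/plain` content type from `/llms.txt` matches the setup described above. The same function could feed `/llms-full.txt` if each entry carried the full post body instead of a one-line description.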

Inline LLM Instructions in HTML

This one comes from a Vercel proposal and it is clever: embed AI-readable instructions directly in the page <head> using a script tag that browsers ignore.

<script type="text/llms.txt">
# Your Site Name

This is the personal site of [name], a [role] based in [location].

## Site Structure
- /: Home, brief description
- /blog: Blog, brief description
- /about: About, brief description

## Key Facts
- Name: Your Name
- Role: Your Role
- Specialties: Thing 1, Thing 2, Thing 3
</script>

Browsers do not execute <script> tags whose type they do not recognize; the contents are treated as an inert data block. An LLM that reads the raw HTML still sees the text. It is a near-zero-cost way to give every page on your site a machine-readable context block. I added one to my root layout that describes who I am, the site structure, and where to find the machine-readable content.
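To see what a consumer of that block might do with it, here is a rough sketch using Python's standard `html.parser`. It is an assumption about how a tool could read the tag, not a description of how any particular AI agent actually works; the `LLMSInstructionParser` class and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class LLMSInstructionParser(HTMLParser):
    """Collects the text of every <script type="text/llms.txt"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/llms.txt":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so accumulate them.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


# Hypothetical page: one LLM instruction block and one ordinary script.
page = """<html><head>
<script type="text/llms.txt">
# Your Site Name

## Site Structure
- /: Home, brief description
- /blog: Blog, brief description
</script>
<script>console.log("regular scripts are not collected");</script>
</head><body><p>Hello</p></body></html>"""

parser = LLMSInstructionParser()
parser.feed(page)
instructions = [block.strip() for block in parser.blocks]
print(instructions[0])
```

Running a check like this against your own rendered pages is a quick way to confirm the block survives your build and templating steps.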

Structured Data That AI Engines Actually Use

JSON-LD structured data has always been important for Google. It matters for AI engines too. Schema.org markup states the semantics of your content explicitly: not just the text, but what the text represents.

I already had structured data for my blog posts (BlogPosting schema with breadcrumbs). What I added was CreativeWork schema for my project pages, giving each project a machine-readable identity:

{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "name": "Project Name",
  "description": "What this project is",
  "url": "https://project-url.com",
  "creator": {
    "@type": "Person",
    "name": "Your Name"
  }
}

The more schema types you cover, the more AI engines can understand and cite your work with proper attribution.
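If your project data already lives in code, the markup can be generated rather than hand-written. The following is a sketch under assumptions: `creative_work_jsonld` is a hypothetical helper, and the field values are the same placeholders used in the example above.

```python
import json


def creative_work_jsonld(name: str, description: str, url: str, creator: str) -> str:
    """Return a <script type="application/ld+json"> tag for one project."""
    data = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator},
    }
    # Escape "<" so a value containing "</script>" cannot end the tag early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = creative_work_jsonld(
    "Project Name",
    "What this project is",
    "https://project-url.com",
    "Your Name",
)
print(tag)
```

The `\u003c` escape is still valid JSON, so parsers read the original text back unchanged.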

Machine-Readable Feeds

RSS is great, but it is XML, which is not the most natural format for AI agents to parse. I added a JSON Feed endpoint alongside my existing RSS feed:

/feed.xml: RSS 2.0 for traditional feed readers
/feed.json: JSON Feed 1.1 for programmatic consumption

JSON Feed is cleaner for AI agents to parse and reference. Both are registered in the site's metadata so they are auto-discoverable.
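For reference, a JSON Feed 1.1 document is small enough to build by hand. The sketch below assumes posts are available as plain dictionaries; `build_json_feed`, the post fields (`url`, `title`, `body`, `published`), and the yoursite.com URLs are hypothetical. The `version` string, the item keys, and the `application/feed+json` media type come from the JSON Feed 1.1 specification.

```python
import json


def build_json_feed(title, home_page_url, feed_url, posts):
    """Build a JSON Feed 1.1 document as a plain dict."""
    return {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_page_url,
        "feed_url": feed_url,
        "items": [
            {
                "id": post["url"],  # the permalink doubles as a stable id
                "url": post["url"],
                "title": post["title"],
                "content_text": post["body"],
                "date_published": post["published"],  # RFC 3339 timestamp
            }
            for post in posts
        ],
    }


# Hypothetical post data.
posts = [
    {
        "url": "https://yoursite.com/blog/ai-agents",
        "title": "Optimizing Your Site for AI Agents and LLMs",
        "body": "Your site has two audiences now.",
        "published": "2026-04-14T00:00:00Z",
    }
]

feed = build_json_feed(
    "Your Name", "https://yoursite.com/", "https://yoursite.com/feed.json", posts
)
print(json.dumps(feed, indent=2))

# Tags for the page <head> so both feeds are auto-discoverable.
DISCOVERY_LINKS = (
    '<link rel="alternate" type="application/rss+xml" href="/feed.xml">\n'
    '<link rel="alternate" type="application/feed+json" href="/feed.json">'
)
```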

Making robots.txt AI-Aware

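Most sites already have a robots.txt. The key addition is explicitly allowing AI crawlers. I also list the llms.txt locations in comments; crawlers ignore comment lines, so treat those as a hint for anyone reading the file rather than a directive: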

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

# AI/LLM Content
# llms.txt: https://yoursite.com/llms.txt
# llms-full.txt: https://yoursite.com/llms-full.txt

Many sites block AI crawlers by default. If you want your content cited and discovered by AI, explicitly allow the major bots: GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, anthropic-ai, PerplexityBot, Applebot-Extended, Bytespider, and cohere-ai.
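A quick way to check what a robots.txt actually permits is Python's standard `urllib.robotparser`. The sketch below generates allow rules for the bots listed above and then audits the result; `build_robots_txt`, `audit`, and the yoursite.com URLs are hypothetical names for illustration, and the parser is Python's implementation, which may differ in edge cases from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Applebot-Extended", "Bytespider", "cohere-ai",
]


def build_robots_txt(bots, sitemap_url):
    """One explicit allow group per bot, followed by the sitemap line."""
    blocks = [f"User-agent: {bot}\nAllow: /" for bot in bots]
    blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"


def audit(robots_txt, bots, url):
    """Map each bot name to whether this robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


robots_txt = build_robots_txt(AI_BOTS, "https://yoursite.com/sitemap.xml")
allowed = audit(robots_txt, AI_BOTS, "https://yoursite.com/blog/")
print(allowed)

# Contrast: a file that blocks one crawler and says nothing about the other.
blocking = "User-agent: GPTBot\nDisallow: /\n"
blocked = audit(blocking, ["GPTBot", "ClaudeBot"], "https://yoursite.com/blog/")
print(blocked)
```

Note that a bot with no matching group is allowed by default, so the explicit `Allow: /` lines mostly matter when a broader rule elsewhere in the file would otherwise block it.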

Why This Matters

I have spent the last few years watching organic SEO traffic dry up while consulting with marketing teams who keep seeing the same dashboards stay green for one more quarter. SEO has evolved from keyword stuffing to semantic web to AI-native citation. We are at an inflection point. The sites that get cited by AI are not necessarily the ones with the best domain authority. They are the ones with the clearest, most structured, most machine-readable content.

This matters for small publications and personal sites in particular. When someone asks an AI "what is a good article on AEO?" or "who writes useful stuff about prompts?", you want your site to be in the answer. That requires more than good content. It requires content that AI can find, understand, and attribute.

The Full Stack of AI Optimization

Here is the complete checklist of what I now have in place:

| Layer | What | Why |
| --- | --- | --- |
| `robots.txt` | Explicitly allow AI bots | Let them crawl |
| `sitemap.xml` | Dynamic sitemap with all content | Let them discover |
| `llms.txt` | Markdown index of the site | Let them understand structure |
| `llms-full.txt` | Full content in one file | Let them ingest everything |
| Inline `<script>` | Page-level LLM instructions | Let them understand context |
| JSON-LD | Structured data on every page | Let them understand semantics |
| RSS + JSON Feed | Machine-readable content feeds | Let them subscribe |
| Meta tags | OpenGraph, Twitter, canonical | Let them cite accurately |

None of these changes affect how the site looks or feels for human visitors. They are invisible additions that make the site dramatically more useful for AI.

What To Do Next

The AI web is evolving fast. Standards like llms.txt are still emerging, and new patterns will appear. The fundamentals will not change. Structure your content clearly. Make it discoverable. Give machines the metadata they need to understand it.

Pick one of the eight rows in the checklist above. Ship it this week. The next one will be easier than the last.