Glossary
robots.txt
The robots.txt file implements the Robots Exclusion Protocol (RFC 9309). Placed at the root of a site, it tells crawler user-agents (search engines, AI engines, scrapers) which paths they may or may not crawl.
Also known as
- robots.txt
- robots file
The 2026 AEO (answer engine optimization) pattern adds, on top of the generic rules (`User-Agent: *`), **explicit per-AI-bot rules**: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, etc. This proactive stance counterbalances Cloudflare "AI Crawl Control" configurations that block all AI bots by default, which is the opposite of what AEO requires.
A well-structured robots.txt for a 2026 B2B SaaS looks like:
1. `User-Agent: *` with generic Allow / Disallow rules (allow marketing pages, block dashboard / api / login)
2. An `Allow: /` rule per AI bot to neutralize any upstream Cloudflare block
3. The `Sitemap: https://...` pointer
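The three parts above can be sketched in Python as a small generator, checked with the standard library's `urllib.robotparser`. This is a minimal illustration under stated assumptions, not the file any particular site serves: the bot list is only the five names mentioned earlier, and the private paths and the `example.com` sitemap URL are hypothetical placeholders. One judgment call in the sketch: each per-bot group repeats the `Disallow` lines, because under RFC 9309 a crawler that matches a named group follows that group only and does not fall back to the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from the text; extend as needed.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "Applebot-Extended")
# Hypothetical private paths for a B2B SaaS (assumption, not from the source).
PRIVATE_PATHS = ("/dashboard/", "/api/", "/login")


def build_robots_txt(sitemap_url, bots=AI_BOTS, private_paths=PRIVATE_PATHS):
    """Return robots.txt text: a generic group, one group per AI bot, a sitemap."""
    def group(agent):
        lines = [f"User-agent: {agent}"]
        # Disallow lines come first so that order-sensitive parsers
        # (such as urllib.robotparser) and longest-match parsers agree.
        lines += [f"Disallow: {path}" for path in private_paths]
        lines.append("Allow: /")
        return "\n".join(lines)

    groups = [group("*")] + [group(bot) for bot in bots]
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"


def is_allowed(robots_txt, agent, url):
    """Check a URL against robots.txt text using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    text = build_robots_txt("https://example.com/sitemap.xml")
    print(text)
    print(is_allowed(text, "GPTBot", "https://example.com/pricing"))
    print(is_allowed(text, "GPTBot", "https://example.com/dashboard/settings"))
```

If a per-bot group contained only `Allow: /`, that bot would also be permitted to crawl the dashboard, API, and login paths, which is probably not what the generic rules intend.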
Important: robots.txt is a convention with no binding force; a malicious bot can simply ignore it. To truly block a bot, you need server-side filtering.
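As a rough illustration of what server-side filtering can mean, the sketch below wraps a WSGI application in a middleware that rejects requests whose `User-Agent` header contains a blocked token. The token names, the `block_user_agents` helper, and the demo app are all hypothetical. Since the header can be spoofed, this should be read as a first layer only, not a complete defense.

```python
# Hypothetical tokens to reject; replace with the agents you want to block.
BLOCKED_AGENT_TOKENS = ("BadBot", "EvilScraper")


def block_user_agents(app, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent matches a token."""
    lowered = tuple(token.lower() for token in blocked_tokens)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # No match: hand the request to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Trivial WSGI app used only to exercise the middleware."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_user_agents(demo_app)
```

Unlike a robots.txt rule, this check runs on every request regardless of whether the client chose to read or honor the file.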
In the getchatsocial.com product
getchatsocial.com's robots.txt follows this pattern: generic rules plus explicit Allow directives for 19 AI bots (GPTBot, ClaudeBot, PerplexityBot, etc.). Inspectable at https://getchatsocial.com/robots.txt.
FAQ
Should I block AI crawlers in robots.txt?
For an AEO strategy, no: do the opposite. You want to maximize AI crawler visits to increase the probability of being cited. Blocking GPTBot and ClaudeBot closes the door to ChatGPT and Claude.
What is the difference between robots.txt and llms.txt?
robots.txt is a crawl-access convention (who can crawl what). llms.txt is a brand identity card addressed to AI engines so they understand who you are and how to cite you. The two are complementary.