Glossary

robots.txt

The robots.txt file is a web standard (RFC 9309, the Robots Exclusion Protocol) placed at the root of a site. It tells crawler user-agents (search engines, AI engines, scrapers) which paths they may or may not crawl. It governs crawling, not indexing as such.

Also known as

  • robots.txt
  • robots file

The modern 2026 AEO pattern adds **explicit per-AI-bot rules** on top of the generic rules (`User-Agent: *`): GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, etc. This proactive stance is meant to counterbalance Cloudflare "AI Crawl Control" configurations that block all AI bots by default, which is the opposite of what AEO requires.

A well-structured robots.txt for a 2026 B2B SaaS looks like:

  1. `User-Agent: *` with generic Allow / Disallow rules (allow marketing pages, block dashboard / api / login)
  2. An `Allow: /` rule per AI bot, intended to counter any upstream Cloudflare robots.txt rule. A block enforced at the network edge cannot be overridden from robots.txt and has to be lifted in Cloudflare itself.
  3. The `Sitemap: https://...` pointer
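The three parts above can be sketched and checked with Python's standard `urllib.robotparser`. The domain `example.com`, the blocked paths, and the three bots listed are illustrative assumptions, not a copy of any real site's file. One detail worth noting: under RFC 9309, a crawler that finds a group naming it follows that group only and skips the `User-Agent: *` group, so this sketch repeats the Disallow lines in the AI-bot group instead of relying on a bare `Allow: /`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; example.com and the paths are assumptions.
# Disallow lines come before "Allow: /" because Python's parser applies
# the first matching rule (RFC 9309 crawlers use the longest match, for
# which this ordering gives the same result).
ROBOTS_TXT = """\
User-agent: *
Disallow: /dashboard/
Disallow: /api/
Disallow: /login
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /dashboard/
Disallow: /api/
Disallow: /login
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, f"https://example.com{path}")


print(allowed("GPTBot", "/pricing"))              # True
print(allowed("GPTBot", "/dashboard/settings"))   # False
print(allowed("SomeOtherBot", "/api/v1/users"))   # False
print(parser.site_maps())                         # ['https://example.com/sitemap.xml']
```

A check like this only confirms what the file says. It does not show whether a CDN or firewall in front of the site actually lets the crawler through.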

Important: robots.txt is a convention and carries no binding force; a malicious bot can simply ignore it. To truly block a bot, you need server-side filtering, for example at the web server, firewall, or CDN.
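As a rough illustration of server-side filtering, the sketch below wraps a WSGI application and answers 403 to blocked user agents. The names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `block_bots`, and the bot tokens themselves, are hypothetical. The User-Agent header can be spoofed, so a real deployment would likely combine this with IP or reverse-DNS verification.

```python
# Hypothetical blocklist: substrings matched case-insensitively
# against the User-Agent header.
BLOCKED_AGENT_TOKENS = ("badbot", "examplescraper")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def block_bots(app):
    """Wrap a WSGI app so blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return middleware
```

Unlike a robots.txt rule, this refuses the request whether or not the bot chooses to cooperate.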

In the getchatsocial.com product

getchatsocial.com's robots.txt follows this pattern: generic rules plus explicit Allow directives for 19 AI bots (GPTBot, ClaudeBot, PerplexityBot, etc.). Inspectable at https://getchatsocial.com/robots.txt.

FAQ

  • Should I block AI crawlers in robots.txt?

    For an AEO strategy, no: you want the opposite. Maximizing AI crawler visits increases the probability of being cited. Blocking GPTBot and ClaudeBot closes the door to ChatGPT and Claude.

  • What is the difference between robots.txt and llms.txt?

    robots.txt is a crawling convention (who can crawl what). llms.txt is a proposed convention that acts as a brand identity card addressed to AI engines, so they understand who you are and how to cite you. The two are complementary.