robots.txt for GPTBot, ClaudeBot and PerplexityBot: Block or Allow?

Complete technical guide on configuring robots.txt for AI crawlers. 41% of B2B sites block AI bots without realizing it, losing up to 34% of their potential citations in ChatGPT and Perplexity.

There is a technical mistake that 41% of B2B websites make without knowing it: blocking AI crawlers in their robots.txt. The result is invisible but costly: each blocked bot cuts the potential citations that model could make about your brand by 18% to 34%. If you have a GEO (Generative Engine Optimization) strategy but your robots.txt blocks GPTBot, ClaudeBot or PerplexityBot, you are building on sand.

The AI Crawlers You Need to Know in 2026

Each AI provider has multiple crawlers with distinct functions. Understanding the difference between them is essential for deciding what to allow and what to block:

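GPTBot (OpenAI): Training crawler. Collects content for the training datasets of OpenAI's models. If you block it, your content doesn't enter the training corpus behind ChatGPT.
OAI-SearchBot / ChatGPT-User (OpenAI): Search and retrieval agents. OAI-SearchBot crawls pages so they can surface in ChatGPT's search results, and ChatGPT-User fetches a page when a user's request triggers it. Blocking them removes your eligibility to be cited in real time.
ClaudeBot (Anthropic): Training crawler for Claude. If you block it, your content doesn't enter the training data for Anthropic's models.
Claude-SearchBot / Claude-User (Anthropic): Search and retrieval agents for Claude with web tools. Claude-SearchBot indexes pages for search results, and Claude-User fetches pages at a user's request. The older anthropic-ai token still appears in many robots.txt files but is not among the agents Anthropic currently documents.
PerplexityBot (Perplexity): Perplexity's main crawler, used to index content and cite it in real-time responses. Blocking it removes you from Perplexity's index.
Google-Extended (Google): Not a separate crawler but a robots.txt control token; the crawling itself is done by Google's existing user agents. It controls whether your content is used for Gemini (formerly Bard) model training and grounding. It is independent of Googlebot and does not affect Google Search. AI Overviews follow Googlebot and the usual Search controls, not Google-Extended.
CCBot (Common Crawl): Not operated by an AI provider, but its datasets are used to train many LLMs, open-source ones included. Has no real-time search functionality.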

The Optimal Stance: "Allow Search, Block Training"

In 2026, the prevailing stance among companies with mature GEO strategies is "allow search, block training": block pure training crawlers (which use your content to train models without giving you immediate visibility) and allow real-time search crawlers (which do generate citations and direct visibility).

This stance gives you control over how your content is used for training while maximizing your eligibility to be cited in real-time responses from ChatGPT, Perplexity and Claude.
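
The stance above can be written down as a small policy table and rendered into robots.txt groups. The Python sketch below is only an illustration: `POLICY` and `render_robots` are hypothetical names, and the allow/block split simply mirrors the stance described in this section, so adjust it to your own decision.

```python
# Hypothetical policy table: user-agent token -> True (allow) or False (block).
# The split mirrors the "allow search, block training" stance; it is an
# assumption to adapt, not a canonical list.
POLICY = {
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "PerplexityBot": True,
    "Claude-SearchBot": True,
    "GPTBot": False,
    "ClaudeBot": False,
    "CCBot": False,
}


def render_robots(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per user agent in the policy."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Groups are separated by a blank line, as in a hand-written robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots(POLICY))
```

Keeping the decision in one table makes it easy to flip a single bot from blocked to allowed without hand-editing the file.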

Recommended robots.txt Configuration

# Allow real-time search crawlers (generate citations)
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

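# Legacy Anthropic token, kept for older rule sets
User-agent: anthropic-ai
Allow: /

User-agent: Claude-User
Allow: /

# Google-Extended is a control token for Gemini training and grounding,
# not a search crawler; it has no effect on Google Search or AI Overviews
User-agent: Google-Extended
Allow: /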

# Block pure training crawlers (optional)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Important note: If you prefer to maximize impact on future model training (which also generates long-term visibility), you can allow all crawlers. The decision depends on whether you prioritize immediate impact (real-time search) or long-term impact (training).

How to Verify Your Current Configuration

Visit yourdomain.com/robots.txt directly in your browser.
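Look for generic rules like User-agent: * followed by Disallow: /, which block every bot that has no more specific group of its own, AI crawlers included.
Verify that PerplexityBot, GPTBot or Google-Extended are not blocked unintentionally.
Check the robots.txt report in Google Search Console to confirm the file is fetched and parsed without errors. The report covers Google's own crawlers only (the older robots.txt Tester has been retired), so test the AI user agents with a separate robots.txt parser or validator.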
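
The checks above can also be scripted. The sketch below uses Python's standard `urllib.robotparser` to report, for each AI user agent, whether a given robots.txt lets it fetch the home page. The `AI_AGENTS` list, the `audit_robots` helper and the `SAMPLE` file are assumptions made for illustration. Python's parser applies rules in file order, which can differ from the longest-match behavior of some crawlers on complex rule sets, so treat the result as a first check.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens discussed in this guide (illustrative list, extend as needed).
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended", "CCBot",
]

# Hypothetical robots.txt: a blanket block plus one explicit exception.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: PerplexityBot
Allow: /
"""


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {agent: allowed} for each AI agent against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    for agent, allowed in audit_robots(SAMPLE).items():
        print(f"{agent:18} {'allowed' if allowed else 'BLOCKED'}")
```

With the sample file, only PerplexityBot is reported as allowed: every other agent falls through to the `User-agent: *` group. To audit a live site, download its robots.txt and pass the text to `audit_robots`.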

The Cloudflare and CDN Problem

A frequent mistake is a Cloudflare configuration that blocks AI bots at the WAF level, before robots.txt even comes into play. Cloudflare has a "Bot Fight Mode" feature that blocks unverified bots, and some AI crawlers fall into that category. In your Cloudflare dashboard, check that bot settings and firewall rules are not blocking the AI bots you intend to allow.
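
One rough way to spot this kind of block is to request the same URL with a browser-like User-Agent and with crawler-like ones, then compare status codes. The sketch below is a minimal probe under stated assumptions: the User-Agent strings are simplified stand-ins (real crawlers send longer strings), and `UA_SAMPLES`, `http_status` and `find_blocked` are hypothetical names. A WAF may also treat a spoofed User-Agent from your own IP differently from the genuine crawler, so a result here is a hint, not proof.

```python
import urllib.error
import urllib.request
from typing import Callable

# Simplified User-Agent strings (assumption: real crawler strings are longer).
UA_SAMPLES = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64)",
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}

# Status codes commonly returned by a WAF or rate limiter when it refuses a request.
BLOCK_CODES = (403, 429, 503)


def http_status(url: str, user_agent: str) -> int:
    """Fetch the URL with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_blocked(url: str, fetch: Callable[[str, str], int] = http_status) -> list[str]:
    """List the bots whose status differs from the browser baseline and looks like a block."""
    baseline = fetch(url, UA_SAMPLES["browser"])
    blocked = []
    for name, user_agent in UA_SAMPLES.items():
        if name == "browser":
            continue
        status = fetch(url, user_agent)
        if status != baseline and status in BLOCK_CODES:
            blocked.append(name)
    return blocked
```

Calling `find_blocked("https://yourdomain.com/")` performs live requests. The `fetch` parameter exists so the comparison logic can be exercised with a stub instead of the network.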

Crawlers That Ignore robots.txt

Some crawlers have been documented ignoring robots.txt directives, which are voluntary and not technically enforced. In these cases, the only real defense is at the server or WAF level. The major providers (OpenAI, Anthropic, Google, Perplexity), however, generally honor robots.txt.
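
Server access logs are where non-compliance shows up. The sketch below scans log lines in the common "combined" format and flags requests from known bot tokens to paths that robots.txt disallows for them. The regular expression, the `BOT_TOKENS` list, the `find_violations` helper and the sample log lines are all illustrative assumptions. User-Agent strings can be spoofed, and crawlers may act on a cached robots.txt for a while after you change it, so read hits as leads to investigate.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line (assumed format; adapt to your server).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

BOT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]


def find_violations(robots_txt: str, log_lines: list[str]) -> list[tuple[str, str]]:
    """Return (bot, path) pairs for requests that robots.txt disallows for that bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match["path"] == "/robots.txt":
            continue  # unparseable line, or the bot fetching robots.txt itself
        for token in BOT_TOKENS:
            if token.lower() in match["ua"].lower() and not parser.can_fetch(token, match["path"]):
                violations.append((token, match["path"]))
    return violations


# Hypothetical data for illustration only.
ROBOTS = "User-agent: GPTBot\nDisallow: /\n"
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Jan/2026:10:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Jan/2026:10:00:02 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
]

if __name__ == "__main__":
    for bot, path in find_violations(ROBOTS, SAMPLE_LOG):
        print(f"{bot} requested disallowed path {path}")
```

In the sample, only the GPTBot request to /pricing is flagged: the robots.txt fetch is skipped, and PerplexityBot has no rule blocking it.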

Impact on Your GEO Strategy

A correct robots.txt configuration for AI bots is the most basic technical prerequisite of any GEO strategy. There is no point investing in high-quality link building and Digital PR if Perplexity or ChatGPT crawlers cannot access your content to cite it. Audit your robots.txt today before any other GEO action.

Complement this configuration with well-implemented Schema.org and answer-ready content to maximize what AI crawlers find when they do have access.

Esbuenisimo Links includes technical GEO auditing — including robots.txt configuration for AI crawlers — in its consulting services, ensuring no technical block prevents ChatGPT, Perplexity and Google AI Overviews from indexing and citing your content.
