
GUIDE № 13 · CRAWLABILITY

How to Check If Cloudflare Is Secretly Blocking ChatGPT from Your Shopify Store

9 min read · Guide 13 of 17

You checked your robots.txt. It allows everything. But ChatGPT still can't read your store.

The culprit is one layer above robots.txt — your WAF (usually a Cloudflare bot rule such as Block AI Bots, sometimes Bot Fight Mode) quietly returning a 403 to any request with User-Agent: GPTBot. Most merchants never discover this until their AI visibility flatlines. This guide walks through the WAF layer, the six AI user agents to test, a 30-second curl check, how to read the response (including the sneaky challenge pages that look like 200s), and how to clear the block in the Cloudflare dashboard.

The WAF layer above robots.txt

robots.txt is a politeness contract. A crawler reads it and voluntarily decides what to fetch. A WAF — web application firewall — is the opposite. It intercepts the request before your store ever sees it and decides whether to forward, block, or challenge based on its own rules. The crawler has no say.

Cloudflare sits in front of a huge share of Shopify-on-custom-domain stores, either because the merchant put it there deliberately or because an installed app routed DNS through it. Once Cloudflare is in the request path, three separate layers can challenge or block AI crawlers — and merchants frequently mistake one for another:

  • Bot Fight Mode. A heuristic, free-tier bot defense. It's noisy on scrapers and aggressive automation, but it's not the layer most often responsible for blocking GPTBot and ClaudeBot specifically.
  • Block AI Bots / AI Crawl Control. A dedicated AI-targeting managed rule that Cloudflare ships separately from Bot Fight Mode. It's often enabled by default on newer zones. This is usually the layer that blocks ChatGPT, Claude, and Perplexity even when Bot Fight Mode is off.
  • Super Bot Fight Mode. The paid, more aggressive cousin of Bot Fight Mode. Available on Pro plans and above, with per-category controls instead of a single on/off toggle.

A request that any of these layers flags as a non-search-engine bot gets a 403 or a JS challenge — regardless of what your robots.txt says on the origin.

This is why you can do everything right at the Shopify layer — clean robots.txt.liquid, no app-injected rules, product pages wide open — and still have zero AI crawl coverage. The block is happening before the request ever reaches Shopify. For the Shopify-layer check, start with the robots.txt guide; this guide assumes that one is already clean.

The 6 AI user agents to check

You don't need to test twenty bots. Six user agents cover most of the AI shopping and answer-engine surface, and they're the same set StoreAudit's WAF probe tests against:

  • GPTBot — OpenAI's crawler. Feeds ChatGPT's web tool and training set.
  • PerplexityBot — Perplexity's crawler. Feeds Perplexity answers and shopping cards.
  • ClaudeBot — Anthropic's crawler. Feeds Claude's web tool and citations.
  • Google-Extended — Google's AI training control, used for Gemini. Separate from Googlebot and from regular search crawling.
  • Applebot-Extended — Apple's AI training control, used by Applebot. Governs content use for Apple Intelligence models.
  • meta-externalagent — Meta's external agent crawler. Feeds Meta AI surfaces.

A healthy Shopify store returns a 200 on all six. If Googlebot also gets blocked, the WAF is misconfigured against search too — that's a different, more urgent problem, but the fix path is the same dashboard. If only the AI agents fail while Googlebot succeeds, you're looking at an AI-specific block, which is the most common pattern we see.
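The same header-only curl check shown in the next section doubles as a control here: a Googlebot request that returns 200 while the AI agents get 403s confirms the block is AI-specific (yourstore.com is a placeholder).

curl -I -H "User-Agent: Googlebot" https://yourstore.com/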

How to check in 30 seconds

One terminal, one command per agent. Run these against your live store:

curl -I -H "User-Agent: GPTBot" https://yourstore.com/
curl -I -H "User-Agent: PerplexityBot" https://yourstore.com/
curl -I -H "User-Agent: ClaudeBot" https://yourstore.com/
curl -I -H "User-Agent: Google-Extended" https://yourstore.com/
curl -I -H "User-Agent: Applebot-Extended" https://yourstore.com/
curl -I -H "User-Agent: meta-externalagent" https://yourstore.com/

The -I flag fetches headers only, which is all you need to see the verdict. Run each one, scan the first line and the cf-mitigated header if present, and move on. The whole check takes less than a minute.

If you want to be thorough, repeat against a product URL and a collection URL — some WAF rules block the homepage differently from deep pages, especially if the rule is keyed on path. A single homepage check is usually enough to find the problem, but a product-page confirmation rules out false negatives.
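If you'd rather not paste six commands by hand, a small shell loop runs the same header check (yourstore.com is a placeholder; add Googlebot to the list if you also want the control from the previous section):

# Print each agent's name and the status line it gets back
for ua in GPTBot PerplexityBot ClaudeBot Google-Extended Applebot-Extended meta-externalagent; do
  printf '%-22s' "$ua"
  curl -sI -H "User-Agent: $ua" "https://yourstore.com/" | head -n 1
done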

Reading the response (including sneaky challenge pages)

There are three outcomes worth recognizing. The first two are obvious; the third is the one that fools merchants.

200 OK, real HTML. The bot got through. You're fine on that user agent.

HTTP/2 200
content-type: text/html; charset=utf-8
server: cloudflare

403 Forbidden. Flat block. Cloudflare (or whichever WAF is in front) has decided this user agent is not welcome. AI crawlers treat this the same way a browser would — nothing to index, full stop.

HTTP/2 403
content-type: text/html; charset=UTF-8
server: cloudflare
cf-mitigated: block

200 OK with a challenge page. This is the one that lies to you. The status code is 200, so a quick glance says "fine." But the body isn't your store — it's a JavaScript challenge page that a crawler can't execute. The telltale header is cf-mitigated: challenge (sometimes managed_challenge). If you see that, the bot got the equivalent of a 403, just dressed up differently.

HTTP/2 200
content-type: text/html; charset=UTF-8
server: cloudflare
cf-mitigated: challenge

Other markers that you're looking at a challenge instead of your page: a cf-chl- cookie in the set-cookie header, a body that's a few KB of obfuscated JS, or the string Just a moment… in the response if you drop the -I and fetch the body. Any of those means the AI crawler is getting nothing usable, even though the status code looks clean.
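A quick way to check for those markers from the terminal (GPTBot and yourstore.com are placeholders; substitute whichever agent failed):

# Header check: any cf-mitigated line means the request was intercepted
curl -sI -H "User-Agent: GPTBot" https://yourstore.com/ | grep -i 'cf-mitigated'

# Body check: a challenge page says "Just a moment" instead of showing your products
curl -s -H "User-Agent: GPTBot" https://yourstore.com/ | grep -ci 'just a moment'

A healthy page prints nothing for the first command and 0 for the second.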

To see the actual impact on AI answers — not just the HTTP handshake — pair this check with the test what ChatGPT knows about your store walkthrough. A WAF block shows up there as ChatGPT returning generic category answers instead of recognizing your products by name.

How to fix Cloudflare AI bot blocking

The fix is a dashboard change, not a code deploy. Walk the click path in this order — disable the AI-specific rule first, then add the user-agent skip rule:

  • Log in to the Cloudflare dashboard and pick the zone (domain) for your store
  • Go to Security in the left nav
  • Open Bots
  • Find Block AI Bots (sometimes labeled AI Crawl Control) and turn it off, or set it to Allow for the agents listed above. This is the layer most likely to be silently blocking ChatGPT, Claude, and Perplexity.
  • Then, with the AI rule handled, click Configure next to Bot Fight Mode (or Super Bot Fight Mode) and either toggle it off entirely, or — better — add a WAF custom rule that uses the Skip action to exempt the AI user agents from bot protection

The skip-rule approach is what most stores want. You keep bot protection against actual abuse and only carve out an exception for legitimate AI crawlers. In Security > WAF > Custom rules, create a new rule with an expression like:

(http.user_agent contains "GPTBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Applebot-Extended") or
(http.user_agent contains "meta-externalagent")

Set the action to Skip and tick the boxes for All remaining custom rules, Super Bot Fight Mode, and Managed Rules as appropriate. Save and deploy. Changes are live within a minute or two across Cloudflare's edge — there's no cache to purge for this.

If you're on a Cloudflare plan that includes Verified Bots, check whether the six AI agents you care about are in the verified list for your zone before you add custom rules. Cloudflare has been expanding AI-bot verification, and any agent that's verified can be allowed with a single checkbox instead of a custom rule. The list changes, so confirm in your dashboard rather than trusting a snapshot from a blog post.

Edge cases: AWS WAF, Sucuri, flaky 429s

Not every block is Cloudflare. A few adjacent situations worth recognizing:

AWS WAF Bot Control. If your store is behind CloudFront with AWS WAF, the same pattern applies: a managed rule group called AWSManagedRulesBotControlRuleSet evaluates user agents and can block AI bots by default on the "Targeted" tier. The fix lives in the WAF console under Web ACLs > [your ACL] > Rules, where you override the specific bot-category actions or add a label-based allow rule for the AI agents. The curl check is identical; the response headers just won't mention Cloudflare.
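Exact headers vary by distribution and any WAF custom-response settings, but a CloudFront-fronted block typically looks something like this (values are illustrative placeholders):

HTTP/2 403
server: CloudFront
via: 1.1 xxxxxxxxxxxx.cloudfront.net (CloudFront)
x-cache: Error from cloudfront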

Sucuri and other third-party WAFs. Sucuri's firewall sits in front of plenty of Shopify-on-custom-domain stores, often bundled with a malware-scanning product. Its default bot policy is less aggressive than Cloudflare's, but it does block on reputation. The fix is in the Sucuri dashboard under Firewall > Access Control > Whitelist, where you can add user-agent exceptions. Same idea, different UI.

Flaky 429s and CDN rate limits. Sometimes the block isn't a block — it's a rate limit that fires when a crawler hits too many URLs in a burst. You'll see 429 Too Many Requests instead of 403, and the response will clear up a minute later. If a single curl returns 200 but large crawl jobs fail, suspect a rate limit rather than a WAF rule, and look in Security > WAF > Rate limiting rules for anything keyed on request count per IP.
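One way to tell the two apart: hit the same URL twice with a pause in between (the product path here is a placeholder). A WAF 403 stays a 403; a rate-limit 429 usually clears.

curl -sI -H "User-Agent: GPTBot" https://yourstore.com/products/example-product | head -n 1
sleep 60
curl -sI -H "User-Agent: GPTBot" https://yourstore.com/products/example-product | head -n 1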

Legitimate bot blocking. If your WAF is blocking Googlebot alongside the AI bots, the rule isn't AI-specific — it's a broad "block anything that isn't a verified browser" policy, and it's costing you organic search as well as AI. That's a wider conversation than this guide, but the dashboard path is the same: start at Security > Bots and work outward.

Verify the fix worked

Re-run the six curl commands from earlier. All six should now return HTTP/2 200 with a real HTML body and no cf-mitigated header. If you see cf-mitigated: challenge anywhere, the skip rule didn't cover the challenge layer — go back and tick Super Bot Fight Mode on the skip action, or downgrade the bot action from Managed Challenge to Allow on the specific user agents.
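A compact version of that re-check, printing the status line and any cf-mitigated header per agent (yourstore.com is a placeholder). A clean result is six 200 lines with nothing after them:

# Show status plus any cf-mitigated verdict on one line per agent
for ua in GPTBot PerplexityBot ClaudeBot Google-Extended Applebot-Extended meta-externalagent; do
  printf '%-22s' "$ua"
  curl -sI -H "User-Agent: $ua" "https://yourstore.com/" | grep -iE '^(HTTP|cf-mitigated)' | tr -d '\r' | tr '\n' ' '
  echo
done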

For a more complete picture, follow up with the llms.txt setup guide — once your store is reachable, llms.txt is the file AI agents read first to decide what on your store is worth crawling. A reachable store with no llms.txt still indexes, but less efficiently.

Don't expect AI assistants to rediscover you overnight. GPTBot and ClaudeBot re-crawl on their own schedule; crawl latency varies by source, so expect propagation in weeks rather than days, with most movement in the first month after the fix. Stores with inbound links that get crawled regularly tend to come back faster.

WAF blocks are one of the most common issues we find on otherwise AI-ready Shopify stores, alongside missing GTINs and policy gaps. They're also the cheapest fix in this series — minutes in a dashboard, no code, no content rewrite. Run a free audit on your store →