
ESSAY № 01

How to Test What ChatGPT Actually Knows About Your Store (in 60 Seconds)

9 min read · Essay 1 of 4

Open ChatGPT. Type: "What do you know about Homestead & Co.?" Read the answer. If it's wrong, thin, or missing — that's the gap you're actually fighting.

Most AI-SEO conversations skip the first step: finding out what AI currently knows about your brand. You can tune every schema attribute perfectly, rewrite every title into Merchant Center format, publish a pristine llms.txt — and still convert nothing, because AI engines don't know you exist or know you for the wrong things. The test takes 60 seconds and it reorders every other priority on your list.

The thesis: test before you optimize

The lowest-effort, highest-insight AI-visibility test is asking AI engines directly what they know about your brand. You should do it before any tactical fix, and you should redo it after every meaningful change.

Why this ordering matters: tactical work is cheap individually and expensive in aggregate. An afternoon on titles, another afternoon on schema, another on WAF config — it adds up. If the AI engine has never heard of you, none of those fixes move the needle in a way a shopper experiences. The first question isn't "is my schema right?" It's "does the model currently output my brand when a real buyer asks a real question?" Those are different problems with different fixes.

The test also gives you a baseline. You can't tell whether a change worked if you never captured what the answer looked like before. Most merchants never capture that baseline, so their optimization work feels like shouting into a void — no before, no after, no signal. Five minutes of prompting fixes that.

Think of it the way you'd think of any other audit: look at the output first, then reason backward to what's causing it. AI answers are the output. Your store is the cause. Don't reverse the order.

The 60-second test

Four engines cover the surface that matters for shopping and brand queries. Open each one in a fresh tab with no prior conversation history — you want the model's baseline knowledge, not a thread it's been primed in. Replace the bracketed placeholder with your brand name.

ChatGPT:    "What do you know about [YourBrand]?"
Claude:     "Tell me what you know about [YourBrand] as a company."
Perplexity: "[YourBrand] reviews, products, and reputation"
Gemini:     "Overview of [YourBrand] — what they sell and who buys from them."

The phrasings differ on purpose. ChatGPT's default web tool likes direct, conversational questions. Claude's training-data recall responds better when you frame it as a company profile. Perplexity is a search-grounded engine — a keyword-style query gets you closer to what a shopper comparing options would see. Gemini's shopping surfaces weight audience and assortment language, so asking about what you sell and who buys from you matches how it's indexed.

Why these four? ChatGPT, Claude, and Gemini are the three engines StoreAudit's automated probe covers (more on that below). Perplexity is on the manual list because it cites its sources inline, which makes it the easiest place to see where an AI answer actually comes from — useful for the fan-out diagnosis covered in the AI query fan-out essay.

Run each prompt. Copy the answers into a doc. Do it in that order, once a month, against the same brand phrasing you use everywhere else (exact capitalization, exact spacing — AI matching is less forgiving than Google's).

A few test-design notes. Use a signed-out or incognito session where you can — ChatGPT's memory and Gemini's account-aware context will both bias the answer. Test the exact brand string your customers use: if the site and packaging say Homestead & Co., test that, not your LLC name. And run a second round with a category query ("best [your category] brands") — not surfacing there is a different, usually more valuable, gap than not surfacing for your own name. Don't prompt-engineer a good answer out of the AI; the whole point is what a real buyer would see.
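If you want the baseline captured the same way every month, the routine is scriptable. Below is a minimal Python sketch, assuming you have an API key; the model name, the file layout, and the idea of probing via API at all are my assumptions, and the APIs won't exactly reproduce what the consumer apps (with their web tools, memory, and account context) show a shopper. Treat it as record-keeping, not a replacement for the manual test.

    # capture_baseline.py: save this month's answers verbatim, with the date in the filename.
    # Probing via API is a shortcut, not the article's method; the consumer apps can
    # answer differently than the APIs do.
    import json
    from datetime import date
    from openai import OpenAI   # pip install openai

    BRAND = "Homestead & Co."   # the exact string customers see, capitalization and all
    PROMPTS = {
        "chatgpt":    f"What do you know about {BRAND}?",
        "claude":     f"Tell me what you know about {BRAND} as a company.",
        "perplexity": f"{BRAND} reviews, products, and reputation",
        "gemini":     f"Overview of {BRAND} — what they sell and who buys from them.",
    }

    def ask_chatgpt(prompt: str) -> str:
        client = OpenAI()          # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # assumed model name; use whichever you have access to
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    if __name__ == "__main__":
        # One engine wired up as an example; paste the other three answers in by hand,
        # or wire up their clients the same way.
        answers = {name: "PASTE ANSWER HERE" for name in PROMPTS}
        answers["chatgpt"] = ask_chatgpt(PROMPTS["chatgpt"])
        path = f"baseline-{date.today().isoformat()}.json"
        with open(path, "w") as f:
            json.dump({"brand": BRAND, "prompts": PROMPTS, "answers": answers}, f, indent=2)
        print("Saved", path)

Keep every month's file; the comparison sketch later in this essay reads two of them side by side.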

Interpreting the answer — the three verdicts

Every answer lands in one of three buckets. Naming the bucket forces you to act on what you saw instead of leaving the answer open to interpretation.

  • KNOWN — The AI describes you accurately. It names product categories or specific products correctly, gets your positioning roughly right, and doesn't fabricate. This is where you want to be. Your work from here is maintenance and breadth — showing up for more category queries, not just branded ones. Watch out for the WRONG sub-case: a confident answer with fabricated facts (wrong founding year, wrong hero product, confusion with another brand) is actively worse than UNKNOWN, because shoppers can't tell it's wrong. When you grade KNOWN answers, check every concrete claim — fixing a wrong answer requires crowding the bad signals out with strong correct ones (schema, reviews, disambiguating content with your exact brand string).
  • SPARSE — The AI knows you exist but gives a thin, generic answer. "Homestead & Co. appears to be a home goods brand" with no specifics, no products named, no differentiator. SPARSE means you're indexed but under-described. Usually a signal that on-site structured data is weak or off-site mentions (reviews, press, directories) are shallow. Fixable with a month or two of targeted work.
  • UNKNOWN — "I don't have information about [YourBrand]." This is the honest failure mode. The AI admits the gap. It usually means you're too new, too small for the model's training sources, or (most commonly) a WAF is blocking the crawlers.

One nuance: don't conflate SPARSE and UNKNOWN. They feel similar in the tab but they're different problems. SPARSE means the model found you and didn't have much to say. UNKNOWN means the model didn't find you at all. The fix for the first is content density; the fix for the second is reachability (usually crawlability — see the llms.txt setup guide and run the WAF check).
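If you're saving answers to files, you can pre-sort them with a rough heuristic before grading them by hand. A minimal sketch, assuming the phrase list and word-count threshold I picked; it separates UNKNOWN from SPARSE reasonably well, but it cannot catch the WRONG sub-case, which needs fact-checking rather than string matching.

    # classify_verdict.py: a rough heuristic for pre-sorting saved answers.
    # The phrase list and the 80-word threshold are assumptions, not calibrated values.

    UNKNOWN_PHRASES = (
        "i don't have information",
        "i'm not familiar with",
        "i couldn't find",
        "no information about",
    )

    def classify(answer: str, brand: str) -> str:
        text = answer.lower()
        if any(p in text for p in UNKNOWN_PHRASES) or brand.lower() not in text:
            return "UNKNOWN"   # the model didn't find you: a reachability problem
        has_specifics = any(w in text for w in ("founded", "ships", "return", "$", "review"))
        if len(text.split()) < 80 and not has_specifics:
            return "SPARSE"    # indexed but under-described: a content-density problem
        return "KNOWN"         # looks substantive; still fact-check every concrete claim

    print(classify("Homestead & Co. appears to be a home goods brand.", "Homestead & Co."))
    # prints: SPARSE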

What to look for specifically

When you read each answer, grade it on four specific things. The overall impression lies to you; the specifics don't.

Category accuracy. Does the AI place you in the right category? A protein powder brand described as "a supplement company" is softer than "whey isolate protein powder." A furniture brand described as "home decor" is softer than "solid wood furniture." Generic category language means the model has your vibe but not your specifics.

Price and positioning. If the AI quotes a price band, check it. A premium brand described as budget-friendly is a positioning leak — usually from a review site or marketplace listing the model trusts more than your own pages. Fixing this means tightening the signals on your own domain so they outweigh the noisy external ones.

Policies and social proof. Does the AI know your return policy, your shipping terms, the fact that your reviews average 4.7? These are the trust signals shoppers ask AI about before they buy. If they're missing, your policy pages and review widgets aren't visible to the crawlers.

Correctness of specifics. Every concrete claim the AI makes — a product name, a founding year, a location, a hero product — check it. Wrong specifics are the most damaging output because they look confident. This is where WRONG shades into KNOWN and you have to catch it.
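One way to keep yourself honest on those four checks is to record them as explicit yes/no fields, one record per engine per month, stored next to the verbatim answer. A small sketch; the field names are mine, not any standard.

    # grade_answer.py: the four checks as an explicit record.
    # Field names are mine; the point is forcing a yes/no on each check instead of a vibe.
    from dataclasses import dataclass, asdict

    @dataclass
    class AnswerGrade:
        engine: str
        category_accurate: bool      # "solid wood furniture", not just "home decor"
        positioning_accurate: bool   # price band and premium-vs-budget match reality
        policies_present: bool       # returns, shipping, review average mentioned
        specifics_correct: bool      # every named product, year, and location checked

    grade = AnswerGrade("chatgpt", True, False, False, True)
    print(asdict(grade))   # store it alongside the verbatim answer in the monthly file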

Why this differs from SEO rank tracking

Rank tracking measures a position in an ordered list. You're #3 for "whey isolate vanilla," you were #4 last week, you move up or down in a fixed vocabulary of ten results. The surface is stable; only your place on it changes.

AI doesn't rank. It synthesizes. Ask the same question twice and you get two different paragraphs — different phrasing, different facts surfaced, sometimes different brands named. There is no list to be #3 in. There is a generated answer, and you're either in it, adjacent to it, or absent from it.

That's why the test looks different from a rank-tracking report. You're not measuring a position; you're measuring a description. The question isn't "where do we rank" — it's "when a buyer asks, what does the AI actually say, and is it accurate enough to drive a click or a purchase?"

Practically: rerun the test once a month, not once a day. AI training cycles and index refreshes don't move that fast. Save every answer verbatim. The interesting signal is the shape of the movement over a quarter — SPARSE → KNOWN is the transition you're looking for. If you see it, whatever you were doing in the prior 90 days is working. If you don't, it isn't. For the underlying mechanics of why a single question fans out into many answers, see the AI query fan-out essay.
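If you kept the monthly baseline files from the capture sketch above and added a verdict per engine when you graded them, the quarter-over-quarter movement is a short diff. A sketch, assuming a "verdicts" key mapping engine name to KNOWN, SPARSE, or UNKNOWN:

    # compare_baselines.py: verdict movement between two monthly baseline files.
    # Usage: python compare_baselines.py baseline-2025-01-01.json baseline-2025-04-01.json
    import json
    import sys

    def verdicts(path: str) -> dict:
        with open(path) as f:
            return json.load(f).get("verdicts", {})

    before, after = verdicts(sys.argv[1]), verdicts(sys.argv[2])
    for engine in sorted(set(before) | set(after)):
        a, b = before.get(engine, "n/a"), after.get(engine, "n/a")
        progress = (a, b) in {("UNKNOWN", "SPARSE"), ("SPARSE", "KNOWN")}
        print(f"{engine:12} {a:8} -> {b:8}" + ("  <- progress" if progress else ""))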

What this means for Shopify merchants specifically

Most Shopify stores we test land in SPARSE for branded queries and UNKNOWN for category queries. The jump from SPARSE to KNOWN on Shopify is usually driven by a short list of signals, and the jump from UNKNOWN to SPARSE is almost always a reachability fix.

If you're SPARSE, the Shopify-specific levers that move you forward are, in rough order of impact: product titles in Merchant Center format, structured data on product pages (JSON-LD with Product, Offer, AggregateRating), a clean llms.txt pointing to the collections you want surfaced, visible reviews rendered in HTML (not lazy-loaded JS), and a store policies page the AI can actually read. None of those are exotic — they're the same fundamentals covered in the tactical guides in this series. What changes is the order you do them in: test first, then pick the lever that matches the gap you saw. Off-site mentions are the multiplier on top of all of this, which is its own conversation in off-site signals for AI recommendations.
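For reference, this is the shape the structured-data lever refers to: a Product with a nested Offer and AggregateRating. Every value below is a placeholder, and many Shopify themes already emit something like this through Liquid, so check your rendered product page source before adding a duplicate block.

    # product_jsonld.py: the Product / Offer / AggregateRating shape named above.
    # All values are placeholders for a hypothetical product.
    import json

    product_jsonld = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Solid Oak Farmhouse Table",
        "brand": {"@type": "Brand", "name": "Homestead & Co."},
        "description": "Handmade solid oak dining table, seats six.",
        "offers": {
            "@type": "Offer",
            "price": "899.00",
            "priceCurrency": "USD",
            "availability": "https://schema.org/InStock",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": "4.7",
            "reviewCount": "132",
        },
    }

    # Paste the output into a <script type="application/ld+json"> tag in the product template.
    print(json.dumps(product_jsonld, indent=2))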

If you're UNKNOWN, don't start with content. Start with reachability — WAF, robots, llms.txt. Polishing a page the crawlers can't read is a category error.
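A quick outside-in reachability check is to request your homepage with AI-crawler-style user agents and compare the status codes against a normal browser request. A rough sketch; the user-agent strings are simplified stand-ins for the real crawler UAs, and a WAF that verifies crawlers by IP range can still block the real bots even when this passes, so treat a clean result as a first pass, not proof.

    # crawler_check.py: can AI crawlers fetch your homepage, or does the WAF stop them?
    # Simplified UA strings; a passing result here does not guarantee the real bots get through.
    import requests   # pip install requests

    STORE = "https://www.example.com"   # your storefront URL
    BOT_UAS = {
        "GPTBot":        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "ClaudeBot":     "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
        "Browser":       "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    }

    for name, ua in BOT_UAS.items():
        status = requests.get(STORE, headers={"User-Agent": ua}, timeout=15).status_code
        print(f"{name:14} {status}")   # 403/503 for bots but 200 for Browser points at the WAF
    print("robots.txt:", requests.get(STORE + "/robots.txt", timeout=15).status_code)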

StoreAudit runs an automated version of this test as part of every free scan — not a Full AI Audit upsell, not gated behind payment. It asks Claude, ChatGPT, and Gemini in parallel ("does this brand exist? what do you know about it?"), classifies each answer as KNOWN, SPARSE, or UNKNOWN, and surfaces what each engine actually said. You should still do the manual 60-second test because it's free and instant — but the automated version captures the answer verbatim every time you scan, so you can compare the next scan against the last one.

Run the test now, save the four answers, and pick the single gap that surprised you most. That's where the next month of work goes. Run a free audit on your store →