Brand Mention Monitoring Across AI Engines: Building a Citation Panel for Crypto Brands
How to build a 30–60-prompt panel for tracking brand citations across ChatGPT, Perplexity, Gemini, Claude and Bing Copilot. With prompt design, scoring, and the cadence that catches drift.
If you can’t measure AI citations, you can’t manage them. For crypto brands, AI citation share is increasingly the metric that decides whether a Growth retainer is producing pipeline or just publishing. We run a 30–60-prompt panel monthly for every Growth and Authority client, and the data drives the editorial calendar more than ranking-position deltas do.
This post covers the panel design we use, what we measure, and the operational cadence that keeps the panel useful instead of letting it become another dashboard nobody reads.
Quick facts
| Parameter | Value |
|---|---|
| Engines panelled | ChatGPT, Perplexity, Gemini, Claude, Bing Copilot |
| Prompt count per engagement | 30 (Growth) / 60 (Authority) |
| Cadence | Monthly (Growth) / weekly (Authority) |
| Time per cycle | 4–6 hours analyst time |
| Storage | Supabase + Looker Studio |
| Citation share definition | (Responses naming the brand ÷ total panelled responses) × 100 |
How do you design the prompt panel?
Three buckets, equal weight: informational, comparison, recommendation. Informational prompts are “what is”, “how does”, “is X safe”, “what’s the difference between” queries — the brand should appear when the model has clear citation evidence (a tier-1 article quoting your team, structured FAQ on your site, llms.txt). Roughly 10 prompts.
Comparison prompts are “X vs Y”, “alternatives to X”, “best X for Y”. This is where competitor brands surface; tracking which competitor gets cited and why is the highest-leverage signal in the panel. Roughly 10 prompts.
Recommendation prompts are “I’m looking for X, who should I use”, “recommend a Y for Z use case”. These carry the highest commercial intent — being recommended in a recommendation prompt is closer to direct lead capture than any organic ranking. Roughly 10 prompts.
Prompts are kept stable across cycles so trend data is comparable. We update 2–3 prompts per quarter to track new market entrants or shifts in user phrasing.
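As a sketch of how a panel like this can be kept stable and versioned between cycles (bucket names match the three buckets above; the prompt strings are illustrative placeholders, not a client panel):

```python
# A minimal, versioned prompt panel. Prompts here are placeholders; the real
# panel holds roughly 10 stable prompts per bucket and only changes when the
# quarterly review swaps 2-3 of them.
PANEL_VERSION = "2026-Q1"

PROMPT_PANEL = {
    "informational": [
        "What is a VASP license?",
        "How does MiCA affect existing exchange operators?",
    ],
    "comparison": [
        "Compare crypto licensing in Lithuania vs Czech Republic",
        "Alternatives to registering a crypto exchange in Estonia",
    ],
    "recommendation": [
        "Recommend a consultant for a UAE crypto license",
        "I'm launching a European exchange, who should handle licensing?",
    ],
}
```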
What does a crypto-vertical prompt set look like?
Six sample prompts for a crypto-licensing-focused brand:
- “What is the cheapest jurisdiction to get a crypto license in 2026?”
- “How do I register a VASP in Estonia?”
- “Best crypto licensing consultancy for European exchanges”
- “Compare crypto licensing in Lithuania vs Czech Republic”
- “What does MiCA mean for existing exchange operators?”
- “Recommend a law firm or consultant for UAE crypto license”
Each prompt runs against all 5 engines. We log: response text, sources cited (URLs), whether brand is named (yes/no), whether brand is recommended (yes/no), competitor brands named, sources that linked to the brand mention.
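A sketch of the per-response record, assuming a Python-side shape before it lands in Supabase (field names are illustrative, not our production schema):

```python
from dataclasses import dataclass, field

@dataclass
class PanelResponse:
    """One logged row per (prompt, engine) pair per run. Fields mirror the
    list above; names are illustrative, not the actual Supabase schema."""
    run_date: str                     # e.g. "2026-03-01"
    engine: str                       # chatgpt | perplexity | gemini | claude | bing_copilot
    prompt: str
    response_text: str
    cited_sources: list[str] = field(default_factory=list)        # URLs the engine cited
    brand_named: bool = False
    brand_recommended: bool = False
    competitors_named: list[str] = field(default_factory=list)
    brand_citation_sources: list[str] = field(default_factory=list)  # sources behind the brand mention
```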
How do you score it?
Three metrics. Citation share: percentage of responses across all engines × all prompts where the brand is named. Baseline for a new client is usually 0–4%; healthy at month 6 is 12–20%; category-leading is 25–35%.
Recommendation share: subset where brand is actively recommended (not just mentioned). This number is harder to move and lags citation share by 2–3 months. Healthy at month 9 is 8–14%.
Source authority: the publications/pages cited by AI engines when the brand is mentioned. If your brand citations come from your own site only, you’re vulnerable — the engines could drop you on the next refresh. If citations come from a mix of your own site, tier-1 publications, and third-party reviews, the moat is real.
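Taking the record sketch above as given, the three metrics reduce to simple ratios over one run (a sketch, not our production scoring):

```python
def score_cycle(responses: list["PanelResponse"], own_domain: str) -> dict[str, float]:
    """Citation share, recommendation share, and a rough own-site share of
    the sources behind brand mentions, for one run of the panel."""
    total = len(responses)
    named = [r for r in responses if r.brand_named]
    recommended = [r for r in responses if r.brand_recommended]

    # How many of the sources behind brand mentions sit on your own domain?
    brand_sources = [url for r in named for url in r.brand_citation_sources]
    own_site = sum(own_domain in url for url in brand_sources)

    return {
        "citation_share_pct": 100 * len(named) / total if total else 0.0,
        "recommendation_share_pct": 100 * len(recommended) / total if total else 0.0,
        "own_site_source_pct": 100 * own_site / len(brand_sources) if brand_sources else 0.0,
    }
```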
Which engine gives the cleanest signal?
Perplexity — the cleanest measurement signal: it shows sources transparently and refreshes them frequently (often weekly). Citation changes here lead the other engines by 1–2 months.
Bing Copilot — second-cleanest, similar source-transparency, slower refresh.
ChatGPT — noisier; without web search enabled, the response comes from training data and is therefore stale. With web search, behaviour is closer to Perplexity’s, but ChatGPT’s source list is sometimes truncated or aggregated.
Gemini — the model behind Google AI Overviews (AIO). Its citations match Google AIO closely, so Gemini’s data feeds the SERP citation work.
Claude — most variable because Claude’s web tooling depends on the deployment (Claude.ai vs API vs third-party). Treat Claude as a directional signal, not a precise measurement.
What does the operational cadence look like?
Monthly cycle, ~4–6 hours analyst time. Day 1 — run all 30 prompts × 5 engines, log responses to Supabase. Day 2 — score and tag (brand mentioned, recommended, sources). Day 3 — diff against last month’s run, flag deltas. Day 4 — write a 1-page summary for the client meeting. We then use the deltas to brief the next month’s content calendar — if Perplexity dropped citation share for “crypto license Estonia”, that’s the page we audit and refresh.
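A sketch of the day-3 diff, building on the record sketch above: compare which (engine, prompt) pairs named the brand this month versus last and flag the flips; the thresholds and page-audit decisions stay manual.

```python
def citation_map(responses):
    """Map (engine, prompt) -> whether the brand was named in that response."""
    return {(r.engine, r.prompt): r.brand_named for r in responses}

def diff_runs(current, previous):
    """Flag (engine, prompt) pairs that gained or lost a brand citation
    since the previous run; these become next month's content briefs."""
    now, before = citation_map(current), citation_map(previous)
    flags = []
    for key, named_now in now.items():
        if named_now != before.get(key, False):
            engine, prompt = key
            flags.append(f"{engine} {'gained' if named_now else 'lost'} citation: {prompt}")
    return flags
```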
Annually, we do a full prompt-set review — drop prompts that are stable for 6+ months (no signal), add prompts for new entrants and shifts in user vocabulary.
Frequently asked questions
Can we automate the panel with API calls? Partially. ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Perplexity (paid API) all expose APIs you can hit programmatically. Bing Copilot doesn’t have a clean API. Manual runs catch nuances API runs miss, so we mix automated (75%) + manual (25%) for our bigger panels.
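A minimal sketch of the automated portion, assuming the OpenAI Python SDK for ChatGPT and Perplexity’s OpenAI-compatible endpoint; the model names are placeholders to swap for whatever your plan currently supports.

```python
import os
from openai import OpenAI

# ChatGPT via the OpenAI API; Perplexity exposes an OpenAI-compatible endpoint.
# Gemini and Claude follow the same pattern with their own SDKs; Bing Copilot
# stays manual. Model names below are placeholders.
ENGINES = {
    "chatgpt": (OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o-mini"),
    "perplexity": (OpenAI(api_key=os.environ["PERPLEXITY_API_KEY"],
                          base_url="https://api.perplexity.ai"), "sonar"),
}

def run_prompt(engine: str, prompt: str) -> str:
    """Send one panel prompt to one engine and return the raw response text."""
    client, model = ENGINES[engine]
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```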
How long until citation share moves on a new engagement? 4–8 weeks for first-citation appearances on Perplexity. 8–14 weeks for Google AIO/Gemini. 12–20 weeks for ChatGPT (training data lag). Recommendation share lags everything by another 2–3 months.
Does paid PR speed it up? Earned tier-1 placements speed it up dramatically. Sponsored content doesn’t: AI engines filter it out as a signal, so anything labelled as sponsored isn’t treated as a recommendation source.
What’s a realistic ceiling for AI citation share? For a category leader with 18+ months of compounding: 30–45% citation share, 15–25% recommendation share. Above that the ceiling is structural — multiple competitors are cited together because users compare, and AI engines don’t single-source recommendations on commercial queries.