Why AI numbers vary between scans

AI engines are stochastic, and the same question can produce different answers across runs. Here is why your numbers move, and how to separate the signal from the noise.

Teams running their second scan often notice the numbers have moved since the first. Mention rate went up a few points. Recommendation rate on one engine dropped slightly. Share of voice on another engine looks different. Before drawing conclusions, it helps to understand the two distinct sources of variation and how AI Native handles each one.

Source one: AI engines are stochastic

Large language models do not produce the same output every time. The same question, asked twice within minutes, can produce answers that name different brands, frame the same brand differently, or reach a different verdict on which option to recommend. This is a property of how the models work, not a bug in the measurement.

The practical consequence is that a single reading of one answer is not reliable. Ask a question once and you get a sample of one. Ask it ten times and you get a distribution. Rates based on a single run would swing wildly between scans for no reason other than sampling noise.

AI Native handles this by running each prompt multiple times and computing rates across the full set of answers. Your mention rate is the fraction of those answers that named you. Your recommendation rate is the fraction that put you forward as a choice. Averaging across runs is what makes those rates stable enough to compare against a previous scan or a competitor.
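As a rough illustration of rates computed across repeated runs, here is a minimal Python sketch. The `Answer` record and the `rate` helper are hypothetical names chosen for this example, not the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """One AI response to one run of a prompt (hypothetical record shape)."""
    mentioned: bool      # the answer named the brand
    recommended: bool    # the answer put the brand forward as a choice


def rate(answers: list[Answer], field: str) -> float:
    """Fraction of answers where the given boolean field is true."""
    if not answers:
        raise ValueError("no answers to compute a rate from")
    return sum(getattr(a, field) for a in answers) / len(answers)


# Ten runs of the same prompt: named in 7 answers, recommended in 3 of those.
runs = (
    [Answer(True, True)] * 3
    + [Answer(True, False)] * 4
    + [Answer(False, False)] * 3
)
mention_rate = rate(runs, "mentioned")
recommendation_rate = rate(runs, "recommended")
print(f"mention rate: {mention_rate:.0%}, recommendation rate: {recommendation_rate:.0%}")
```

Any single run from this set would have reported either 0% or 100%; only the full set of ten gives a usable rate.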

Even with averaging, a small amount of scan-to-scan variation is expected and normal. A two- or three-point shift in a rate when nothing changed is almost certainly sampling noise. The signal to act on is the trend: directional moves that persist across several consecutive scans.
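To get a feel for how much a rate can move when nothing has changed, the following sketch simulates repeated scans against a fixed underlying rate. It assumes each answer mentions the brand independently with the same probability, which is a simplification; the rate of 40% and the answer counts are made up for illustration.

```python
import random


def simulate_scan(true_rate: float, n_answers: int, rng: random.Random) -> float:
    """Observed mention rate for one scan when each answer independently
    mentions the brand with probability true_rate."""
    hits = sum(rng.random() < true_rate for _ in range(n_answers))
    return hits / n_answers


rng = random.Random(42)  # fixed seed so the sketch is repeatable
true_rate = 0.40         # assumed underlying rate; it never changes between scans

spreads = {}
for n_answers in (10, 100, 1000):
    observed = [simulate_scan(true_rate, n_answers, rng) for _ in range(200)]
    spreads[n_answers] = max(observed) - min(observed)
    print(f"{n_answers:>5} answers per scan: observed rates span "
          f"{spreads[n_answers]:.0%}")
```

The spread between the lowest and highest observed rate narrows as the number of answers per scan grows, even though the underlying rate is identical in every simulated scan.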

Source two: AI engines update their knowledge

AI models are retrained, their retrieval indices are refreshed, and the sources they draw on change over time. A grounded engine's web search pulls current content, so anything published, updated, or moved on the web between two scans can change what the engine says. A parametric model that is retrained absorbs new information that can shift how it represents your brand.

This means that even a perfectly stable measurement methodology will show real variation over time, because the thing being measured genuinely changes. A recommendation rate on a grounded engine that keeps rising across scans likely reflects a real change, either in your content's visibility or in a competitor's. The trend line is where this becomes readable.

What errors are and are not

Some answers in a scan return as errors rather than valid AI responses. A network timeout, a rate limit from the engine, a malformed response: these are acquisition errors. They are never included in rate calculations. An error is not evidence that you are absent from the conversation; it is evidence that the measurement call did not complete. Treating an error as an absent answer would artificially deflate your rates.

The scan summary shows how many answers were acquired cleanly and how many returned errors. A small number of errors is normal. A high error rate in a scan is a measurement quality signal and worth noting before drawing conclusions from that scan's rates.
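The effect of excluding errors can be sketched in a few lines of Python. The representation below (`True`/`False` for a clean answer, `None` for an acquisition error) is an assumption made for this example, not the platform's internal format.

```python
from typing import Optional


def mention_rate(results: list[Optional[bool]]) -> tuple[float, int, int]:
    """Return (rate, clean_count, error_count).

    Each entry is True/False for a clean answer (brand mentioned or not)
    and None for an acquisition error such as a timeout or rate limit.
    """
    clean = [r for r in results if r is not None]
    errors = len(results) - len(clean)
    if not clean:
        raise ValueError("every call failed; no rate can be computed")
    return sum(clean) / len(clean), len(clean), errors


# 10 calls: 6 mentions, 2 non-mentions, 2 errors.
results = [True] * 6 + [False] * 2 + [None] * 2
rate, clean_count, error_count = mention_rate(results)
print(f"rate over clean answers: {rate:.0%} ({clean_count} clean, {error_count} errors)")

# Counting errors as "absent" would understate the rate.
deflated = sum(r is True for r in results) / len(results)
print(f"rate if errors counted as absent: {deflated:.0%}")
```

With these made-up numbers the rate over clean answers is 75%, while counting the two errors as absences would report 60%.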

How to read scan-to-scan variation

The right mental model is that each scan gives you a sample of how AI engines represent your brand at a point in time. The sample has uncertainty, and the platform's underlying data carries confidence intervals based on the number of answers collected. A rate based on thirty answers is more reliable than one based on six.

Practical guidance: treat a single scan as directionally informative. Treat two consecutive scans pointing in the same direction as confirmation. Treat three or more consecutive scans pointing in the same direction as a trend worth acting on. A metric that jumps up in one scan and then reverts is most likely noise. A metric that climbs four scans in a row is almost certainly real.
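The rule of thumb above can be written down as a small helper. This is one possible reading, not a feature of the dashboard: it counts each scan-to-scan change as one "move", and the function names and labels are invented for the example.

```python
def consecutive_moves(rates: list[float]) -> int:
    """Signed count of the most recent scan-to-scan moves in one direction.

    +3 means the last three changes were all increases, -2 means the last
    two were decreases, 0 means the latest change was flat or there are
    fewer than two scans.
    """
    deltas = [later - earlier for earlier, later in zip(rates, rates[1:])]
    streak = 0
    for delta in reversed(deltas):
        sign = (delta > 0) - (delta < 0)
        if sign == 0:
            break
        if streak != 0 and (streak > 0) != (sign > 0):
            break  # direction flipped, so the streak ends here
        streak += sign
    return streak


def read_streak(streak: int) -> str:
    """Map a streak length onto the guidance in the text (assumed mapping)."""
    length = abs(streak)
    if length >= 3:
        return "trend worth acting on"
    if length == 2:
        return "confirmation"
    if length == 1:
        return "directionally informative"
    return "no movement"


steady_climb = [0.30, 0.32, 0.35, 0.37]   # hypothetical rates, oldest first
spike_and_revert = [0.30, 0.38, 0.31]

print(read_streak(consecutive_moves(steady_climb)))
print(read_streak(consecutive_moves(spike_and_revert)))
```

The spike-and-revert series ends with a single downward move, so it reads as "directionally informative" at most, which matches the advice to wait for the next scan before acting.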

The comparison view in the dashboard shows before-and-after for each cell. Read the absolute change and ask whether it is larger than the expected noise range for that cell size.

Branded versus unbranded prompts vary differently

Branded questions name your product directly, so the brand is almost always present in those answers. The variation in branded prompts tends to be in which rung of the ladder you land on, not whether you appear at all. Unbranded questions open a wider range of possible answers, so variation in mention and recommendation rates is larger and normal fluctuation is wider.

This is one reason AI Native separates branded and unbranded metrics. Averaging them together would mislead: a stable branded presence combined with a variable unbranded presence would look like mild fluctuation across the board, hiding the fact that one is steady and the other is moving.
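A small worked example shows how a blended number can hide this. The rates below are invented, and the blend assumes equal numbers of branded and unbranded answers in each scan.

```python
# Hypothetical mention rates across four scans, oldest first.
branded = [0.96, 0.95, 0.96, 0.95]
unbranded = [0.20, 0.32, 0.18, 0.30]

# Simple average of the two, assuming equal answer counts in each group.
blended = [(b + u) / 2 for b, u in zip(branded, unbranded)]


def swing(series: list[float]) -> float:
    """Gap between the highest and lowest rate in the series."""
    return max(series) - min(series)


for name, series in [("branded", branded), ("unbranded", unbranded), ("blended", blended)]:
    print(f"{name:>9}: swing of {swing(series) * 100:.1f} points")
```

The blended series swings by roughly half as much as the unbranded one, which makes it look like mild movement everywhere when in fact the branded rate is nearly flat and the unbranded rate is doing all the moving.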

Questions

How many times does AI Native run each prompt?

Each prompt is run multiple times per scan, and the rates you see are computed across all of those runs. The number of repeats is visible in the scan configuration before you launch a live scan. More repeats produce more stable rates and cost proportionally more credits. Fewer repeats are faster and cheaper but carry more uncertainty.

Should I be alarmed if my recommendation rate dropped between two scans?

Not necessarily. A single-scan drop in recommendation rate can be noise, especially if the absolute change is small and the scan was based on a modest number of answers. Look at whether the drop persists in the next scan. If it does, and if it is consistent across multiple engines, it is worth investigating the sources driving those answers. If it reverts on its own, it was likely sampling variation.

Do competitor numbers vary in the same way?

Yes. Competitor mention and recommendation rates are measured from the same answers your rates come from, using the same methodology, so they are subject to the same stochastic variance. The share-of-voice metric, because it is relative, smooths out some engine-level noise: if noise pushes all brands in the same direction by a similar amount, the ratios between them stay roughly stable.
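Here is a short sketch of why a relative metric absorbs common movement. It assumes share of voice is each brand's mentions divided by total mentions across the tracked brands; the exact definition used by the platform may differ, and the brand names and counts are made up.

```python
def share_of_voice(mentions: dict[str, int]) -> dict[str, float]:
    """Each brand's mentions as a fraction of all tracked mentions
    (assumed definition for this example)."""
    total = sum(mentions.values())
    if total == 0:
        raise ValueError("no mentions recorded")
    return {brand: count / total for brand, count in mentions.items()}


scan_a = {"you": 40, "rival_one": 30, "rival_two": 30}
# In the second scan the engine happened to name every brand 20% more often.
scan_b = {"you": 48, "rival_one": 36, "rival_two": 36}

sov_a = share_of_voice(scan_a)
sov_b = share_of_voice(scan_b)
for brand in scan_a:
    print(f"{brand:>9}: {sov_a[brand]:.0%} -> {sov_b[brand]:.0%}")
```

Raw mention counts rose for every brand, but each brand's share is unchanged because the movement was common to all of them.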

What is a confidence interval and where does it appear?

A confidence interval is a range that characterises how precisely a rate was measured given the sample size. AI Native computes Wilson confidence intervals for mention and recommendation rates. They appear in the detailed scan view alongside the headline rates. A wide interval means the rate is based on few answers and could shift significantly with more data. A narrow interval means the sample was large enough that the rate is well-estimated.
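For readers who want to see the calculation, this is a minimal sketch of the standard Wilson score interval. The 95% level (z = 1.96) is an assumption for the example; the confidence level the platform uses is not stated here.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion.

    successes: answers that mentioned (or recommended) the brand
    n:         clean answers collected
    z:         normal quantile; 1.96 corresponds to about 95% (assumed level)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# The same observed rate of 50%, measured on 6 answers and on 30 answers.
low_6, high_6 = wilson_interval(3, 6)
low_30, high_30 = wilson_interval(15, 30)
print(f" 6 answers: {low_6:.1%} to {high_6:.1%}")
print(f"30 answers: {low_30:.1%} to {high_30:.1%}")
```

Both samples report the same headline rate, but the interval on six answers is nearly twice as wide as the one on thirty.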

Can I reduce variation by running more repeats?

Yes, though with diminishing returns. Uncertainty in a rate estimate shrinks with the square root of the number of answers, so doubling the runs per prompt cuts it by roughly 30 percent, and halving it takes about four times as many runs. If a cell is showing high variation between scans, the first thing to check is whether the measured prompt set is large enough and the repeat count is high enough to produce stable estimates. The trade-off is cost: more repeats consume more credits.
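The relationship between repeat count and uncertainty can be checked with the usual binomial standard error, sqrt(p(1 - p) / n). The sketch below is an approximation that treats answers as independent; the baseline of 20 answers and the rate of 50% are arbitrary choices for illustration.

```python
import math


def standard_error(rate: float, n_answers: int) -> float:
    """Binomial standard error of an observed rate."""
    return math.sqrt(rate * (1 - rate) / n_answers)


baseline_answers = 20          # arbitrary starting point
baseline = standard_error(0.5, baseline_answers)

ratios = {}
for factor in (1, 2, 4):
    n_answers = baseline_answers * factor
    se = standard_error(0.5, n_answers)
    ratios[factor] = se / baseline
    print(f"{n_answers:>3} answers: standard error {se:.3f} "
          f"({ratios[factor]:.2f}x baseline)")
```

Because credits scale with the number of runs while uncertainty scales with its square root, each additional repeat buys a little less stability than the one before it.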

Where can I learn about the difference between demo and live numbers?

See [Demo data versus live data](/docs/metrics/demo-live-data) for how the two modes relate, why they differ, and which one to act on.