Not all data is created equal - why data quality matters more in the age of AI

In 1942, during the chaos of the Pacific War, the Battle of Midway turned on a piece of information. U.S. Navy cryptographers, having cracked the Japanese code, discovered plans for an imminent attack on Midway Atoll. Admiral Chester W. Nimitz had weeks, not months, to act.

Japanese intelligence, working from flawed data, believed U.S. carriers remained docked at Pearl Harbor. They weren't. The result was a devastating ambush that sank four Japanese carriers and tilted the war in the Pacific. Different data, same battlefield. Wildly different outcome.

It's a useful story to keep in mind right now, because retail is in its own Midway moment - with AI as the new front. The signal you base decisions on doesn't improve automatically just because the model on top of it got bigger. If anything, the risk grows: bad data, run through a more confident model, produces more confident bad decisions, faster.

The competitive edge of quality information

The phrase "data is the new oil" has done more harm than good, because it equates value with volume. Crude oil is fungible. Data isn't. A million bad transaction records is worse than ten thousand clean ones - not because the volume hurts you, but because the noise makes every downstream conclusion suspect.

The retailers and suppliers that are pulling ahead in this cycle are the ones that have stopped asking "how do we collect more" and started asking "what do we actually trust, and where does that trust come from". That's the competitive edge of quality information. It compounds.

What "good" actually means

Good information is tailored to the user's question, not the sender's convenience. That's the shorthand. The longer answer is a set of eleven properties that data has to clear before it's worth running a model - let alone a decision - on top of it:

Accurate. Correct, and a true reflection of reality at the moment it was captured.
Complete. All the pieces are present. A partial picture is, in practice, the wrong picture.
Consistent. The same fact, drawn from different sources, agrees.
Timely. Current enough for the decision being made. Yesterday's data is fine for the annual report; not for the promo-effectiveness call you're making this morning.
Valid. Right format, right units, right schema. Field-level validation, not vibes.
Reliable. The same query, run twice, returns the same answer.
Unique. Duplicates are silently devastating. De-dup at the source, not in the dashboard.
Accessible. The right people can reach it, in the right tool, with the right permissions.
Interoperable. It moves between systems without re-keying, re-mapping, or quiet truncation.
Credible. The source is named and trustworthy. Provenance is a feature, not a footnote.
Contextually relevant. Right time, right place, right question. Volume is irrelevant if it's the wrong volume.
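
Several of these properties can be checked mechanically before any model sees the data. As a minimal sketch, assuming a hypothetical transaction record with the fields txn_id, sku, quantity, unit_price and captured_at (none of these names come from a real system), the Python below flags records that fail the Complete, Valid, Unique and Timely checks. The 24-hour freshness window is also an assumption; the right threshold depends on the decision being made.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: every field name here is an assumption for illustration.
REQUIRED_FIELDS = ("txn_id", "sku", "quantity", "unit_price", "captured_at")


def check_records(records, now, max_age=timedelta(hours=24)):
    """Return a dict mapping each check name to the txn_ids that fail it."""
    issues = {"incomplete": [], "invalid": [], "duplicate": [], "stale": []}
    seen_ids = set()

    for rec in records:
        txn_id = rec.get("txn_id")

        # Complete: every required field is present and non-empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            issues["incomplete"].append(txn_id)
            continue  # the remaining checks need all fields

        # Valid: right type and range for the numeric fields.
        qty_ok = isinstance(rec["quantity"], int) and rec["quantity"] > 0
        price_ok = (
            isinstance(rec["unit_price"], (int, float)) and rec["unit_price"] >= 0
        )
        if not (qty_ok and price_ok):
            issues["invalid"].append(txn_id)

        # Unique: the same transaction id must not appear twice.
        if txn_id in seen_ids:
            issues["duplicate"].append(txn_id)
        seen_ids.add(txn_id)

        # Timely: captured recently enough for the decision at hand.
        if now - rec["captured_at"] > max_age:
            issues["stale"].append(txn_id)

    return issues


# Made-up sample data, one record per failure mode plus one clean record.
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
sample = [
    {"txn_id": "T1", "sku": "A", "quantity": 2, "unit_price": 3.5,
     "captured_at": now - timedelta(hours=1)},
    {"txn_id": "T1", "sku": "A", "quantity": 2, "unit_price": 3.5,
     "captured_at": now - timedelta(hours=1)},   # duplicate of the first
    {"txn_id": "T2", "sku": "B", "quantity": -1, "unit_price": 1.0,
     "captured_at": now - timedelta(hours=2)},   # negative quantity
    {"txn_id": "T3", "sku": "C", "quantity": 1, "unit_price": 2.0,
     "captured_at": now - timedelta(days=3)},    # three days old
    {"txn_id": "T4", "sku": "", "quantity": 1, "unit_price": 2.0,
     "captured_at": now},                        # missing SKU
]

report = check_records(sample, now)
for check, txn_ids in report.items():
    print(check, txn_ids)
```

Properties such as Credible or Contextually relevant don't reduce to a field-level rule like this. They depend on provenance and on the question being asked, which is why they need governance rather than a script.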

A historical aside - the Semmelweis case

In the mid-19th century, the Hungarian physician Ignaz Semmelweis faced a perplexing problem: a childbed-fever mortality rate that reached 18% in a maternity ward at the Vienna General Hospital. He observed that doctors moved directly from autopsies to patient examinations without washing their hands.

His intervention was a chlorine-solution handwashing policy. The mortality rate fell to under 2%. The data was unambiguous. His peers refused to believe him for two decades.

The lesson isn't that the data was wrong. It was, in fact, the cleanest data anyone in 19th-century medicine had access to. The lesson is that good data on its own does not change minds. The system around it - governance, peer review, provenance, replication - is what turns a true number into an acted-on number.

Retail organisations are working through their own version of this right now. The retailers that move first aren't the ones with the most data. They're the ones whose operators trust their data enough to act on it within hours, not weeks.

Conclusion

From wartime intelligence to germ theory, history teaches the same lesson: the quality of information determines the difference between success and failure. In a data-driven world, high-quality information is a structural advantage that compounds.

AI doesn't change that. AI amplifies it. A model trained on or reasoning over high-quality, governed, verified data is a competent analyst. A model on top of poor data is a confident hallucinator. The two are not on the same product roadmap. They aren't even in the same industry.

The technology question of the next decade isn't "which model do we buy". It's "which data do we own, what do we know about it, and what are we willing to act on". The retailers that answer that question well will quietly dominate the ones that don't.

Not all data is created equal.
