When everyone's models are trained on the same public web, the edge is proprietary data. Where to invest - and where it won't pay back.
Foundation models like GPT-4 and Claude have shown what AI can do. They have also raised a question about market dynamics.
The question: is AI's heavy reliance on public data leading to a race to the bottom? More precisely, is competitive advantage in AI being competed away as every player trains on the same internet?
Historically, commoditisation follows a familiar pattern. As an input becomes universally available, the value migrates elsewhere. Compute went through this cycle. Cloud went through it. Data is starting to.
For AI, the input that is rapidly commoditising is the public web. Every major model is now trained on a recognisable corpus - some books, some Common Crawl, some Reddit, some Wikipedia. The differences between models on general knowledge are shrinking, and they will keep shrinking.
If that is where the contest is, then yes - it is a race to the bottom.
But that is not where the contest is moving. It is moving to proprietary data: data that only one operator has, that no public model can replicate, and that materially improves the answers to the questions that operator's customers ask.
For a retailer, that is the shelf - in real time, with provenance. For a hospital, it is longitudinal patient outcomes. For a finance firm, it is trade-flow patterns nobody else sees.
An AI assistant on top of public data is a hallucinator. An AI assistant on top of high-quality proprietary data is an analyst. Same model, different game. The two are not on the same product roadmap, and the gap between them is widening every quarter.
For retailers thinking about where to invest in AI right now, the answer is not "buy a bigger model". The answer is: get your data right first. Verified at the source. Governed in the share. Bounded in the question.
The retailers who own their data and govern it well will have AI assistants that work. The retailers who do not will keep buying ever-bigger models and getting ever more confident wrong answers.
That is not a race to the bottom. It is the race that matters. The investment that pays back is the data layer underneath the model - not the model itself.