Can tech companies learn to love cheaper AI models? 

The AI boom has been built on a basic assumption: bigger models are more powerful, and more powerful models win. Now, the industry is about to find out what happens if that assumption starts to crumble.

Rising costs have already pushed users to give smaller, cheaper models a second look. This kind of cost-conscious shopping is new, and it is not clear how it will reshape the industry, but the effect is likely to be significant.

One prediction, made by Coinbase co-founder Brian Armstrong, is that this will result in the vast majority of tasks shifting to cheaper models.

“[D]emand for intelligence is almost infinite, but 80% of workloads will be running on 99% cheaper models within 12 to 18 months,” he wrote on X. “20% of workloads will continue to run on the latest gen models where maximum IQ is important.”

It is difficult to overstate how dramatically the AI industry will be transformed if Armstrong’s prediction comes true.

Until now, most AI companies competed on quality, which meant defaulting to the most advanced model available. If those same tasks can be handled by cheaper models without compromising on quality, it would mean a huge shift in the economics of AI. More importantly, much of the savings will come out of the pockets of the big labs, dealing a financial blow to OpenAI and Anthropic just as they head toward their own IPOs.
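As a rough back-of-envelope illustration of what Armstrong’s numbers would imply, the Python sketch below computes total spend relative to running every workload on a frontier model. It assumes, purely for illustration, that every workload costs the same on the frontier model and that “99% cheaper” means 1% of that price. The function name `blended_cost_ratio` is hypothetical and does not come from Armstrong’s post.

```python
def blended_cost_ratio(cheap_share: float, discount: float) -> float:
    """Return total spend as a fraction of the all-frontier baseline.

    cheap_share: fraction of workloads moved to the cheaper model (0..1).
    discount: how much cheaper that model is, as a fraction (0.99 = 99% cheaper).
    Assumption: every workload costs the same on the frontier model.
    """
    if not 0.0 <= cheap_share <= 1.0 or not 0.0 <= discount <= 1.0:
        raise ValueError("cheap_share and discount must be between 0 and 1")
    frontier_part = 1.0 - cheap_share            # still billed at full price
    cheap_part = cheap_share * (1.0 - discount)  # billed at the discounted price
    return frontier_part + cheap_part


ratio = blended_cost_ratio(cheap_share=0.80, discount=0.99)
print(f"Spend relative to baseline: {ratio:.1%}")
```

Under those simplifying assumptions, total spend would fall to roughly a fifth of the baseline, with nearly all of the remaining bill coming from the 20% of workloads that stay on the latest models. Real workloads vary widely in size, so the actual figure could differ considerably.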

It’s a potentially seismic change in the industry, and it’s based on one fundamental question: Are companies ready to shift to smaller models?

Initial tests indicate that when the system is properly arranged, cheaper models can be used without any sacrifice in quality. In a recent test of Harvey’s legal AI tool, the company was able to cut inference costs by 3x without reducing quality. The test, conducted in partnership with the Fireworks AI inference platform, combined Claude Opus with Fireworks’ GLM 5.1, moving to Opus only for more intensive tasks. The result was a significantly lighter load in terms of server time and overall cost.
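The general idea of sending routine tasks to a cheaper model and reserving the larger one for intensive work can be sketched in a few lines of Python. This is a minimal illustration, not Harvey’s or Fireworks’ actual implementation: the `route` function, the stub models, and the word-count heuristic are all hypothetical placeholders, and a real system would call actual model APIs and use a far more careful test of task difficulty.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    model: str  # which tier produced the answer: "small" or "large"


def route(
    prompt: str,
    small: Callable[[str], str],
    large: Callable[[str], str],
    is_intensive: Callable[[str], bool],
) -> Answer:
    """Send intensive prompts to the large model and everything else to the small one."""
    if is_intensive(prompt):
        return Answer(text=large(prompt), model="large")
    return Answer(text=small(prompt), model="small")


# Hypothetical stand-ins for real model calls.
def small_stub(prompt: str) -> str:
    return f"[small model] {prompt[:40]}"


def large_stub(prompt: str) -> str:
    return f"[large model] {prompt[:40]}"


# Toy heuristic (an assumption for illustration): long prompts count as intensive.
def long_prompt(prompt: str) -> bool:
    return len(prompt.split()) > 50


if __name__ == "__main__":
    for p in ["Summarize this clause.", " ".join(["word"] * 60)]:
        print(route(p, small_stub, large_stub, long_prompt).model)
```

Because the models and the difficulty check are passed in as plain functions, the same routing logic works whichever small model is used, which is consistent with the point that the choice between tiers matters more than the choice of vendor.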

“Quality comes first, and on the legal side it always will,” Harvey co-founder Gabe Pereyra told TechCrunch, referring to the AI legal services his startup offers. “However, the definition of quality is evolving from simply using the strongest model for everything, to using the best model that gets the right answer most efficiently.”

This trend is often framed as the major labs versus Chinese or open-weight models, but that misses the larger point. The real gap is not between proprietary and open models; it is between large and small models. You can save money by switching from GPT-5.5 to DeepSeek’s V4 Flash, but switching to GPT-5.4-mini works just as well.

There is an active price war taking place between the in-house small models from the large labs and open-weight models that are offered independently. As for the bigger question of small versus large, it doesn’t really matter which type of small model wins.

This may all seem obvious (of course you shouldn’t use more compute than you need), but it goes against the scale-first approach that has dominated the industry so far. Inspired by “The Bitter Lesson,” labs have focused on training the most compute-intensive models they can, pushing the limits of what AI models can do. With prices heavily subsidized by investors, customers had no reason to choose anything but the most advanced option.

With token prices rising and subsidies slowing, users are facing cost pressures for the first time. We don’t know if the new cost pressure will actually push enterprise users to smaller models. They could just as easily save by making fewer calls, using less context, or simply abandoning less promising deployments.

But if it turns out that most deployments can be handled just as well by a smaller model, that could put a serious damper on the growing demand for inference, and raise new questions about how to justify the cost of training a frontier model.

