Token as Productive Capacity: Large Model Price War Begins


Securities Daily Reporter Yuan Chuanxi

Recently, the surge in AI intelligent agents has swept across various industries. AI agents are integrating into daily work and life scenarios at an unprecedented speed.

Behind this trend is an exponential increase in computing power demand—large-scale deployment of personal AI agents has led to massive consumption of Tokens (the basic units of text that large models process), quickly breaking through the cost thresholds of major model providers.

Recently, domestic AI companies such as Beijing Zhipu Huazhang Technology Co., Ltd. (hereafter “Zhipu”) and Tencent Cloud have issued notices of price increases for AI computing power products, with some products seeing increases of over 400%. This strategic shift from “burning money for growth” to “raising prices to increase volume” not only marks the end of industry wild growth but also reflects a profound change in the supply and demand relationship for computing power in the AI agent era.

Rebuilding the Large Model Pricing System

The pricing system for large models is undergoing a systematic overhaul, with domestic providers accelerating price hikes. This phenomenon contrasts sharply with the price wars of two years ago.

In May 2024, ByteDance launched the first price war, pricing its Doubao Pro model at 0.0008 yuan per 1,000 Tokens, 99.3% below the industry average. Soon after, Alibaba Cloud’s Tongyi Qianwen main models were reduced by 97%, Baidu’s Wenxin large models became completely free, and Tencent’s Hunyuan large model prices dropped by up to 87.5%. For a time, the industry was caught in a wave of price reductions.

“At that time, the logic was simple: let developers use it first; market share mattered more than anything else,” a product manager with three years of AI experience told Securities Daily. In 2024, one leading company set an internal target of not considering profitability for three years, with product prices even below the cost of the underlying computing power.

However, the marginal effects of low prices quickly diminished. Industry analysts told Securities Daily that while the price war from 2024 to 2025 accelerated the market adoption of AI large models, it also led to a widespread “high investment, low return” dilemma. As model invocation volumes soared from hundreds of billions to trillions, computing costs increased exponentially. Relying solely on capital infusion became unsustainable. From the second half of 2025, some small and medium-sized companies began quietly reducing free quotas.

“This is not just a simple price increase; it’s an inevitable result of changes in cost structure,” an executive from a leading cloud provider explained to Securities Daily. “In the past, the industry used losses to gain market share; by 2026, sustainable operation must be considered.”

Token Inflation

To understand the collective price hikes of domestic large models, one must first grasp the concept of “Token inflation.”

Tokens are the smallest units of text processing in large models, akin to a measure of AI workload. When the industry talks about Token inflation, it refers to a surge in the complexity of AI tasks, which causes the same service to consume more computing resources. It’s like switching from lighting a small lamp to powering a factory—electricity costs naturally rise.
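To make the billing unit concrete, here is a minimal sketch of how token-based metering works. The whitespace "tokenizer" and the per-1,000-token price are deliberately simplified assumptions for illustration; real models use subword tokenizers (e.g., BPE) and vendor-specific rates.

```python
# Illustrative only: tokens are the unit in which model usage is billed.
# A real tokenizer splits text into subword pieces; whitespace splitting
# is a crude stand-in used here purely to show the accounting.

def count_tokens(text: str) -> int:
    """Naive stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())

def cost_yuan(tokens: int, price_per_1k: float) -> float:
    """Model usage is typically quoted per 1,000 tokens."""
    return tokens / 1000 * price_per_1k

prompt = "Summarize this quarterly risk report in three bullet points"
tokens = count_tokens(prompt)
print(tokens, cost_yuan(tokens, price_per_1k=0.0008))
```

The key point is that the same request costs more as prompts, contexts, and outputs grow—cost scales with tokens processed, not with the number of requests.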

This “inflation” pressure primarily stems from explosive overseas market demand. In February 2026, OpenRouter, a major global API distribution platform for large models, reported that the total Token consumption of the top ten AI models worldwide exceeded 27 trillion that month, with Chinese large models contributing 14 trillion, accounting for over 50%.

“This indicates that domestic large models are shifting from domestic demand-driven to global export,” said Zhang Yi, CEO of Guangzhou iMedia Data Intelligence Consulting Co., Ltd., in an interview with Securities Daily. “Overseas users’ usage habits are very different from domestic ones.” European and American developers prefer embedding large models into production workflows, often involving multiple tool calls, long context retrieval, and code generation per request. “A single API call in overseas scenarios can consume three to five times more Tokens than in China.”

If overseas markets are the external factor, then the large-scale deployment of AI intelligent agents is the internal driver pushing up computing costs.

Unlike early chatbot Q&A, AI agents possess a closed-loop capability of “perception–decision–execution,” enabling autonomous completion of complex tasks. For example, in financial risk control: a single AI agent completing a loan approval must go through user profile retrieval (long context), credit data calls (tool use), risk assessment calculations (reasoning chain), and report generation (output), with total Token consumption reaching hundreds of thousands.
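The per-stage accounting described above can be sketched as a simple budget. The stage names follow the loan-approval example in the text, but the token counts are hypothetical figures chosen only to show how a single agent task accumulates into the hundreds of thousands of tokens.

```python
# Sketch of per-stage token accounting for one agent task.
# Stage labels mirror the loan-approval example; the token counts
# are illustrative assumptions, not figures from any provider.

loan_approval_stages = {
    "profile_retrieval":  120_000,  # long-context user profile retrieval
    "credit_data_calls":   40_000,  # tool/API invocations for credit data
    "risk_assessment":     60_000,  # multi-step reasoning chain
    "report_generation":   30_000,  # final report output
}

total_tokens = sum(loan_approval_stages.values())
print(f"total tokens for one approval: {total_tokens:,}")
```

Multiply such a budget by millions of daily agent tasks and the trillion-scale consumption figures cited in the next paragraph follow naturally.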

Multiple factors combined produce astonishing data. Guolian Minsheng Securities estimates that China’s overall daily Token consumption surged from 100 billion in early 2024 to 180 trillion in February 2026. As AI agents evolve toward multimodal and multi-agent collaboration, this number continues to accelerate.

The reversal of supply and demand ultimately influences pricing. Since 2025, global AI infrastructure has faced capacity shortages, with server procurement costs rising sharply year-over-year due to tight supplies of HBM (high-bandwidth memory, a core component for AI training) and advanced GPU chips.

For example, on March 17, Alibaba Cloud announced that due to exploding global AI demand and supply chain price increases, their AI computing, storage, and related products saw price hikes of up to 34%.

As large model providers shift from “water sellers” to “water drinkers,” price increases have become a rigid choice to maintain service quality. Zhipu AI explicitly stated in its price adjustment notice: “The rapid growth in user scale and invocation volume requires us to increase investment in computing power.”

Business Model Restructuring

The price hikes not only address cost gaps but also signal a deep restructuring of the entire industry’s business logic.

“After the price war ends, the real value war begins,” said the aforementioned cloud executive. They believe 2026 will be the year of large-scale AI commercialization, with industry competition shifting from simply owning computing power to providing efficient, stable, and low-cost model services and AI applications.

Currently, the large model industry is shifting from “traffic subsidies” to “value filtering.” Early low-price strategies attracted many trial-and-error users, leading to inefficient use of computing resources. One company estimated that 40% of their free quota was used for testing without actual business scenarios. Moderate price increases help filter out non-essential demand and ensure stable service for high-quality clients. The significant price hikes by Zhipu, Tencent Cloud, and others are actually aimed at aligning prices with enterprise customers’ willingness to pay and ROI (return on investment). This “raising prices to boost volume” refined operation marks China’s large model industry moving from internet-style scale expansion to software industry-style value pricing.

Pan Helin, a member of the Information and Communications Economic Expert Committee of the Ministry of Industry and Information Technology, told Securities Daily that price increases will not suppress genuine demand but will accelerate the “good money driving out bad.” Enterprise clients’ high requirements for stability and compliance mean they are willing to pay more, with higher lifetime value, giving large model providers confidence to shift from “traffic thinking” to “value-based pricing.”

This transformation is reshaping the entire industry chain’s profit landscape. Upstream computing power providers (like NVIDIA) continue to benefit; midstream cloud providers (like Alibaba Cloud and Tencent Cloud) seek a balance between selling models and selling computing power—aiming to attract customers with AI services while avoiding being overwhelmed by high hardware costs; downstream application layers show clear differentiation: large companies with R&D capabilities (such as ByteDance and Baidu) can flexibly allocate computing resources internally to hedge against price increases, while small startups relying solely on API calls face soaring costs and potential shutdowns.

Enterprise large model providers are also focusing on the deep changes in Token economics. Yang Lei, co-founder and executive director of DeepTech Co., Ltd., told Securities Daily, “In the future, Token will represent capacity. As Skill-based Models reshape industries like software development, data analysis, and customer service outsourcing, traditional per-person, per-day pricing will be replaced by ‘Token consumption’ pricing. This is not just a change in measurement units but a leap in productivity paradigm.”
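The pricing shift Yang Lei describes can be illustrated with a toy comparison. All rates and the 50-million-token workload below are hypothetical assumptions, used only to contrast labor-based billing with consumption-based billing.

```python
# Hypothetical comparison of per-person-per-day pricing vs
# token-metered pricing for the same outsourced task.
# Every number here is an illustrative assumption.

def per_person_day_cost(people: int, days: int, day_rate_yuan: float) -> float:
    """Traditional labor-based billing: headcount x duration x day rate."""
    return people * days * day_rate_yuan

def token_metered_cost(tokens: int, price_per_million_yuan: float) -> float:
    """Consumption-based billing: pay for tokens actually processed."""
    return tokens / 1_000_000 * price_per_million_yuan

# Suppose a task that once took 2 analysts 3 days instead consumes ~50M tokens.
labor = per_person_day_cost(2, 3, day_rate_yuan=1500.0)
metered = token_metered_cost(50_000_000, price_per_million_yuan=20.0)
print(f"labor-based: {labor} yuan; token-metered: {metered} yuan")
```

The design point is the change of measurement unit: under token metering, the buyer pays for work actually performed by the model, which is what makes "Token as productive capacity" more than a slogan.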

Zhang Yi added that from a global competitive perspective, Token inflation is also a byproduct of domestic model technological advancement. Price increases are not the end but the beginning of a new efficiency revolution. Those who can continuously optimize cost structures in this arms race for computing power will secure their position on the global AI agent stage.

Looking back at the 2024 price war and the current collective price hikes, China’s large model industry is undergoing a painful rite of passage. The era of relying on below-cost pricing to win attention has ended. A new era of winning through technological efficiency, customer value, and ecological closed loops is gradually unfolding amid the torrent of Token economics.
