Recently, the entire tech and investment worlds have been fixated on the same thing: how AI applications are “killing” traditional SaaS. Since @AnthropicAI’s Claude Cowork demonstrated how easily it can help you write emails, create PowerPoint presentations, and analyze Excel spreadsheets, a panic about “software is dead” has begun to spread. It’s indeed alarming, but if your focus stops here, you might be missing the real seismic shift.
It’s like everyone is looking up at drone dogfights in the sky, while no one notices that the continental plates beneath our feet are quietly shifting. The real storm is hidden beneath the surface—a silent revolution in the computing power that underpins the entire AI world, happening in a corner most people overlook.
And this revolution might force AI’s great “shovel seller,” Nvidia @nvidia, to end its grand party far earlier than anyone expected.
Two converging paths of revolution
This isn’t a single event but a convergence of two seemingly independent technological trajectories. Like two armies encircling, they form a pincer movement against Nvidia’s GPU dominance.
The first is the algorithmic slimming revolution.
Have you ever wondered whether a superbrain truly needs to activate all its neurons when thinking? Obviously not. DeepSeek has figured this out and developed an architecture called MoE (Mixture of Experts).
Think of it as a company with hundreds of specialists in different fields. But when solving a problem, you only need to call in the two or three most relevant experts, rather than brainstorming with everyone. That’s the cleverness of MoE: it allows a massive model to activate only a small subset of “experts” during each computation, greatly saving on computing power.
What’s the result? DeepSeek-V2 nominally carries 236 billion parameters, but only about 21 billion are activated at a time—less than 10% of the total. Yet its performance rivals GPT-4, which burns far more compute per query. What does this mean? An AI’s capability and the computing power it consumes are decoupling!
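The top-k routing idea above can be sketched in a few lines of Python. This is a toy illustration with made-up expert counts and a random gate, not DeepSeek’s actual gating network:

```python
# Toy sketch of Mixture-of-Experts routing (illustrative only; real MoE
# layers use a learned gating network over token embeddings).
import random

NUM_EXPERTS = 16   # hypothetical expert count for this sketch
TOP_K = 2          # experts activated per token

def route(token_scores, k=TOP_K):
    """Pick the k experts with the highest gating scores for one token."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    return ranked[:k]

def moe_forward(token, experts, gate):
    """Run the token through only its top-k experts and average the outputs."""
    scores = gate(token)
    chosen = route(scores)
    outputs = [experts[i](token) for i in chosen]  # only k experts do any work
    return sum(outputs) / len(outputs), chosen

# Tiny demo: each "expert" is just a scaling function.
experts = [lambda x, s=s: x * s for s in range(NUM_EXPERTS)]
gate = lambda token: [random.random() for _ in range(NUM_EXPERTS)]

out, chosen = moe_forward(3.0, experts, gate)
print(f"activated {len(chosen)}/{NUM_EXPERTS} experts "
      f"({len(chosen) / NUM_EXPERTS:.0%} of capacity)")

# DeepSeek-V2's ratio from the text: ~21B active of 236B total parameters.
print(f"DeepSeek-V2 active share: {21e9 / 236e9:.1%}")
```

Whatever the gate scores, only `TOP_K` experts ever run per token, which is exactly why total parameter count and per-token compute come apart.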
In the past, we all assumed that the stronger the AI, the more GPU power it burns. Now DeepSeek shows that smart algorithms can achieve the same results at a tenth of the cost. This directly challenges the assumption that Nvidia’s GPUs are an indispensable core component.
The second is a hardware “lane change” revolution.
AI workloads divide into training and inference. Training is like going to school—reading thousands of books—where GPUs’ massively parallel compute is invaluable. Inference is the everyday use of AI, where response speed matters most.
GPUs have a built-in flaw for inference: their memory (HBM) sits off-chip, so every data transfer adds latency. It’s like a chef whose ingredients are stored in a fridge in the next room—no matter how fast the chef moves, fetching them takes time. Companies like Cerebras and Groq have taken a different route, designing dedicated inference chips with SRAM built directly on the die, keeping the ingredients within arm’s reach for near-zero-latency access.
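A back-of-envelope calculation shows why this matters. In token-by-token decoding, every active weight must stream from memory once per token, so memory bandwidth caps throughput. The bandwidth figures below are rough public ballparks chosen for illustration, not vendor specifications:

```python
# Back-of-envelope: why inference is memory-bandwidth-bound.
# Figures are rough illustrative assumptions, not vendor specs.

ACTIVE_PARAMS = 21e9      # active parameters per token (from the MoE figures)
BYTES_PER_PARAM = 2       # fp16 weights

HBM_BANDWIDTH = 3.35e12   # ~flagship-GPU off-chip HBM, bytes/s (assumption)
SRAM_BANDWIDTH = 2.0e16   # ~wafer-scale on-chip SRAM, bytes/s (assumption)

def max_tokens_per_sec(bandwidth):
    """Memory-bound ceiling: every active weight is read once per token."""
    return bandwidth / (ACTIVE_PARAMS * BYTES_PER_PARAM)

print(f"HBM-bound ceiling:  {max_tokens_per_sec(HBM_BANDWIDTH):,.0f} tokens/s")
print(f"SRAM-bound ceiling: {max_tokens_per_sec(SRAM_BANDWIDTH):,.0f} tokens/s")
```

Under these assumptions the on-chip design’s ceiling is several thousand times higher; even if the real gap is smaller, the bottleneck is bandwidth, not raw FLOPS.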
The market has already voted with real money. OpenAI, complaining about Nvidia’s GPU inference performance, turned around and signed a $10 billion deal with Cerebras to rent their inference hardware. Nvidia itself panicked, spending $20 billion to acquire Groq to avoid falling behind in this new race.
When these two paths intersect: a cost avalanche
Now, let’s put these two developments together: a DeepSeek model that’s “slimmed down” via algorithm, running on Cerebras hardware with “zero latency.”
What happens?
A cost avalanche.
First, the slimmed model is small enough to fit entirely into the chip’s onboard memory. Second, without external memory bottlenecks, AI response speeds become astonishingly fast. The end result: training costs drop 90% thanks to the MoE architecture, and inference costs drop by an order of magnitude due to dedicated hardware and sparse computation. The total cost to own and operate a world-class AI could be just 10-15% of traditional GPU-based solutions.
This isn’t just an improvement—it’s a paradigm shift.
Nvidia’s throne is quietly being pulled out from under it
Now you should understand why this is more deadly than the “Cowork panic.”
Nvidia’s multi-trillion-dollar market cap is built on a simple story: AI is the future, and that future depends on Nvidia’s GPUs. But now the foundation of that story is being shaken.
In the training market, even if Nvidia keeps its monopoly, customers who can do the same work with one-tenth the cards could shrink the overall market significantly.
In inference, a market ten times larger than training, Nvidia not only lacks an absolute advantage but faces encirclement by players like Google, Cerebras, and others. Even its biggest customer, OpenAI, is starting to defect.
Once Wall Street realizes that Nvidia’s “shovel” isn’t the only or even the best choice anymore, what happens to the valuation built on “perpetual monopoly”? Everyone knows the answer.
So the biggest black swan of the next six months may not be another AI app knocking out a competitor, but a seemingly insignificant piece of tech news—a new paper on MoE efficiency, say, or a report showing dedicated inference chips surging in market share—that quietly signals a new phase in the compute war.
When the “shovel sellers” are no longer the only option, their golden age may be coming to an end.
The Next AI Earthquake: Why the Real Danger Isn't SaaS Killers, but the Computing Power Revolution?
By Bruce