Guangfa Securities: NVIDIA (NVDA.US) New Platform Strengthens Agent Application Competitiveness; AI Inference Continues to Drive the Storage Cycle Upward

GF Securities has released a research report stating that NVIDIA (NVDA.US) showcased the Vera Rubin POD at GTC, focusing on strengthening the competitiveness of its cluster-computing and inference product lines for Agent applications. As AI advances, model innovation and CAPEX together drive collaboration across the industry chain; AI inference continues to push the storage cycle upward, with capacity expansion and upgrades working in tandem. The firm recommends focusing on core beneficiary targets in the industry chain.

GF Securities Main Points:

NVIDIA Launches Vera Rubin POD Platform

According to NVIDIA’s official website, on March 16, 2026, NVIDIA showcased the Vera Rubin POD at GTC, comprising five new rack-scale systems designed specifically for Agentic AI workloads. Because Agentic workloads place higher demands on throughput, ultra-low-latency inference, dense CPU sandboxing, and large context memory, NVIDIA is emphasizing the competitiveness of its cluster-computing and inference product lines for Agent applications. The Vera Rubin POD consists mainly of two types of racks: (1) the MGX NVL rack, i.e., Vera Rubin NVL72, interconnected internally via NVLink to handle core GPU computing tasks; and (2) the MGX ETL racks, including Groq3 LPX racks, Vera CPU racks, BlueField-4 STX storage racks, and Spectrum-6 SPX network racks, all interconnected via Spectrum-X Ethernet so that components such as the Groq3 LPU chips can collaborate with the GPU racks. According to the official diagram, a Vera Rubin 1152 SuperPOD comprises 16 Vera Rubin NVL72 racks, 2 Vera CPU racks, 10 Groq3 LPX racks, 2 BlueField-4 STX storage racks, and 10 Spectrum-6 SPX network racks, reflecting a heterogeneous, collaborative system architecture built around Agentic AI.
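
The rack-count arithmetic also explains the SuperPOD's name: 16 NVL72 racks at 72 GPUs each gives 1,152 GPUs. Below is a minimal illustrative tally of the composition, assuming the per-rack device counts quoted elsewhere in this note (72 GPUs per NVL72 rack, 256 LPUs per Groq3 LPX rack, 256 CPUs per Vera CPU rack); it is a sketch, not an official specification.

```python
# Illustrative tally of the Vera Rubin 1152 SuperPOD composition from the
# official diagram; per-rack device counts are taken from the descriptions
# later in this note and are assumptions where the diagram does not state them.
superpod_racks = {
    "Vera Rubin NVL72 (MGX NVL, GPU compute)": 16,
    "Vera CPU (RL/agent sandbox)": 2,
    "Groq3 LPX (decode acceleration)": 10,
    "BlueField-4 STX (storage)": 2,
    "Spectrum-6 SPX (network)": 10,
}

total_racks = sum(superpod_racks.values())                                    # 40 racks in total
total_gpus = superpod_racks["Vera Rubin NVL72 (MGX NVL, GPU compute)"] * 72   # 1,152 GPUs
total_lpus = superpod_racks["Groq3 LPX (decode acceleration)"] * 256          # 2,560 LPUs
total_vera_cpus = superpod_racks["Vera CPU (RL/agent sandbox)"] * 256         # 512 Vera CPUs

print(f"racks={total_racks}, GPUs={total_gpus}, LPUs={total_lpus}, Vera CPUs={total_vera_cpus}")
# -> racks=40, GPUs=1152, LPUs=2560, Vera CPUs=512
```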

Groq3 LPX Racks Accelerate Decoding

Groq3 LPX racks integrate 256 LPU processors with 128 GB of on-chip SRAM and 640 TB/s of bandwidth. In the combined Vera Rubin NVL72 + LPX architecture, the GPUs mainly handle prefill and the attention computation during decoding, while the LPUs accelerate the FFN computation in the decode phase, speeding up per-layer processing of each output token. The LPX racks collaborate with the Vera Rubin racks over customized Spectrum-X interconnects. According to NVIDIA’s disclosures, at 400 TPS per user the Vera Rubin NVL72 + LPX combination can deliver up to a 35x improvement in TPS per megawatt over NVIDIA’s GB200 NVL72, raising overall system output and better fitting low-latency, highly interactive Agent applications.
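
For context, a back-of-envelope sketch of the per-LPU resources follows, under the assumption that the 128 GB SRAM and 640 TB/s figures are rack-level aggregates across the 256 LPUs (the source does not say whether they are per rack or per chip):

```python
# Back-of-envelope sketch, assuming the 128 GB SRAM and 640 TB/s figures quoted
# above are rack-level aggregates across the 256 LPUs (an assumption, not a
# disclosed specification).
LPUS_PER_RACK = 256
RACK_SRAM_GB = 128
RACK_BW_TB_PER_S = 640

sram_per_lpu_mb = RACK_SRAM_GB * 1024 / LPUS_PER_RACK    # 512 MB of SRAM per LPU
bw_per_lpu_tb_per_s = RACK_BW_TB_PER_S / LPUS_PER_RACK   # 2.5 TB/s per LPU

# The 35x figure is a ratio of TPS per megawatt at 400 TPS/user versus GB200
# NVL72; the absolute baseline is not disclosed, so only the relative uplift
# can be stated here.
CLAIMED_TPS_PER_MW_UPLIFT = 35

print(f"{sram_per_lpu_mb:.0f} MB SRAM, {bw_per_lpu_tb_per_s:.1f} TB/s per LPU "
      f"(if rack aggregates); claimed uplift: {CLAIMED_TPS_PER_MW_UPLIFT}x TPS/MW")
```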

Vera CPU Racks Support RL/Agent Sandbox Environments

Vera CPU racks integrate 256 Vera CPUs, utilizing high-density liquid cooling. A single rack can support over 22,500 concurrent reinforcement learning (RL) or agent sandbox environments, used for testing, executing, and validating outputs from Vera Rubin NVL72 and LPX.
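
As a rough density check on the figures quoted above (22,500+ environments across 256 CPUs per rack), that works out to roughly 88 sandbox environments per Vera CPU; the per-CPU split is an inference, not something the source states:

```python
# Rough density check for the Vera CPU rack figures quoted above; the per-CPU
# split is an inference, not a disclosed figure.
CPUS_PER_RACK = 256
SANDBOXES_PER_RACK = 22_500

print(f"~{SANDBOXES_PER_RACK / CPUS_PER_RACK:.0f} sandbox environments per Vera CPU")  # ~88
```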

Risk Warning

AI industry development and demand may fall short of expectations; AI server shipments may underperform; domestic manufacturers’ technology and product progress may fall short of expectations.
