Li Auto releases next-generation autonomous driving foundation model MindVLA-o1: sees more accurately, thinks more deeply

K-LinePoet · 2026-03-28T20:45:36+00:00

Li Auto announced its next-generation autonomous driving foundation model, MindVLA-o1, at NVIDIA GTC 2026. The model builds on five technical innovations (3D spatial understanding, multimodal reasoning, unified behavior generation, closed-loop reinforcement learning, and hardware-software co-design) intended to improve the performance and responsiveness of autonomous driving systems.

IT Home reported on March 17 that Zhan Kun, head of foundation models at Li Auto, attended NVIDIA GTC 2026 that day and delivered a keynote speech titled “MindVLA-o1: Unlocking the Universal Paradigm - Exploring the Next Generation Unified Vision-Language-Action Autonomous Driving Model,” in which he announced Li Auto’s next-generation autonomous driving foundation model, MindVLA-o1.

According to the presentation, MindVLA-o1 is an autonomous driving foundation model oriented toward the physical world and built on five technical innovations: 3D spatial understanding, multimodal reasoning, unified behavior generation, closed-loop reinforcement learning, and hardware-software co-design.

According to IT Home, the core breakthroughs of this model can be summarized in the following five dimensions:

Seeing more accurately (3D spatial understanding): Previous systems mostly worked with flat 2D images. MindVLA-o1 combines cameras and LiDAR so that the vehicle can perceive the depth, distance, and motion of objects much as a human does, and thus understand three-dimensional physical space.

Thinking more deeply (multimodal reasoning): It is described as the first model capable of “imagining” the future. Using a latent world model, it not only observes the present but also “rehearses,” in latent space, the scenarios that could unfold over the next few seconds, which allows for more forward-looking decisions.

Moving more steadily (unified behavior generation): The system adopts a VLA-MoE architecture with dedicated “action experts.” It generates all points of the driving trajectory at once, then refines them through a “denoising” optimization process so that the vehicle moves smoothly and within physical constraints.

Evolving faster (closed-loop reinforcement learning): Li Auto has built a world simulator. The model learns not only from real roads but also through large-scale, efficient practice and policy optimization in a virtual world, which significantly reduces training costs.

Deploying more efficiently (hardware-software co-design): By studying the trade-off between model accuracy and hardware latency, Li Auto has cut architecture design time from several months to a few days, allowing complex large models to run more smoothly on in-vehicle chips.
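
The "generate every trajectory point at once, then denoise" idea in the third item above can be illustrated with a toy sketch. The article does not describe Li Auto's actual implementation, so everything here is an assumption made for illustration: the trajectory is reduced to a list of 1-D lateral offsets, the function names (noisy_trajectory, denoise_step, refine, roughness) are hypothetical, and a simple neighbor-averaging update stands in for the learned denoiser that a real action expert would use.

```python
import random


def roughness(traj):
    """Sum of squared second differences; lower means a smoother path."""
    return sum(
        (traj[i - 1] - 2 * traj[i] + traj[i + 1]) ** 2
        for i in range(1, len(traj) - 1)
    )


def noisy_trajectory(start, goal, num_points=20, noise=1.0, seed=0):
    """Draft all waypoints at once: a straight line plus Gaussian noise.

    The endpoints are pinned to start and goal (illustrative assumption).
    """
    rng = random.Random(seed)
    traj = [
        start + (goal - start) * i / (num_points - 1) + rng.gauss(0.0, noise)
        for i in range(num_points)
    ]
    traj[0], traj[-1] = start, goal
    return traj


def denoise_step(traj, strength=0.5):
    """Update every interior waypoint in parallel from the previous iterate.

    Stand-in for a learned denoiser: each point moves part of the way
    toward the midpoint of its two neighbors.
    """
    new = list(traj)
    for i in range(1, len(traj) - 1):
        midpoint = 0.5 * (traj[i - 1] + traj[i + 1])
        new[i] = traj[i] + strength * (midpoint - traj[i])
    return new


def refine(traj, steps=50):
    """Apply the denoising step repeatedly to the whole trajectory."""
    for _ in range(steps):
        traj = denoise_step(traj)
    return traj


if __name__ == "__main__":
    draft = noisy_trajectory(0.0, 3.0)
    final = refine(draft)
    print("roughness before:", roughness(draft))
    print("roughness after: ", roughness(final))
```

Calling refine(noisy_trajectory(0.0, 3.0)) returns a path whose endpoints are unchanged and whose roughness is lower than that of the noisy draft. Smoothness is used here only as a crude proxy for physical feasibility.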
