Qwen Rolls Out New Vision‑Language Model To Advance Coding, Reasoning, And Multimodal AI Performance

In Brief

The Qwen team has launched the open‑weight Qwen3.5‑397B‑A17B model, introducing major advances in multimodal performance, reinforcement learning, and training efficiency as part of a broader push toward more capable, general‑purpose AI agents.

Alibaba Cloud’s Qwen team has introduced Qwen3.5‑397B‑A17B, the first open‑weight model in its new Qwen3.5 series.

Positioned as a native vision‑language system, the model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, reflecting a significant advance in the company’s large‑scale AI development efforts.

The model is built on a hybrid architecture that combines linear attention through Gated Delta Networks with a sparse mixture‑of‑experts design, enabling high efficiency during inference. Although the full system contains 397 billion parameters, only 17 billion are activated for each forward pass, allowing it to maintain high capability while reducing computational cost. The release also expands language and dialect coverage from 119 to 201, broadening accessibility for users and developers worldwide.
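To make the sparse‑activation idea concrete, the sketch below shows a minimal top‑k mixture‑of‑experts layer in PyTorch: a router selects a handful of experts per token, so only a small slice of the layer’s parameters participates in each forward pass. The expert count and top‑k value are illustrative assumptions, not Qwen3.5’s actual configuration, and the Gated Delta Network attention path is omitted entirely.

```python
# Illustrative sparse MoE routing; expert counts and top_k are assumptions,
# not Qwen3.5's real configuration.
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 64, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router picks top_k experts per token, so only
        # a small fraction of the expert parameters is touched on each forward pass.
        scores = self.router(x)                                   # (tokens, num_experts)
        weights, indices = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in indices[:, slot].unique():
                mask = indices[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out


layer = SparseMoELayer(d_model=256)
tokens = torch.randn(8, 256)
print(layer(tokens).shape)  # torch.Size([8, 256])
```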

Qwen3.5 Marks A Major Leap In Reinforcement Learning And Pretraining Efficiency

The Qwen3.5 series introduces substantial gains over Qwen3, driven largely by extensive reinforcement learning scaling across a wide range of environments. Rather than optimizing for narrow benchmarks, the team focused on increasing task difficulty and generalizability, resulting in improved agent performance across evaluations such as BFCL‑V4, VITA‑Bench, DeepPlanning, Tool‑Decathlon, and MCP‑Mark. Additional results will be detailed in an upcoming technical report.

Pretraining improvements span power, efficiency, and versatility. Qwen3.5 is trained on a significantly larger volume of visual‑text data with strengthened multilingual, STEM, and reasoning content, enabling it to match the performance of earlier trillion‑parameter models. Architectural upgrades—including higher‑sparsity MoE, hybrid attention, stability refinements, and multi‑token prediction—deliver major throughput gains, particularly at extended context lengths of 32k and 256k tokens. The model’s multimodal capabilities are strengthened through early text‑vision fusion and expanded datasets covering images, STEM materials, and video, while a larger 250k vocabulary improves encoding and decoding efficiency across most languages.
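The multi‑token prediction mentioned above can be pictured as a set of extra output heads on the shared trunk, each predicting a different future position; at inference, those extra predictions can feed a speculative‑decoding step. The sketch below is a minimal, assumed wiring, with head count and layout chosen for illustration rather than taken from the Qwen3.5 design.

```python
# Minimal sketch of multi-token prediction heads (illustrative assumptions only).
import torch
import torch.nn as nn


class MultiTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        # One projection per future position: head i predicts token t + 1 + i.
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # hidden: (batch, seq, d_model) from the shared trunk.
        return [head(hidden) for head in self.heads]


# Qwen3.5's vocabulary is reported at roughly 250k; a smaller value keeps the demo light.
trunk_out = torch.randn(2, 16, 512)
mtp = MultiTokenHead(d_model=512, vocab_size=32_000)
logits = mtp(trunk_out)
print(len(logits), logits[0].shape)  # 4 torch.Size([2, 16, 32000])
```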

The infrastructure behind Qwen3.5 is designed for efficient multimodal training. A heterogeneous parallelism strategy separates vision and language components to avoid bottlenecks, while sparse activation enables near‑full throughput even on mixed text‑image‑video workloads. A native FP8 pipeline reduces activation memory by roughly half and increases training speed by more than 10 percent, maintaining stability at massive token scales.
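The roughly 50 percent activation‑memory saving follows from simple arithmetic: FP8 stores one byte per element where BF16 stores two. The snippet below checks that ratio directly; real FP8 training also tracks per‑tensor scaling factors, which are omitted here.

```python
# Back-of-the-envelope check of the ~50% activation-memory claim.
# Requires a recent PyTorch build with float8 dtypes (>= 2.1).
import torch

activations_bf16 = torch.randn(4096, 8192, dtype=torch.bfloat16)
activations_fp8 = activations_bf16.to(torch.float8_e4m3fn)

bytes_bf16 = activations_bf16.element_size() * activations_bf16.numel()
bytes_fp8 = activations_fp8.element_size() * activations_fp8.numel()
print(f"BF16: {bytes_bf16 / 2**20:.0f} MiB, FP8: {bytes_fp8 / 2**20:.0f} MiB")
# BF16: 64 MiB, FP8: 32 MiB
```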

Reinforcement learning is supported by a fully asynchronous framework capable of handling models of all sizes, improving hardware utilization, load balancing, and fault recovery. Techniques such as FP8 end‑to‑end training, speculative decoding, rollout router replay, and multi‑turn rollout locking help maintain consistency and reduce gradient staleness. The system is built to support large‑scale agent workflows, enabling seamless multi‑turn interactions and broad generalization across environments.
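As a rough picture of what “fully asynchronous” means in practice, the sketch below decouples rollout generation from training with a bounded queue, so generation workers and the trainer never wait on each other, and it tracks how many policy updates elapse while a trajectory is being generated, the staleness such systems try to bound. Every detail here is illustrative and not the Qwen team’s framework.

```python
# Illustrative asynchronous RL loop: rollout workers and the trainer run
# concurrently and exchange trajectories through a bounded queue.
import asyncio
import random


async def rollout_worker(worker_id: int, queue: asyncio.Queue, policy_version: list[int]):
    while True:
        version_at_start = policy_version[0]
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for multi-turn generation
        trajectory = {
            "worker": worker_id,
            "reward": random.random(),
            "staleness": policy_version[0] - version_at_start,
        }
        await queue.put(trajectory)  # blocks only if the trainer falls behind


async def trainer(queue: asyncio.Queue, policy_version: list[int], steps: int = 20):
    for step in range(steps):
        batch = [await queue.get() for _ in range(4)]  # consume rollouts as they arrive
        policy_version[0] += 1                         # stand-in for a gradient update
        avg_stale = sum(t["staleness"] for t in batch) / len(batch)
        print(f"step {step}: avg staleness {avg_stale:.2f}")


async def main():
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    policy_version = [0]
    workers = [asyncio.create_task(rollout_worker(i, queue, policy_version)) for i in range(8)]
    await trainer(queue, policy_version)
    for w in workers:
        w.cancel()


asyncio.run(main())
```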

Users can interact with Qwen3.5 through Qwen Chat, which offers Auto, Thinking, and Fast modes depending on the task. The model is also available through Alibaba Cloud’s ModelStudio, where advanced features such as reasoning, web search, and code execution can be enabled through simple parameters. Integration with third‑party coding tools allows developers to adopt Qwen3.5 into existing workflows with minimal friction.
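For developers, access typically looks like an OpenAI‑compatible API call. The sketch below assumes the ModelStudio base URL, model identifier, and feature flag shown; these are illustrative placeholders, and the ModelStudio documentation should be consulted for the exact values and supported parameters.

```python
# Illustrative call to a Qwen model through an OpenAI-compatible endpoint.
# Base URL, model name, and the extra_body flag are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELSTUDIO_API_KEY",                                   # placeholder
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",    # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen3.5-397b-a17b",                                            # assumed identifier
    messages=[{"role": "user", "content": "Summarize the trade-offs of sparse MoE models."}],
    extra_body={"enable_thinking": True},                                 # assumed feature flag
)
print(response.choices[0].message.content)
```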

According to the Qwen team, Qwen3.5 establishes a foundation for universal digital agents through its hybrid architecture and native multimodal reasoning. Future development will focus on system‑level integration, including persistent memory for cross‑session learning, embodied interfaces for real‑world interaction, self‑directed improvement mechanisms, and economic awareness for long‑term autonomous operation. The objective is to move beyond task‑specific assistants toward coherent, persistent agents capable of managing complex, multi‑day objectives with reliable, human‑aligned judgment.
