According to CoinWorld, Alibaba Qianwen has announced the launch of Qwen3.5-Omni, an omni-modal large model. The Qwen3.5-Omni series includes Instruct versions in three sizes (Plus, Flash, and Light) and supports a 256k context window. The model accepts more than 10 hours of audio input, or more than 400 seconds of 720p audio-video input sampled at 1 FPS. It was pretrained on massive amounts of text and visual data plus more than 100 million hours of audio-video data, which the announcement credits for its strong multimodal perception and generation capabilities. Compared with Qwen3-Omni, Qwen3.5-Omni has significantly stronger multilingual capabilities, supporting speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects.
