Just now, a Chinese video model topped the global rankings! It beat Google’s Veo, and it is already making money.

A Chinese video generation model has just taken the top spot in the world for the first time.

Just now, in the latest ranking by the third-party evaluator Artificial Analysis, SkyReels V4 secured the number-one position globally in Text-to-Video (including audio)!

It outperformed Google’s Veo 3.1 and surpassed Kling 3.0.

More importantly, this ranking isn’t based on manufacturer benchmarks. It reflects blind reviews from a large number of real users.

This shows that on the most challenging and valuable “text-to-video + audio” track, Chinese models have reached the front of the field.

On February 27, when SkyReels V4 Preview was first unveiled, it was already ranked second worldwide.

Less than a month later, SkyReels V4 moved up another step, directly claiming the top spot.

SkyReels V4 is not just more powerful; it is already rewriting the global video model rankings.

It signifies that China’s AIGC video technology is officially leading the world.

At the Zhongguancun Forum 2026, SkyReels V4 will be officially released with full features, and the API is already open (skyreels.ai).
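The article does not document the API’s request format, so as a rough illustration only, a text-to-video call might assemble a payload like the one below. Every field name (`model`, `prompt`, `duration_s`, `resolution`, `fps`, `audio`) is an assumption, not the documented SkyReels API; the defaults simply mirror the specs quoted later in this article (1080p, 32 FPS, 15-second clips with synchronized audio). Consult skyreels.ai for the real interface.

```python
# Hypothetical sketch of a text-to-video request payload.
# Field names are assumptions; see skyreels.ai for the actual API.

def build_t2v_request(prompt: str, duration_s: int = 15,
                      resolution: str = "1080p", fps: int = 32,
                      audio: bool = True) -> dict:
    """Assemble a request body for a hypothetical text-to-video endpoint.

    Defaults mirror the specs quoted in the article: 1080p, 32 FPS,
    15-second clips with joint audio-video generation enabled.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return {
        "model": "skyreels-v4",   # assumed model identifier
        "prompt": prompt,
        "duration_s": duration_s,
        "resolution": resolution,
        "fps": fps,
        "audio": audio,           # generate sound together with the visuals
    }

req = build_t2v_request("Thunder roars as sandstorms sweep across the wasteland")
print(req["resolution"], req["fps"], req["duration_s"])
```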


In other modalities, SkyReels V4 also performs excellently, ranking second in “Text to Video (no audio).”

Talking about data is meaningless; seeing is believing. Next, let’s take a good look at how formidable the world’s number one video AI really is.

Real-world Test: King of Short AI Dramas

Within Kunlun Wanwei’s Tiangong AI lineup, SkyReels is evolving into a complete multi-modal video generation system that supports text, image, video, and audio inputs.

It is the world’s first foundational video model supporting multi-modal inputs, joint audio-video generation, and unified tasks such as generation, restoration, and editing.

The following six tests will each demonstrate the incredible capabilities of this model.

AI Short Drama Generation: Two images + a line of dialogue directly produce a cinema-grade short drama.

Just upload two character images and write a line of dialogue.

SkyReels V4 can produce a 1080p, 32FPS, 15-second video instantly.

The visual quality, character expressions, and lip-sync are almost indistinguishable from real footage.

Whether the faces are East Asian or Western, the results are extremely natural.

Thunder roars, sandstorms sweep across the wasteland, Guan Gong and Qin Qiong engage in epic combat—

From simple text to complete video + audio, even beginners can easily create movie-level content, truly achieving “whatever you want to shoot, just shoot it”!

And the AI flavor is almost gone.

More importantly, this time it’s not “generate the visuals first, then bolt the sound on afterward.”

SkyReels V4 is specially designed to handle both visuals and audio simultaneously.

Multi-frame reference: nine images, finally locking down characters and plot.

One of the most significant upgrades in SkyReels V4 is multi-frame reference.

You can provide up to 9 keyframes.

It will interpolate the intermediate actions, shots, and transitions based on these nine images.

This is very important and highly practical.

In the past, two common pitfalls in AI short drama creation were:

  • The face changing from one second to the next;

  • Moving from one scene to another abruptly.

SkyReels V4’s most practical improvement is eliminating these issues, making it the undisputed king of AI web dramas.

Prompt example: “@Image-1’s bright young man keeps running forward, passing several corners with camera tracking; then switch to @Image-2, where the young man is shirtless, continuing to run and then turning sharply; then switch to @Image-3, showing his surprised expression; finally switch to @Image-4, where he twists a dial to the right, with a large cloud of smoke filling the frame.”
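The prompt above follows a simple pattern: each segment names a reference image with an `@Image-N` tag and describes the action for that span. As a purely illustrative sketch (the `@Image-N` syntax comes from the article’s example, but this helper is not part of any SkyReels tooling), such prompts could be assembled from a list of per-keyframe descriptions:

```python
# Illustrative helper that assembles a multi-frame reference prompt in the
# "@Image-N: ..." style shown above. Not part of any official SkyReels SDK.

def build_multiframe_prompt(segments: list[str]) -> str:
    """Join per-keyframe descriptions, tagging each with @Image-N (N from 1).

    The article states up to nine keyframes are supported, so more raises.
    """
    if not 1 <= len(segments) <= 9:
        raise ValueError("provide between 1 and 9 keyframe descriptions")
    parts = [f"@Image-{i}: {desc}" for i, desc in enumerate(segments, start=1)]
    return " then ".join(parts)

prompt = build_multiframe_prompt([
    "the bright young man keeps running forward, camera tracking",
    "shirtless now, he continues to run and turns sharply",
    "a close-up of his surprised expression",
])
print(prompt)
```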

This level of video control is simply amazing.

The style is also completely unified. For this kind of web drama, there’s not a trace of AI flavor.

For example, this animation resembles a “No-Face” monster.

Based on the anime scene in @Image-1, it naturally transitions from top to bottom, left to right, generating a short animation.

The fight scenes are smooth, with very reasonable shot transitions.

This fantasy-style animation is also effortless.

Thanks to SkyReels V4’s synchronized audio and visual generation capabilities, lip-syncing for characters speaking is no longer a challenge.

One-stop video editing—“cut videos with your mouth.”

Even more impressive, it not only generates videos but can directly modify them, making it a godsend for post-production.

You can ask it to do three types of edits:

First, add elements to the scene.

Add a hat to a character, place flowers in a room, or insert a new character into the original scene.

Add the blue ribbed knit beanie from @Image-1 onto the head of the central dancer in @Video-1.

With just one sentence, the hat is added to the girl’s head.

Even more astonishingly, the angles are perfect.

Truly stunning.

Second, modify character actions.

Make a new character dance along with the original performer, or retarget existing movements.

Add the colorful fursuit character from @Image-1 into the urban dance scene in @Video-1, placing them on the dance floor next to the dancer. The character should mirror the dancer’s movements with a playful, exaggerated dance style.

Not only adding the character, but it can also dance along with the original performer.

This level of video understanding is incredible.

Third, direct cleanup.

Remove subtitles, watermarks, logos, passersby, animals, or any unwanted distractions.

This editing capability, based on the model’s full understanding of the video, is extremely powerful.

In the past, you had to switch repeatedly between Premiere, After Effects, and various AI tools to complete such tasks. Now, SkyReels V4 handles everything within a single model.

In other words, video generation, element insertion, character editing, and scene cleanup are converging into a unified editing framework.

A major breakthrough is unifying video generation, frame interpolation, extension, and editing into a single interface, enabling text-to-video, image-to-video, video extension, keyframe interpolation, and local/global edits all under the same processing framework.

Technical Breakthrough: How Does It Compete with Seedance 2.0?

After seeing the results, let’s examine what makes SkyReels V4’s technology so advanced.

Last month, when SkyReels V4 Preview ranked second globally among active models, we provided a detailed analysis: “After Seedance 2.0 exploded, another Chinese dark horse topped the AA list! No more AI flavor.”

Less than a month from being second in the world to officially claiming first—this speed is called “cheating” in gaming, and “SkyReels-V4” in AI circles.

SkyReels V4’s leap forward is not just minor tweaks.

It mainly fixed two longstanding issues in video AI.

The first issue is “beautiful visuals but illogical scenes.”

For example, water flowing upward, cups floating in the air, or awkward movements when characters turn.

To solve this, SkyReels V4’s training no longer focuses only on “looking right” but also on “being right.”

In plain terms, it added a stricter scoring system:

  • The visuals must be beautiful, actions reasonable, and audio must match lip movements and rhythm.

  • Any inconsistencies are repeatedly corrected and retrained.

This process is called full-modal reinforcement learning in the paper.

Additionally, the team introduced a staged curriculum reinforcement learning mechanism: along the axes of resolution, duration, task complexity, and data difficulty, the model advances gradually from simple to complex tasks, steadily improving its control over high-difficulty scenarios.

Think of it as: previously, teachers only judged the surface appearance; now, they also evaluate logic, actions, and expressions.

It’s like moving from just caring about exam scores to also paying attention to the learning process and teaching methods.

The second issue is character drift: the model “forgetting” what a character looks like.

Provide a few keyframes, and SkyReels V4 can interpolate the middle frames. Provide nine story images, and it will try to keep character faces, costumes, and scene styles consistent throughout.

This is crucial for AI short dramas.

In the past, inconsistent characters would break immersion—one scene a hero has a pointed chin, the next a square face, making viewers lose focus.

With nine-frame references, characters stay consistent, scenes are coherent, and AI short dramas have finally upgraded from “just for fun” to “worthy of serious watching.”

These two capabilities—video consistency and controllability—reach industry-leading levels, transforming SkyReels V4 from a “video generation tool” into an “industrialized short drama production engine.”

SkyReels V4 technical report is also publicly available.


Proven in Practice: China’s AI Netflix Is Here

What’s truly noteworthy isn’t just the ranking but that this model has already been integrated into real business operations.

DramaWave: China’s AI Netflix.

SkyReels V4’s technology directly supports Kunlun Wanwei’s short drama platform DramaWave.

As of January 2026, driven by DramaWave and FreeReels, Kunlun’s short drama platform has surpassed 80 million MAU, with annual revenue exceeding $480 million, and monthly revenue reaching $40 million.

These aren’t just slide-deck figures; they are real users paying to watch AI-produced content.

Recently, DramaWave launched the “Million-Dollar Drama AI” creation support plan, inviting top creators worldwide. Kunlun’s self-developed AI short drama tool SkyAnime also launched simultaneously, empowering creators from the tool level and boosting production efficiency.

Nearly a thousand AI-produced dramas are on DramaWave, with a monthly output of over 30.

Take the self-made AI short drama “Looted Entries! I Transformed into an Undead Plague” as an example: produced with SkyAnime at a cost of under $20,000, it grossed over $100,000 in a single day after release and has accumulated millions of views.

This forms a perfect “technology → product → commercialization” closed loop.

Upgrading from “fragmented generation” to full industrial-chain video production.

SkyReels V4’s significance goes far beyond “being able to generate a good-looking video.”

For the AI short drama industry, it solves the core pain point: character consistency.

Past AI short dramas often had characters “change faces” from shot to shot, making it impossible for viewers to get immersed.

SkyReels V4’s nine-frame reference capability ensures characters remain consistent throughout the series, elevating AI short dramas to a level where they can be watched seriously.

This is a qualitative leap for the entire AI film and TV industry.

Providing a unified video generation foundation for gaming, music, and content ecosystems.

It’s worth noting that SkyReels V4 is not an isolated product.

Kunlun Wanwei also has the AI music creation platform Mureka—its O1 model is the world’s first to incorporate Chain of Thought (CoT) technology for music reasoning. Version V8 continues breakthroughs in timbre, performance techniques, and emotional expression, with users across over 100 countries.

SkyReels V4’s video capabilities + Mureka’s music capabilities form a full-chain creative loop from visuals to sound, from scoring to vocals.

A company holding top-tier models in both video and music is rare anywhere in the world.

A brand can generate a complete video ad with one sentence; an independent musician can turn a song into a high-quality MV; an educational institution can automatically convert courses into videos with narration, music, and dynamic visuals—these are not just ideas but happening now.

All in AGI

Looking back at Kunlun Tiangong’s development in large-scale video models, the rise of SkyReels V4 is no accident but a strategic, well-planned explosion.

  • February 2025: Open-source SkyReels V1—the first Chinese video generation model for AI short dramas, trained on tens of millions of film and TV data, supporting 33 micro-expressions and over 400 action combinations.
  • April 2025: Release of SkyReels V2—the world’s first infinite-length movie generation model, built on the Diffusion Forcing framework.
  • January 2026: Open-source SkyReels V3—supporting 1-4 reference images for multi-actor video generation.
  • February 2026: SkyReels V4 Preview released—ranked second globally.
  • March 2026: SkyReels V4 officially ranks first worldwide.

From V1 to V4, it’s not just about adding parameters. Each generation addresses a key shortcoming.

With major upgrades arriving every three to four months on average, this iteration pace is almost unmatched in the global AI video field.

Coupled with Mureka’s leading position in AI music, breakthroughs in large language models and multi-modal reasoning with the Skywork series, and the commercialization of DramaWave, Kunlun Wanwei is building a complete AI ecosystem covering “computing power—models—applications.”

This is the most convincing showcase of Kunlun Wanwei’s core “All in AGI and AIGC” strategy since early 2023.

The moment of “unification” in AI video creation

Looking back from spring 2026, the AI video generation field has undergone earth-shaking changes over the past year.

From the first wave led by Sora, to a crowded field of contenders such as Veo, Kling, and Seedance, and finally to SkyReels V4 leading the world with its “multi-modal reference + joint audio-video generation + unified task framework + full-modal reinforcement learning” capabilities, we are witnessing the start of a new era.

In this era, video creation is no longer exclusive to professional teams but accessible to anyone with creativity.

And the technological direction represented by SkyReels V4—using one model, one operation, to complete the entire process from text conception to audio-video finished product—is the clearest path to that future.

Kunlun Wanwei’s technical report also reveals three future directions: expanding longer-duration (30+ seconds) video generation, enhancing real-time interactive editing, and opening model APIs for broader creative tool integration.

Each of these will further narrow the gap between AI video creation and professional filmmaking.

The race in AI video is far from over. But SkyReels V4 has already proven one thing with its top global ranking:

In this race, the voice from China’s Kunlun Wanwei not only deserves the world’s attention—it is already standing at the top of the world.

Source: Xinzhiyuan

