
Chinese tech giant Alibaba is back in the headlines with an upgrade to its open-source video-generation AI model, adding advanced avatar capabilities that push the limits of what AI-generated video can do.
The updated model, called Wan2.2-S2V, can turn a single portrait photo into what the company calls “film-quality avatars” that can speak, sing, and perform on command. Announced on Tuesday, the new version adds speech-to-video functionality and builds on Alibaba’s existing video generation system.
The model incorporates cinematic-level aesthetics and complex motion generation using a Mixture-of-Experts (MoE) architecture with 14 billion parameters. It can generate synchronized full-body or half-body character videos at 720p resolution, with natural facial expressions, fluid body movements, and professional-style camera work.
Benchmark testing shows Wan2.2-S2V outperforms other video-generation platforms, including EchoMimicV2 and MimicMotion, across key measures such as video quality (FID), expression authenticity (EFID), and identity consistency (CSIM).
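Identity consistency (CSIM) is typically measured as the cosine similarity between face-recognition embeddings of the reference portrait and the generated video frames. A minimal sketch of that idea, assuming hypothetical random vectors stand in for embeddings from a real face-recognition model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 512-dimensional identity embeddings; a real pipeline
# would obtain these from a face-recognition model run on the
# reference portrait and on each generated frame.
rng = np.random.default_rng(0)
reference = rng.standard_normal(512)
frames = [reference + 0.1 * rng.standard_normal(512) for _ in range(8)]

# CSIM: average cosine similarity across frames. Values closer to 1.0
# mean the generated character better preserves the source identity.
csim = np.mean([cosine_similarity(reference, f) for f in frames])
print(f"CSIM: {csim:.3f}")
```

This is only an illustration of the metric's shape, not how Alibaba's benchmark is implemented.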
The upgrade underscores Alibaba’s urgency in keeping pace with rivals after the rise of Chinese AI startup DeepSeek, which has gained international attention with breakthrough research this year.
Alongside the new avatar capabilities, Alibaba is also moving into AI developer tools. Last week it launched Qoder, an AI-powered code editor now in free public preview. The standout feature is Repo Wiki, which automatically maps out hidden details in a project’s codebase, from architecture to dependencies, giving teams a shared understanding without relying on “that one engineer who knows everything.”