Generate high-fidelity character videos by precisely replicating the expressions and movements of performers in reference videos, or replace characters in videos with seamless environmental integration
Wan-Animate can animate any character from a performer video, precisely replicating the performer's facial expressions and movements to generate highly realistic character videos
Wan-Animate can replace a character in a video with an animated character, preserving expressions and movements while replicating the original lighting and color tone for seamless environmental integration
We introduce Wan-Animate, a unified framework for character animation and character replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the performer in the video, thereby generating high-fidelity character videos.
Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone for seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm that distinguishes reference conditions from the regions to be generated.
This design unifies multiple tasks into a common symbolic representation. We use spatially aligned skeletal signals to replicate body movements and employ implicit facial features extracted from source images to reproduce expressions, thereby achieving character video generation with high controllability and expressiveness.
Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module applies the appropriate environmental lighting and color tone while maintaining consistency of the character's appearance. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and source code.
Overview of Wan-Animate. Built on Wan-I2V, it modifies the input formulation to unify reference image input, temporal frame guidance, and environmental information (for dual-mode compatibility) under a common symbolic representation
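For intuition, here is a minimal PyTorch sketch of what such a unified input formulation could look like. The function name, tensor shapes, and channel-wise mask convention are illustrative assumptions rather than Wan-Animate's actual interface: the reference latent, any temporal guidance frames, and a binary mask separating conditioning regions from regions to generate are packed into a single tensor that both modes share.

```python
# Hypothetical sketch of a unified conditioning input (not the official API).
import torch

def build_condition_latent(ref_latent, guide_latents, noisy_latents):
    """Pack reference, temporal guidance, and a region mask into one tensor.

    ref_latent:    (B, C, 1, H, W)  latent of the character image
    guide_latents: (B, C, Tg, H, W) latents of temporal guidance frames
                   (Tg = 0 or zeros in pure animation mode)
    noisy_latents: (B, C, T, H, W)  latents currently being denoised
    """
    B, C, T, H, W = noisy_latents.shape
    dev, dt = noisy_latents.device, noisy_latents.dtype
    # Mask channel: 1 marks conditioning frames, 0 marks frames to generate.
    mask = torch.zeros(B, 1, T, H, W, device=dev, dtype=dt)
    Tg = guide_latents.shape[2]
    cond = noisy_latents.clone()
    cond[:, :, :Tg] = guide_latents           # temporal frame guidance
    mask[:, :, :Tg] = 1.0
    # Prepend the reference frame so both modes share one symbolic layout.
    cond = torch.cat([ref_latent, cond], dim=2)
    mask = torch.cat([torch.ones(B, 1, 1, H, W, device=dev, dtype=dt), mask],
                     dim=2)
    return torch.cat([cond, mask], dim=1)     # (B, C+1, T+1, H, W)
```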
Use spatially aligned skeletal signals to precisely replicate body movements, achieving natural and smooth character animation
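Spatial alignment here means the skeleton is rendered at the same resolution as the video latents, so each pose cue lands on the patch it should drive. A hedged sketch of one common way to inject such a signal follows; the encoder below is a hypothetical stand-in, not the paper's exact pose encoder.

```python
# Illustrative pose-conditioning module (architecture is an assumption).
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    def __init__(self, latent_channels: int = 16):
        super().__init__()
        # 3-channel rendered skeleton image -> latent-aligned feature map.
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, skeleton_frames, video_latents):
        # skeleton_frames: (B*T, 3, H, W) skeleton renderings, one per frame
        # video_latents:   (B*T, C, H, W) latents at the same spatial size
        pose_feat = self.net(skeleton_frames)
        return video_latents + pose_feat  # additive, spatially aligned
```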
Utilize implicit features extracted from face images as driving signals to reproduce facial expressions with high realism and faithfully convey emotion
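One plausible reading of "implicit features as driving signals" is a small set of latent face tokens that the generator attends to via cross-attention, rather than explicit landmarks. The sketch below is purely illustrative; the encoder architecture, token count, and dimensions are assumptions, not taken from the paper.

```python
# Hypothetical face-driven cross-attention adapter (shapes are assumptions).
import torch
import torch.nn as nn

class FaceAdapter(nn.Module):
    def __init__(self, dim: int = 1024, num_tokens: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for a real face encoder
            nn.Conv2d(3, 64, 4, stride=4), nn.SiLU(),
            nn.Conv2d(64, dim, 4, stride=4),
            nn.AdaptiveAvgPool2d(int(num_tokens ** 0.5)),
            nn.Flatten(2),
        )
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8,
                                                batch_first=True)

    def forward(self, hidden, face_crop):
        # hidden:    (B, N, dim) generator tokens for one frame
        # face_crop: (B, 3, 224, 224) aligned face from the source video
        face_tokens = self.encoder(face_crop).transpose(1, 2)  # (B, K, dim)
        out, _ = self.cross_attn(hidden, face_tokens, face_tokens)
        return hidden + out  # expression signal injected implicitly
```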
Train an auxiliary Relighting LoRA module to enhance character integration with new environments, achieving natural lighting and color matching
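For intuition, a minimal LoRA adapter of the kind this could be: a low-rank residual on frozen linear projections, enabled only in replacement mode, that nudges lighting and color toward the target scene while the frozen base weights preserve character identity. Rank, scaling, and placement here are illustrative assumptions.

```python
# Minimal LoRA sketch (rank/alpha/placement are assumptions, not the paper's).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base model stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op
        self.scale = alpha / rank
        self.enabled = True                  # toggle per mode

    def forward(self, x):
        out = self.base(x)
        if self.enabled:                     # replacement mode only
            out = out + self.scale * self.up(self.down(x))
        return out
```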
Showcasing Wan-Animate's exceptional performance in various scenarios