Wan-Animate

Unified Character Animation and Replacement Technology

Generate high-fidelity character videos by precisely replicating expressions and actions from performer videos, or replace characters in videos with seamless environmental integration

Wan-Animate can animate any character from a performer video, precisely replicating the performer's facial expressions and movements to generate highly realistic character videos

Wan-Animate can replace a character in a video with an animated character, preserving the original expressions and movements while replicating the scene's lighting and color tones for seamless environmental integration

Abstract

We introduce Wan-Animate, a unified framework for character animation and character replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the person in the reference video, thereby generating high-fidelity character videos.

Alternatively, it can integrate animated characters into reference videos to replace original characters, replicating the scene's lighting and color tones for seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ an improved input paradigm to distinguish reference conditions and generation regions.

This design unifies multiple tasks into a common symbolic representation. We use spatially aligned skeletal signals to replicate body movements and employ implicit facial features extracted from source images to reproduce expressions, thereby achieving character video generation with high controllability and expressiveness.

Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module applies appropriate environmental lighting and color tones while maintaining character appearance consistency. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and source code.

Method

Overview of Wan-Animate. Built on Wan-I2V, we modify its input formulation to unify reference image input, temporal frame guidance, and environmental information (dual-mode compatibility) under a common symbolic representation.
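The idea of a unified input formulation can be sketched as packing, for every latent frame, the noisy latent together with a binary mask that separates reference conditions from generation regions, plus the conditional latent itself. This is a minimal illustrative sketch: the function name `pack_model_input`, the channel layout, and all shapes are assumptions, not Wan-Animate's actual tensor format.

```python
# Hypothetical sketch of a unified diffusion input: each frame's noisy
# latent is concatenated channel-wise with a binary mask and a conditional
# latent. Layout [noise | mask | condition] is illustrative only.

def pack_model_input(noisy_latents, cond_latents, mask):
    """Concatenate per-frame channel lists: [noise | mask | condition]."""
    assert len(noisy_latents) == len(cond_latents) == len(mask)
    packed = []
    for z, c, m in zip(noisy_latents, cond_latents, mask):
        # m == 1 marks a frame/region supplied as a condition (kept fixed),
        # m == 0 marks a region the model must generate.
        packed.append(z + [float(m)] + c)
    return packed

# Frame 0 carries the reference-image latent (mask = 1); frames 1-2 are
# generated (mask = 0, zero condition latent).
ref = [0.3, -0.1, 0.8, 0.5]
zeros = [0.0] * 4
noise = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
inp = pack_model_input(noise, [ref, zeros, zeros], [1, 0, 0])
print(len(inp), len(inp[0]))  # 3 frames, 4 + 1 + 4 = 9 channels each
```

Because the mask travels with the latents, the same network can treat a frame as a fixed reference or as a generation target, which is what lets one representation serve both animation and replacement modes.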

Wan-Animate Architecture

Body Motion Control

Use spatially aligned skeletal signals to precisely replicate body movements, achieving natural and smooth character animation effects
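"Spatially aligned" here means the skeletal signal is rendered onto a grid with the same spatial layout as the video frame, so each conditioning pixel lines up with the pixel it controls. A minimal sketch of such a rasterizer, with toy keypoints and bone names that are purely illustrative:

```python
# Hypothetical sketch: 2D skeleton keypoints are rasterized into a
# frame-aligned pose map. The keypoints, bone list, and grid size are
# illustrative, not Wan-Animate's actual pose representation.

def draw_bone(canvas, p0, p1, steps=32):
    """Mark pixels along the segment p0 -> p1 (simple sampling rasterizer)."""
    (x0, y0), (x1, y1) = p0, p1
    for i in range(steps + 1):
        t = i / steps
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        canvas[y][x] = 1.0
    return canvas

def render_pose_map(keypoints, bones, height, width):
    canvas = [[0.0] * width for _ in range(height)]
    for a, b in bones:
        draw_bone(canvas, keypoints[a], keypoints[b])
    return canvas

# Toy skeleton: neck -> hip, hip -> knee, on a 16x16 frame-aligned grid.
kps = {"neck": (8, 2), "hip": (8, 8), "knee": (5, 13)}
pose_map = render_pose_map(kps, [("neck", "hip"), ("hip", "knee")], 16, 16)
print(pose_map[2][8], pose_map[8][8])  # 1.0 1.0 at the neck and hip
```

Feeding the model a per-frame map like this, rather than raw joint coordinates, gives it a pixel-for-pixel correspondence between the driving pose and the image it must synthesize.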

Facial Expression Reproduction

Utilize implicit features extracted from face images as driving signals to achieve highly realistic facial expression reproduction and faithfully convey emotion
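One common way such implicit features drive generation is through cross-attention: video tokens query a small set of face-feature vectors and mix them in by similarity. The sketch below is an assumption about the mechanism, not Wan-Animate's actual interface; the feature vectors, dimensions, and names (`face_feats`, `cross_attend`) are fabricated for illustration.

```python
import math

# Hypothetical sketch: generation tokens attend over implicit face-feature
# vectors (here hard-coded) via scaled dot-product cross-attention.

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def cross_attend(queries, keys, values):
    """Each query token mixes the face-feature values by similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

face_feats = [[1.0, 0.0], [0.0, 1.0]]   # implicit expression latents
gen_tokens = [[0.9, 0.1], [0.2, 0.8]]   # video tokens acting as queries
mixed = cross_attend(gen_tokens, face_feats, face_feats)
print(len(mixed), len(mixed[0]))  # 2 tokens, 2-dim features
```

Because the features are implicit latents rather than explicit landmarks, the same mechanism can carry subtle expression detail that a sparse keypoint signal would lose.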

Environmental Lighting Integration

Train an auxiliary Relighting LoRA module to enhance character integration with new environments, achieving natural lighting and color matching
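A LoRA adapter like this one augments a frozen base weight with a trainable low-rank delta, so relighting behavior can be learned without disturbing the character's appearance encoded in the base weights. A minimal sketch of the LoRA forward pass; the rank, scale, toy matrices, and function names are assumptions for illustration, not the Relighting LoRA's actual configuration.

```python
# Hypothetical sketch of a LoRA-augmented linear layer:
#   y = W x + (alpha / r) * B (A x)
# W is frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
# are trained. B is zero-initialized so training starts from the base model.

def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, alpha=8.0):
    r = len(A)                        # adapter rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (toy identity)
A = [[0.5, 0.5]]                      # rank-1 down-projection
B = [[0.0], [0.0]]                    # up-projection, zero-initialized
x = [2.0, 4.0]
print(lora_forward(W, A, B, x))       # [2.0, 4.0]: zero-init B leaves base output unchanged
```

The zero-initialized up-projection means the adapter contributes nothing at the start of training, which is what lets it shift lighting and color tones gradually while the base model keeps the character consistent.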

Results

Showcasing Wan-Animate's exceptional performance in various scenarios

Expressive Human Animation

Generalizable Arbitrary Character Animation

Dynamic Motion and Camera

Character Replacement and Environmental Integration

Lighting and Color Replication

Qualitative Comparisons