Illustration of Video Head Swapping. Given a reference image and video sequence as input, our model can generate high-fidelity head swapping results that accommodate diverse hairstyles, expressions, and identities.
Seamless head transplantation from single reference image
Controllable facial expressions and movements
Seamless preservation of identity and background
In this paper, we propose a novel diffusion-based multi-condition controllable framework for video head swapping, which seamlessly transplants a human head from a static image into a dynamic video while preserving the original body and background of the target video, and further allows users to tweak head expressions and movements during swapping as needed.
Existing face-swapping methods mainly focus on localized facial replacement, neglecting holistic head morphology, while head-swapping approaches struggle with hairstyle diversity and complex backgrounds; moreover, none of these methods allows users to modify the transplanted head's expressions after swapping.
To tackle these challenges, our method incorporates several innovative strategies within a unified latent diffusion paradigm. Experimental results demonstrate that our method excels at seamless background integration while preserving the identity of the source portrait, and shows superior expression transfer capabilities applicable to both real and virtual characters.
The framework of our method. During the training stage, we begin by preprocessing the input video to obtain the inpainted background and to detect 3D landmarks using MediaPipe. Both the inpainted background and the 3D landmarks serve as conditional inputs, and a frame is randomly selected as the reference image.
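The training-time conditioning described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the per-frame 3D landmarks have already been detected (e.g. with MediaPipe Face Mesh) and the head region already inpainted, and the function name `build_training_conditions` is hypothetical.

```python
import numpy as np

def build_training_conditions(frames, landmarks, inpainted_bg, rng=None):
    """Assemble the conditional inputs for one training sample (sketch).

    frames:       (T, H, W, 3) video frames
    landmarks:    (T, K, 3)    per-frame 3D facial landmarks
    inpainted_bg: (T, H, W, 3) frames with the head region inpainted away
    """
    if rng is None:
        rng = np.random.default_rng()
    # A frame is randomly selected to serve as the reference image.
    ref_idx = int(rng.integers(len(frames)))
    return {
        "reference": frames[ref_idx],   # reference head image
        "background": inpainted_bg,     # conditional input 1
        "landmarks": landmarks,         # conditional input 2
    }
```

At training time the reference frame comes from the same video, so the model learns to re-render the head it sees; at inference the reference is replaced by the source portrait.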
Shape-agnostic mask strategy with hair enhancement for robust identity preservation across diverse hair types and complex backgrounds.
Disentangled 3D landmarks that decouple identity, expression, and head poses for precise expression control.
Advanced scaling strategy to minimize cross-identity expression distortion for higher transfer precision.
Unified diffusion model with additional pixel-level ID losses for enhanced identity consistency.
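The landmark scaling strategy in the list above can be illustrated with a small numpy sketch. This is an assumed formulation, not the paper's exact method: it treats an expression as per-landmark offsets from a neutral face and rescales those offsets by the ratio of face sizes, so that transferring an expression across identities of different scale does not distort it.

```python
import numpy as np

def transfer_expression(src_neutral, src_expr, tgt_neutral):
    """Retarget an expression from a source face to a target identity (sketch).

    All inputs are (K, 3) landmark arrays. The expression is isolated as
    offsets from the source's neutral face, then rescaled to the target's
    facial proportions before being applied to the target identity.
    """
    offsets = src_expr - src_neutral  # expression component, identity removed

    def face_scale(lm):
        # Rough face size: mean landmark distance from the centroid.
        return np.linalg.norm(lm - lm.mean(axis=0), axis=1).mean()

    scale = face_scale(tgt_neutral) / face_scale(src_neutral)
    return tgt_neutral + scale * offsets  # expression on the target identity
```

Without the rescaling step, offsets taken from a large source face would produce exaggerated motion on a small target face, which is the cross-identity distortion the strategy aims to minimize.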
High-fidelity head swapping with natural expression transfer
Complex hairstyle preservation with seamless integration
Stylized character head swapping with artistic preservation
Diverse artistic style adaptation and expression transfer
Gender-specific features preservation with natural movement
Dynamic expression mapping with emotional fidelity
Superhero character head swapping with action preservation
Complex facial expression transfer with character integrity
Film-style head swapping with professional quality output
Real-time expression control and modification capabilities