Face Reenactment
A deepfake technique where a driver video (usually the attacker's own face) controls the expressions, head pose, and gaze of a target identity in a generated video — enabling live impersonation.
Face reenactment is the technique that makes real-time deepfake video calls possible. Where a face swap pastes the target's identity onto existing footage, reenactment animates the target's face directly: the attacker performs, and the target's face follows.
How it works
- An identity encoder extracts the target's face from reference imagery (a single photo can be enough).
- A driving signal — the attacker's own face, captured on a webcam — provides expressions, head pose, and gaze in real time.
- A generator renders the target face performing the driver's movements. Real-time systems typically use warping- or GAN-based models; diffusion-based generators can produce higher fidelity but are generally too slow for live use.
The result: on a Zoom or Teams call, the attacker's camera shows the target's face, synchronized to the attacker's own speech and expressions.
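The three-stage pipeline above can be sketched as a loop: the identity is encoded once from a reference photo, while motion is extracted fresh from every driver frame. This is a minimal illustrative sketch with stub functions standing in for the real neural networks; all names, shapes, and the toy math inside each stub are assumptions, not any particular model's API.

```python
import numpy as np

def encode_identity(reference_image: np.ndarray) -> np.ndarray:
    """Stub identity encoder: reduce the reference image to a fixed-size
    embedding (a real system would use a face-recognition-style network)."""
    return reference_image.reshape(-1)[:128].astype(float) / 255.0

def extract_motion(driver_frame: np.ndarray) -> np.ndarray:
    """Stub motion extractor: stand-in for the landmarks or pose/expression
    coefficients pulled from the attacker's webcam frame."""
    return driver_frame.mean(axis=(0, 1)) / 255.0  # one value per channel

def generate(identity: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Stub generator: a real model renders the target identity performing
    the driver's motion; here the two signals are just combined."""
    return np.outer(identity[:64], motion)

def reenact(reference_image, driver_frames):
    """Reenactment loop: encode identity once, extract motion per frame."""
    identity = encode_identity(reference_image)  # a single photo suffices
    return [generate(identity, extract_motion(f)) for f in driver_frames]

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, (64, 64, 3))              # target photo
driver = [rng.integers(0, 256, (64, 64, 3)) for _ in range(3)]  # webcam feed
frames = reenact(reference, driver)
print(len(frames), frames[0].shape)  # one rendered frame per driver frame
```

The key structural point the sketch captures is the asymmetry: identity is a one-time input, motion is a streaming input, which is why a single reference photo is enough to sustain a live call.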
The attack surface it opens
Real-time video call impersonation is the marquee threat. A cloned-voice TTS pipeline can pair with face reenactment to impersonate a known person on a video call — as in the Arup case, where a finance employee wired $25.6M after a video call with deepfaked colleagues, and in several similar incidents since.
Detection signals
Reenactment tends to leave specific tells:
- Pose limit. Most models degrade when the driver turns the head past ~45°.
- Expression repertoire gaps. Extreme expressions (full laughs, scrunched concentration) can break the generator.
- Lighting carry-over. The generated face may not update its lighting as the driver moves through different room conditions.
- Per-frame identity drift. Small, high-frequency shifts in the rendered face identity frame-to-frame.
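The last signal, per-frame identity drift, lends itself to a simple check: compare face embeddings from consecutive frames and flag clips whose frame-to-frame similarity jitters more than genuine video does. The sketch below uses synthetic embeddings to keep it self-contained; a real detector would obtain them from a face-recognition model, and the threshold and noise scales are illustrative assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_drift_score(embeddings: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between consecutive frame embeddings.
    Genuine video of one person keeps this near zero; reenactment output
    tends to show small high-frequency identity shifts."""
    sims = [cosine(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]
    return float(np.mean([1.0 - s for s in sims]))

rng = np.random.default_rng(42)
base = rng.normal(size=128)  # the "true" identity embedding

# Genuine clip: same identity plus tiny sensor noise per frame.
real = np.stack([base + 0.01 * rng.normal(size=128) for _ in range(30)])
# Reenacted clip: larger per-frame perturbations of the identity vector.
fake = np.stack([base + 0.3 * rng.normal(size=128) for _ in range(30)])

print(identity_drift_score(real) < identity_drift_score(fake))  # True
```

In practice this signal is noisy on its own (compression and motion blur also perturb embeddings), which is why it is typically combined with the pose and lighting tells above rather than used as a standalone detector.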
Liveness challenges that require unusual movements (turning to full profile, covering part of the face, showing a specific object) exploit exactly these failure modes, and are effective as a complement to automated deepfake detection.
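A challenge-based liveness check can be sketched as: pick an unpredictable challenge, then verify the measured response against a known model weakness (for instance, a full-profile turn pushes well past the ~45° pose limit noted above). Everything here is a hypothetical sketch; the challenge names, thresholds, and the measurement callback are assumptions, with a real system running pose and occlusion estimation on the live video feed.

```python
import random

# Hypothetical challenge set; each entry maps a challenge name to a check
# over measurements taken from the caller's video. Thresholds are assumed.
CHALLENGES = {
    "turn_profile": lambda m: m.get("max_yaw_deg", 0) >= 80,   # full profile
    "cover_face":   lambda m: m.get("occlusion_frac", 0) >= 0.3,
    "show_object":  lambda m: m.get("object_shown", False),
}

def run_liveness_check(measure, rng=random):
    """Pick a random challenge (unpredictability is the point: the attacker
    cannot pre-render a response), then verify the measured reaction."""
    name = rng.choice(sorted(CHALLENGES))
    return name, bool(CHALLENGES[name](measure(name)))

# Simulated caller whose video would pass a profile turn, nothing else.
responses = {"turn_profile": {"max_yaw_deg": 85}}
name, passed = run_liveness_check(lambda n: responses.get(n, {}),
                                  rng=random.Random(0))
print(name, passed)
```

The design choice worth noting is randomization: a fixed, known challenge can be pre-rendered or trained for, while a challenge drawn at call time forces the generator to handle its weakest inputs live.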