Face Swap
A deepfake technique where a machine-learning model replaces one person's face in a video or image with another person's face, preserving the original head pose, lighting, and expression.
Face swapping is the technique that originally gave deepfakes their name. In late 2017, a Reddit user posting as "deepfakes" released code that used two autoencoders with a shared encoder to swap faces between videos. Eight years later, it remains the most common form of video deepfake attack in the wild.
How it works
A face-swap pipeline, at a high level:
- Face detection and alignment — find the face in each frame, canonicalize its position and orientation.
- Encoding — an encoder network compresses the face into a latent representation of identity, expression, and pose.
- Decoding with target identity — a decoder trained on the target person's face produces a rendering with the target's identity but the source's expression and pose.
- Blending — the generated face is composited back into the original frame, with boundary smoothing to hide the seam.
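The blending step above can be sketched as feathered alpha compositing. This is a minimal illustration, not the method used by any particular tool; `feathered_mask` and `composite` are hypothetical names, and real pipelines typically use learned segmentation masks or Poisson blending instead of a fixed ellipse.

```python
import numpy as np

def feathered_mask(h, w, feather=8):
    """Elliptical face mask whose edge ramps from 1 to 0 to soften the seam."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # normalized elliptical distance from the patch center (1.0 = boundary)
    d = np.sqrt(((ys - cy) / (h / 2)) ** 2 + ((xs - cx) / (w / 2)) ** 2)
    # 1 inside the ellipse, ramping to 0 over a band set by `feather`
    return np.clip((1.0 - d) * (min(h, w) / (2 * feather)), 0.0, 1.0)

def composite(frame, swapped_face, top, left):
    """Paste a generated face patch back into the frame with boundary smoothing."""
    h, w = swapped_face.shape[:2]
    alpha = feathered_mask(h, w)[..., None]
    region = frame[top:top + h, left:left + w].astype(np.float64)
    blended = alpha * swapped_face.astype(np.float64) + (1 - alpha) * region
    out = frame.copy()
    out[top:top + h, left:left + w] = blended.astype(frame.dtype)
    return out
```

The feather width is the attacker's trade-off: a hard edge leaves an obvious seam, while a wide feather bleeds background texture into the face, both of which detectors exploit as the "blending boundary" artifact described below.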
Early pipelines (DeepFaceLab) required hundreds of reference images of the target identity and hours of per-pair training. Modern pipelines (FaceFusion, Roop, and similar) work from a single target photo and run in near real time on consumer hardware.
Detection implications
Face swaps leave characteristic artifacts that detectors learn to flag:
- Temporal flicker. Per-frame operation means identity "wobbles" slightly between frames — too little to notice consciously, enough for a model to catch.
- Blending boundary. The ring where the swapped face meets the original head often has faint color or texture discontinuities.
- Pose cliff. Face-swap models trained on front-facing reference imagery degrade when the head turns past ~45°. Attackers avoid profile shots.
- Mismatched lighting. The decoded face may carry its own implicit light source, producing shadows that don't match the scene.
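The temporal-flicker cue can be quantified by running any face-identity embedding model (ArcFace is a common choice) over each frame and measuring how much the identity drifts between consecutive frames. A minimal sketch, assuming the per-frame embeddings have already been extracted; `identity_flicker_score` is an illustrative name, not a real library function:

```python
import numpy as np

def identity_flicker_score(embeddings):
    """Mean frame-to-frame cosine distance between face-identity embeddings.

    Real footage of one person yields nearly constant embeddings, so the
    score stays near 0. A per-frame face swap makes identity "wobble"
    slightly, pushing the score up.
    """
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)   # unit-normalize
    cos = np.sum(e[:-1] * e[1:], axis=1)               # consecutive-frame similarity
    return float(np.mean(1.0 - cos))                   # 0 = perfectly stable identity
```

In practice a detector would compare this score against a threshold calibrated on known-real footage, since compression and motion blur also add some embedding noise.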
See how to detect deepfake videos for the full dual-track detection workflow, which pairs face-swap detection with audio analysis to catch combined attacks.
Related techniques
- Lip-sync deepfake — real video with new audio and a re-generated mouth region. Visual fidelity is usually higher than a face swap's, but only the mouth changes.
- Reenactment — real face, AI-driven expressions and head pose. Lets an attacker puppeteer a known person's appearance.
- Full face generation — the entire face is generated from scratch, typically with a GAN or diffusion model. Used for synthetic identities rather than impersonation.