Face Swap
A deepfake technique where a machine-learning model replaces one person's face in a video or image with another person's face, preserving the original head pose, lighting, and expression.
Face swapping is the technique that originally gave deepfakes their name. In late 2017, a Reddit user posting as "deepfakes" released code that used two autoencoders with a shared encoder to swap faces between videos. Eight years later, it remains the most common form of video deepfake attack in the wild.
How it works
A face-swap pipeline, at a high level:
- Face detection and alignment — find the face in each frame, canonicalize its position and orientation.
- Encoding — an encoder network compresses the face into a latent representation of identity, expression, and pose.
- Decoding with target identity — a decoder trained on the target person's face produces a rendering with the target's identity but the source's expression and pose.
- Blending — the generated face is composited back into the original frame, with boundary smoothing to hide the seam.
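The blending step above can be sketched as feathered alpha compositing. This is a minimal illustration, not the method used by any particular tool; `feathered_mask` and `composite` are hypothetical names, and real pipelines typically use learned segmentation masks or Poisson blending instead of a fixed ellipse.

```python
import numpy as np

def feathered_mask(h, w, feather=8):
    """Elliptical face mask whose edge ramps from 1 to 0 to soften the seam."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # normalized elliptical distance from the patch center (1.0 = boundary)
    d = np.sqrt(((ys - cy) / (h / 2)) ** 2 + ((xs - cx) / (w / 2)) ** 2)
    # 1 inside the ellipse, ramping to 0 over a band set by `feather`
    return np.clip((1.0 - d) * (min(h, w) / (2 * feather)), 0.0, 1.0)

def composite(frame, swapped_face, top, left):
    """Paste a generated face patch back into the frame with boundary smoothing."""
    h, w = swapped_face.shape[:2]
    alpha = feathered_mask(h, w)[..., None]
    region = frame[top:top + h, left:left + w].astype(np.float64)
    blended = alpha * swapped_face.astype(np.float64) + (1 - alpha) * region
    out = frame.copy()
    out[top:top + h, left:left + w] = blended.astype(frame.dtype)
    return out
```

The feather width is the attacker's trade-off: a hard edge leaves an obvious seam, while a wide feather bleeds background texture into the face, both of which detectors exploit as the "blending boundary" artifact described below.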
Early pipelines (DeepFaceLab) required hundreds of reference images of the target identity and hours of per-pair training. Modern pipelines (FaceFusion, Roop, and similar) work from a single target photo and run in near real time on consumer hardware.
Detection implications
Face swaps leave characteristic artifacts that detectors learn to flag:
- Temporal flicker. Per-frame operation means identity "wobbles" slightly between frames — too little to notice consciously, enough for a model to catch.
- Blending boundary. The ring where the swapped face meets the original head often has faint color or texture discontinuities.
- Pose cliff. Face-swap models trained on front-facing reference imagery degrade when the head turns past ~45°. Attackers avoid profile shots.
- Mismatched lighting. The decoded face may carry its own implicit light source, producing shadows that don't match the scene.
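The temporal-flicker cue can be quantified by running any face-identity embedding model (ArcFace is a common choice) over each frame and measuring how much the identity drifts between consecutive frames. A minimal sketch, assuming the per-frame embeddings have already been extracted; `identity_flicker_score` is an illustrative name, not a real library function:

```python
import numpy as np

def identity_flicker_score(embeddings):
    """Mean frame-to-frame cosine distance between face-identity embeddings.

    Real footage of one person yields nearly constant embeddings, so the
    score stays near 0. A per-frame face swap makes identity "wobble"
    slightly, pushing the score up.
    """
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)   # unit-normalize
    cos = np.sum(e[:-1] * e[1:], axis=1)               # consecutive-frame similarity
    return float(np.mean(1.0 - cos))                   # 0 = perfectly stable identity
```

In practice a detector would compare this score against a threshold calibrated on known-real footage, since compression and motion blur also add some embedding noise.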
See how to detect deepfake videos for the full dual-track detection workflow, which pairs face-swap detection with audio analysis to catch combined attacks.
Related techniques
- Lip-sync deepfake — real video with new audio and a re-generated mouth region. Visual fidelity is usually higher than a face swap's, but only the mouth changes.
- Reenactment — real face, AI-driven expressions and head pose. Lets an attacker puppeteer a known person's appearance.
- Full face generation — the entire face is generated from scratch, typically with a GAN or diffusion model. Used for synthetic identities rather than impersonation.