detect·deepfakes by Resemble AI

Lip-Sync Deepfake

Also: lip-sync attack · audio-driven lip sync · Wav2Lip

A deepfake technique in which a real video is paired with new audio and only the mouth region is regenerated to match the new speech, leaving the rest of the face, body, and scene untouched.

Lip-sync deepfakes are the cheapest, fastest, and most common form of video manipulation in 2026. Unlike face swaps, which replace the entire face, lip-sync deepfakes alter only the mouth region, which makes them harder to detect visually and faster to produce.

The attack

The pipeline:

  1. Take a real video of a target.
  2. Generate new audio via voice cloning, or record a voice actor delivering the desired script.
  3. Use an audio-driven lip-sync model (Wav2Lip was the popular 2020 open-source option; better successors exist now) to regenerate the mouth region frame-by-frame, matching phonemes to mouth shapes.

The resulting video looks real to most viewers, because everything except the mouth is real, and it lets an attacker put words into the target's mouth without a full face-swap pipeline.

Why detection requires dual-track analysis

A visual-only deepfake detector often misses lip-sync attacks because:

  • The bulk of the frame is genuine.
  • Face-identity signals match (it's really the target's face).
  • Only the mouth region and its immediate boundary show regeneration artifacts.

This is the core reason our video detector runs audio and video analysis in parallel. A lip-sync attack typically passes the video-track check but fails the audio-track check, because the paired audio is usually cloned.
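A minimal sketch of how two per-track scores might be combined is shown here. The `TrackScores` fields, the 0.5 thresholds, and the `fuse_tracks` helper are hypothetical names chosen for illustration, not the detector's actual interface. The sketch only shows why flagging on either track catches a clip whose visual score looks clean.

```python
from dataclasses import dataclass


@dataclass
class TrackScores:
    # Hypothetical per-track outputs, each a probability in [0, 1].
    video_fake_prob: float
    audio_fake_prob: float


def fuse_tracks(scores, video_threshold=0.5, audio_threshold=0.5):
    """Return (flagged, reason) from two independently scored tracks.

    The thresholds are illustrative assumptions, not tuned values.
    """
    video_hit = scores.video_fake_prob >= video_threshold
    audio_hit = scores.audio_fake_prob >= audio_threshold

    if video_hit and audio_hit:
        return True, "both tracks flagged"
    if audio_hit:
        # The pattern described above: visuals pass, audio fails.
        return True, "audio flagged, video clean"
    if video_hit:
        return True, "video flagged, audio clean"
    return False, "no track flagged"


if __name__ == "__main__":
    clip = TrackScores(video_fake_prob=0.2, audio_fake_prob=0.9)
    print(fuse_tracks(clip))
```

A video-only check on the same clip would see 0.2 and pass it. The OR-style fusion is what surfaces the mismatch between a mostly genuine frame and suspicious audio.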

Detection signals

For the visual track:

  • Mouth-region high-frequency hash. The regenerated mouth carries the generation model's fingerprint.
  • Boundary artifacts. A faint blend ring where the regenerated mouth meets the rest of the face.
  • Phoneme-viseme mismatch. Subtle timing differences between audio phonemes and visual mouth shapes.
  • Teeth and tongue implausibility. Generated mouths sometimes render anatomically incorrect teeth positions or tongue placement.
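The phoneme-viseme timing signal can be illustrated with a simple lag estimate. The sketch below assumes two per-frame series are already available: an audio loudness envelope and a mouth-opening measurement. Both inputs, the function names, and the use of plain Pearson correlation are assumptions made for illustration; a production detector would likely work on learned phoneme and viseme features rather than raw envelopes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_av_lag(audio_env, mouth_open, max_lag):
    """Find the frame shift that best aligns mouth motion with the audio.

    Returns (lag, correlation). A positive lag means the mouth series
    trails the audio by that many frames.
    """
    if len(audio_env) != len(mouth_open):
        raise ValueError("series must have the same number of frames")
    n = len(audio_env)
    best_lag, best_corr = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, m = audio_env[: n - lag], mouth_open[lag:]
        else:
            a, m = audio_env[-lag:], mouth_open[: n + lag]
        if len(a) < 2:
            continue  # not enough overlap to correlate
        corr = pearson(a, m)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


if __name__ == "__main__":
    audio = [0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 0]
    mouth = [0, 0] + audio[:-2]  # mouth motion delayed by two frames
    print(estimate_av_lag(audio, mouth, max_lag=3))
```

A consistently nonzero lag, or a low peak correlation at every lag, is the kind of subtle timing difference the bullet describes. What counts as suspicious would have to be calibrated against genuine footage.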

See also