detect·deepfakes by Resemble AI
Video detection

AI Video & Deepfake Detector

Upload a video or paste a URL to check for face-swaps, lip-sync deepfakes, and fully-synthesized AI video. Get frame-level reasoning.

The four classes of video deepfake

Not all video fakes are the same. Each class leaves different fingerprints and requires a different detection approach:

  • Face swap — another identity transplanted onto real footage. Most common attack. Detects via per-frame identity drift and boundary-blending artifacts.
  • Lip-sync — real video with new audio and a regenerated mouth region. Used for misattributed-quote attacks. Requires dual-track analysis to catch.
  • Reenactment — real face, attacker-driven expressions in real time. The Arup-style video-call attack. Hardest to detect live; easier after recording.
  • Full synthesis — everything generated (Sora, Runway, Veo). Fewer samples in the wild yet, but quality has climbed fast.

How dual-track analysis works

For every video you upload, we run the visual track through our frame-level detector (same DETECT-3B model, video head) and the audio track through the zero-shot audio detector in parallel. Results come back as two independent verdicts plus a combined recommendation. The combination logic:

  • Both tracks ≥ 0.8 real → almost certainly authentic.
  • Either track ≥ 0.7 fake → treat the video as compromised.
  • Both in the middle (0.4–0.6) → inconclusive, request a second source.
  • Video real, audio fake → most likely a lip-sync attack.
  • Video fake, audio real → most likely a face swap.
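The decision rules above can be sketched as a small function. This is a minimal illustration, not the production logic: it assumes each track returns a real-probability in [0, 1] (so a "fake score" of 0.7 corresponds to a real-probability of 0.3), and the function name and fallthrough label are placeholders.

```python
def combine_verdicts(video_real: float, audio_real: float) -> str:
    """Combine per-track real-probabilities into one recommendation.

    Illustrative sketch only: thresholds follow the published rules;
    the tie-breaking order (divergence checks before the generic
    'compromised' verdict) is an assumption.
    """
    # Both tracks confidently real -> almost certainly authentic.
    if video_real >= 0.8 and audio_real >= 0.8:
        return "authentic"

    # Either track >= 0.7 fake (i.e. real-probability <= 0.3).
    if (1 - video_real) >= 0.7 or (1 - audio_real) >= 0.7:
        # Divergent tracks point at a specific attack class.
        if video_real >= 0.8 and (1 - audio_real) >= 0.7:
            return "lip-sync attack suspected"   # video real, audio fake
        if audio_real >= 0.8 and (1 - video_real) >= 0.7:
            return "face-swap suspected"         # video fake, audio real
        return "compromised"

    # Both tracks in the ambiguous middle band.
    if 0.4 <= video_real <= 0.6 and 0.4 <= audio_real <= 0.6:
        return "inconclusive"

    # Scores outside the published rules (e.g. one track at 0.65).
    return "needs manual review"
```

Checking the divergent cases before returning the generic "compromised" verdict is what lets the same rules both flag the video and name the likely attack class.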

What this detector catches

  • Face-swap videos from DeepFaceLab, FaceFusion, Roop and commercial successors
  • Lip-sync attacks built on Wav2Lip and successors
  • Full-synthesis video from Sora, Runway Gen-3, Veo, Kling, Pika, Luma Dream Machine
  • Avatar and digital-twin video from HeyGen and Synthesia
  • Combined attacks: face swap + voice clone used in CEO-fraud video calls

Limitations

  • Clips heavily compressed by social platforms lose frame-level artifacts.
  • Very short clips (<3s) or mostly-static scenes limit the temporal signals we rely on.
  • Novel generation models released after our last retrain will initially slip through until the detector catches up.

Common use cases

  • Deposition and legal: verifying video evidence in litigation.
  • Elections: newsroom verification of politician clips before publication.
  • Insurance: video claims evidence (injuries, accident scenes).
  • Ransom and extortion: verifying “proof of life” videos sent to victims.
  • Content platforms: automated screening for non-consensual deepfake content.

Frequently asked questions

What types of video deepfakes does this catch?

Face swaps (DeepFaceLab-family), lip-sync attacks (Wav2Lip and successors), reenactment (live-puppeted faces on video calls), and full-synthesis video from Sora, Runway, Veo, Kling, Pika, Luma.

Why do you report audio and visual verdicts separately?

Because they can diverge. A lip-sync attack often pairs real video with cloned audio — the video track looks fine alone. A face-swap attack often pairs real audio with a generated face — the audio passes alone. Separate verdicts catch both.

Does short clip length hurt accuracy?

Yes — clips under 3 seconds have significantly less temporal signal. We recommend 10–30 seconds for best confidence.

What about heavy social-media compression?

TikTok- and Reels-grade compression destroys much of the frame-level signal. Analyze the highest-quality source you can find — original uploads, not screen recordings of social embeds.

Can I run this at scale?

Yes — the API handles batch workloads with sub-second median latency per minute of video. See resemble.ai/detect for enterprise pricing.
