AI Video & Deepfake Detector
Upload a video or paste a URL to check for face-swaps, lip-sync deepfakes, and fully-synthesized AI video. Get frame-level reasoning.
Drop your video here
MP4 · MOV · WebM · max 100 MB
The four classes of video deepfake
Not all video fakes are the same. Each class leaves different fingerprints and requires a different detection approach:
- Face swap — another identity transplanted onto real footage. Most common attack. Detected via per-frame identity drift and boundary-blending artifacts.
- Lip-sync — real video with new audio and a regenerated mouth region. Used for misattributed-quote attacks. Requires dual-track analysis to catch.
- Reenactment — real face, attacker-driven expressions in real time. The Arup-style video-call attack. Hardest to detect live; easier after recording.
- Full synthesis — everything generated (Sora, Runway, Veo). Fewer samples in the wild yet, but quality has climbed fast.
How dual-track analysis works
For every video you upload, we run the visual track through our frame-level detector (same DETECT-3B model, video head) and the audio track through the zero-shot audio detector in parallel. Results come back as two independent verdicts plus a combined recommendation. The combination logic:
- Both tracks ≥ 0.8 real → almost certainly authentic.
- Either track ≥ 0.7 fake → treat the video as compromised.
- Both in the middle (0.4–0.6) → inconclusive, request a second source.
- Video real, audio fake → most likely a lip-sync attack.
- Video fake, audio real → most likely a face swap.
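The rules above can be sketched as a small decision function. This is an illustrative sketch only, not the product's actual API: the threshold constants, the real-score convention (1.0 = certainly real), and the function name are assumptions drawn from the rules as stated.

```python
FAKE = 0.7  # a track whose fake score reaches this is treated as fake (assumed constant)
REAL = 0.8  # a track whose real score reaches this is treated as real (assumed constant)

def combine(video_real: float, audio_real: float) -> str:
    """Combine per-track real scores into a single recommendation.

    Scores follow an assumed convention: 1.0 = certainly real,
    so a track's fake score is 1.0 minus its real score.
    """
    video_fake = 1.0 - video_real
    audio_fake = 1.0 - audio_real

    # Both tracks strongly real -> almost certainly authentic.
    if video_real >= REAL and audio_real >= REAL:
        return "authentic"
    # Divergent tracks hint at the attack class.
    if video_real >= REAL and audio_fake >= FAKE:
        return "compromised: likely lip-sync attack"
    if video_fake >= FAKE and audio_real >= REAL:
        return "compromised: likely face swap"
    # Either track strongly fake -> treat the whole video as compromised.
    if video_fake >= FAKE or audio_fake >= FAKE:
        return "compromised"
    # Both tracks in the middle band -> request a second source.
    if 0.4 <= video_real <= 0.6 and 0.4 <= audio_real <= 0.6:
        return "inconclusive: request a second source"
    return "inconclusive"
```

Ordering matters: the divergence checks run before the generic "compromised" branch so that a lip-sync or face-swap pattern is named when one track is confidently real and the other confidently fake.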
What this detector catches
- Face-swap videos from DeepFaceLab, FaceFusion, Roop and commercial successors
- Lip-sync attacks built on Wav2Lip and successors
- Full-synthesis video from Sora, Runway Gen-3, Veo, Kling, Pika, Luma Dream Machine
- Avatar and digital-twin video from HeyGen and Synthesia
- Combined attacks: face swap + voice clone used in CEO-fraud video calls
Limitations
- Clips heavily compressed by social platforms lose frame-level artifacts.
- Very short clips (<3s) or mostly-static scenes limit the temporal signals we rely on.
- Novel generation models released after our last retrain may slip through until the detector is retrained to catch them.
Common use cases
- Deposition and legal: verifying video evidence in litigation.
- Elections: newsroom verification of politician clips before publication.
- Insurance: video claims evidence (injuries, accident scenes).
- Ransom and extortion: verifying “proof of life” videos sent to victims.
- Content platforms: automated screening for non-consensual deepfake content.
Frequently asked questions
+What types of video deepfakes does this catch?
Face swaps (DeepFaceLab-family), lip-sync attacks (Wav2Lip and successors), reenactment (live-puppeted faces on video calls), and full-synthesis video from Sora, Runway, Veo, Kling, Pika, Luma.
+Why do you report audio and visual verdicts separately?
Because they can diverge. A lip-sync attack often pairs real video with cloned audio — the video track looks fine alone. A face-swap attack often pairs real audio with a generated face — the audio passes alone. Separate verdicts catch both.
+Does short clip length hurt accuracy?
Yes — clips under 3 seconds have significantly less temporal signal. We recommend 10–30 seconds for best confidence.
+What about heavy social-media compression?
TikTok- and Reels-grade compression destroys much of the frame-level signal. Analyze the highest-quality source you can find — original uploads, not screen recordings of social embeds.
+Can I run this at scale?
Yes — the API handles batch workloads, with median processing latency under one second per minute of video. See resemble.ai/detect for enterprise pricing.