What Is a Deepfake? A Clear Definition
Deepfakes are AI-generated or AI-manipulated media — audio, images, or video — made to look or sound like something that didn't happen. A plain-language primer with examples.
A deepfake is AI-generated or AI-manipulated media — audio, image, or video — that portrays an event that didn't happen, words someone never said, or a person who doesn't exist. The word combines deep learning (the kind of AI used to make them) and fake.
The term was coined in late 2017 by a Reddit user who used open-source machine-learning code to swap faces in videos. The technology has since become a multi-billion-dollar concern spanning fraud, disinformation, non-consensual imagery, and national security.
The three modalities
Deepfakes come in three main flavors, each with its own detection approach:
- Audio deepfakes — synthetic voices made by voice cloning, text-to-speech, or voice-conversion systems. The voice sounds like a real person but was generated, often from just 30 seconds of reference audio. Used in CEO-fraud calls, fake voicemails, and consent attacks.
- Image deepfakes — photos generated from text prompts by diffusion models (Stable Diffusion, Midjourney, DALL·E), or real photos modified via face swaps and inpainting. Used for non-consensual intimate imagery, disinformation, and synthetic identity documents.
- Video deepfakes — the original meaning of the term: video in which a face, a voice, or both have been swapped or synthesized. Four sub-types: face swap, lip-sync, reenactment, and fully synthetic video. See how to detect video deepfakes for details.
What isn't a deepfake
Scope matters. The word is sometimes stretched to cover a few things that probably shouldn't count:
- Filters and beauty apps. Rule-based image transformations aren't usually called deepfakes.
- CGI and visual effects. Movie VFX has used composited and synthetic faces for decades. The distinguishing feature of a deepfake is AI-driven automation at low cost.
- AI-generated art presented as art. An AI-generated portrait labeled as such for a magazine cover is synthetic media, but it's not a deepfake — there's no deceptive intent.
The common thread in "deepfake" is intent to deceive. Synthetic media without deception is just content.
Why they're a problem
Three forces converged:
- Quality went up. Face swaps in 2017 looked obviously wrong. In 2026 they fool most viewers most of the time.
- Cost went down. Making a convincing voice clone now takes ~30 seconds of reference audio and a few dollars of compute. Video deepfakes cost more, but the price is falling fast.
- Scale went up. Diffusion and voice-cloning pipelines run at scale, so attackers can probe thousands of targets concurrently.
This matters differently depending on the sector. Banking faces CEO-fraud calls; elections face narrative manipulation; insurance faces synthetic claim evidence. The threat model isn't universal.
What detection does
Detection models look at the statistical fingerprints that generation pipelines leave behind — frequency artifacts in images, phase inconsistencies in audio, temporal flickers in video — and estimate the probability that a piece of media came from a synthesis pipeline rather than a camera or microphone.
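As a toy illustration of the frequency-artifact idea, the sketch below measures how much of an image's spectral energy sits outside a central low-frequency band. Everything here is a simplification and an assumption for illustration: real detectors learn these patterns from labeled data, and the file name and the fixed band size are placeholders, not a working detector.

```python
# Minimal sketch, NOT a production detector: many diffusion and GAN pipelines
# leave characteristic energy patterns in the high-frequency bands of an
# image's 2D Fourier spectrum. This toy heuristic just measures the share of
# spectral energy outside a central low-frequency core.
import numpy as np
from PIL import Image

def high_frequency_energy_ratio(path: str, core_fraction: float = 0.25) -> float:
    """Fraction of spectral energy outside the central low-frequency band."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    # Shift the spectrum so the DC (zero-frequency) component sits at the center.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2

    h, w = spectrum.shape
    ch, cw = int(h * core_fraction), int(w * core_fraction)
    core = spectrum[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw]

    return 1.0 - core.sum() / spectrum.sum()

# "photo.jpg" is a placeholder path for illustration.
ratio = high_frequency_energy_ratio("photo.jpg")
print(f"high-frequency energy share: {ratio:.3f}")
```

A real pipeline would feed features like this, alongside many others, into a trained classifier rather than reading off a single number.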
No detector is 100% accurate. No detector can prove something is "real" in the strict sense — only that its fingerprint doesn't match known synthesis methods. The right use is as a probabilistic signal in a workflow that includes provenance, source context, and human review.
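Here is a minimal sketch of what "a probabilistic signal in a workflow" can mean in practice. The thresholds, and the choice to let a valid provenance manifest (e.g. C2PA) short-circuit a noisy score, are illustrative assumptions, not a prescribed policy.

```python
# Sketch of score-plus-context triage. Thresholds and inputs are hypothetical.
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str      # "likely_synthetic" | "likely_authentic" | "needs_review"
    reasons: list

def triage(detector_score: float, has_valid_provenance: bool,
           source_trusted: bool) -> Verdict:
    reasons = [f"detector score {detector_score:.2f}"]

    # Cryptographic provenance (e.g. a C2PA manifest) outweighs a noisy score.
    if has_valid_provenance:
        reasons.append("valid provenance manifest")
        return Verdict("likely_authentic", reasons)

    if detector_score >= 0.90:
        return Verdict("likely_synthetic", reasons)
    if detector_score <= 0.10 and source_trusted:
        reasons.append("trusted source")
        return Verdict("likely_authentic", reasons)

    # Everything in between is exactly where human review belongs.
    return Verdict("needs_review", reasons)

print(triage(0.62, has_valid_provenance=False, source_trusted=True).label)
# -> needs_review
```

The design point: the detector score alone never produces a final verdict; it only moves a file between the confident buckets and the send-to-a-human bucket.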
Try detection on a real file: audio, image, or video. Free, no signup.
Or build detection into your product directly. 50 free scans a month via the Resemble AI API.
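For orientation, an API integration is an ordinary authenticated HTTP upload that returns a JSON score. The endpoint URL, header, field names, and response shape below are hypothetical placeholders, not the documented Resemble AI contract; consult the official API reference for the real one.

```python
# Illustrative only: URL, headers, and JSON fields are hypothetical
# placeholders. Check the official API docs before integrating.
import requests

API_KEY = "YOUR_API_KEY"  # issued when you sign up

def scan_file(path: str) -> dict:
    with open(path, "rb") as f:
        resp = requests.post(
            "https://api.example.com/v1/detect",   # placeholder endpoint
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"media": f},
        )
    resp.raise_for_status()
    return resp.json()  # e.g. {"score": 0.97, "label": "synthetic"}

print(scan_file("suspicious_voicemail.wav"))  # placeholder file name
```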