AI Voice & Deepfake Audio Detector
Paste an audio URL or upload a file. Get a synthetic-vs-real verdict and see which generator family it came from — free, no login.
Upload formats: MP3 · WAV · M4A · FLAC · max 50 MB
How our AI voice detector works
DETECT-3B Omni inspects the audio across three dimensions simultaneously:
- Spectral: vocoders produce subtle artifacts above 8 kHz that natural microphone recordings don't contain.
- Prosody: synthetic speech distributes stress and pause length more evenly than human speech.
- Generator fingerprinting: each major TTS family (ElevenLabs, PlayHT, Resemble, OpenAI TTS, Azure, Google) leaves a statistical signature in the output. We match against a library of known fingerprints to estimate which system produced the audio.
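As a rough illustration of the prosody and fingerprinting ideas, here is a minimal Python sketch. It is not DETECT-3B Omni's implementation: the function names, the use of a coefficient of variation for pause evenness, the use of cosine similarity for matching, and the toy vectors are all assumptions made for this example.

```python
import math
import statistics


def pause_variability(pause_lengths_s):
    """Coefficient of variation of pause lengths (assumed evenness measure).

    Lower values mean pauses are more evenly distributed.
    """
    mean = statistics.fmean(pause_lengths_s)
    if mean == 0:
        return 0.0
    return statistics.pstdev(pause_lengths_s) / mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_fingerprint_match(embedding, library):
    """Return (family, similarity) for the closest fingerprint in the library."""
    scored = ((family, cosine_similarity(embedding, fp)) for family, fp in library.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy fingerprints; a real library would hold learned signatures.
library = {
    "family_a": [1.0, 0.0, 0.0],
    "family_b": [0.0, 1.0, 0.0],
}
family, score = best_fingerprint_match([0.9, 0.1, 0.0], library)

# Evenly spaced pauses score lower than naturally varied ones.
even_pauses = pause_variability([0.30, 0.31, 0.29, 0.30])
varied_pauses = pause_variability([0.10, 0.80, 0.25, 1.40])
```

In this toy setup the query vector lands closest to `family_a`, and the evenly spaced pauses produce a lower variability score than the varied ones, which is the direction the prosody signal describes.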
Resemble Intelligence, built on Gemini 3 Flash, translates the model's internal signals into plain-English reasoning on every result. Most detectors give you a number. We give you a number and the sentence that explains it.
What this detector catches
- Voice-cloned speech from commercial TTS (ElevenLabs, Resemble, PlayHT, OpenAI, Google, Azure, Amazon Polly, Hume, Cartesia)
- Zero-shot voice clones from short reference samples
- Full-synthesis AI speech with no reference voice
- Partial splicing — real speech with inserted synthetic segments
- Scam-call audio patterns (integrates with Resemble Signal's 20 fraud categories)
- Podcast and video-game voice-over clones
What it doesn't catch — and why that matters
No detector is 100% accurate. We publish our accuracy numbers and failure modes openly:
- Heavily compressed audio (telephony codecs, old VoIP) degrades accuracy by ~8 percentage points.
- Live-mixed content — synthetic voice played through a speaker and recorded on a phone — is harder; we catch it ~85% of the time versus 97% on direct files.
- State-of-the-art research systems we haven't seen in the wild yet will slip through until we retrain.
If it matters (and with deepfakes, it usually does), treat the detector as one input alongside context, provenance, and human judgment.
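To put those figures in concrete terms, here is a short worked sketch in Python. The catch rates (97% and ~85%) and the ~8-point drop come from this page; the batch of 1,000 synthetic clips, the function names, and the use of the 98% benchmark figure as the baseline for compressed audio are illustrative assumptions.

```python
def expected_misses(n_synthetic, catch_rate):
    """Expected number of synthetic clips that go undetected."""
    return round(n_synthetic * (1 - catch_rate))


def after_point_drop(baseline_pct, drop_points):
    """Subtract percentage points (an absolute drop, not a relative one)."""
    return baseline_pct - drop_points


# Assumed batch size, for illustration only.
batch = 1000

direct_misses = expected_misses(batch, 0.97)      # direct files
rerecorded_misses = expected_misses(batch, 0.85)  # speaker-to-phone recordings

# Assumption: the 98% benchmark accuracy is the baseline for compressed audio.
compressed_accuracy = after_point_drop(98, 8)
```

Under these assumptions, roughly 30 of 1,000 synthetic clips would be missed on direct files versus roughly 150 on re-recorded audio, and compressed-audio accuracy would land near 90%.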
Common use cases
- Journalists: verifying a viral voice note before publication.
- Fraud teams: screening suspicious inbound calls against known attack patterns.
- HR & talent: flagging AI-narrated applicant videos.
- Podcasters: confirming guest audio was really them.
- Researchers: baseline-testing synthetic-speech datasets.
Frequently asked questions
Is the tool really free?
Yes. No signup and no credit card. The only limits on a single upload are the 50 MB size cap and the 10-minute length cap. For bulk or API use, the same model is available via the Resemble AI API with a free monthly tier.
Is my audio stored?
No. Files are processed in memory, deleted within 10 minutes, and never used for training.
Which file formats are supported?
MP3, WAV, M4A, FLAC, OGG, and WebM. Max 50 MB, up to 10 minutes.
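If you want to pre-check files before uploading, a minimal Python sketch along these lines could work. The extension list, 50 MB cap, and 10-minute cap come from the answer above; the function name, the extension-based format check, and treating 1 MB as 1,000,000 bytes are assumptions, not documented behavior of the tool.

```python
from pathlib import Path

# Formats listed in the FAQ answer, expressed as file extensions (assumption).
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}
MAX_BYTES = 50 * 1000 * 1000   # assumption: "50 MB" means decimal megabytes
MAX_DURATION_S = 10 * 60       # 10 minutes


def check_upload(filename, size_bytes, duration_s=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 50 MB")
    if duration_s is not None and duration_s > MAX_DURATION_S:
        problems.append("longer than 10 minutes")
    return problems


ok_result = check_upload("voice_note.MP3", 1_000_000, duration_s=45)
bad_result = check_upload("clip.aiff", 60_000_000, duration_s=700)
```

Here `ok_result` is an empty list, while `bad_result` reports all three problems. Checking by extension is only a convenience; the service may inspect the actual file contents.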
How accurate is it?
DETECT-3B Omni posts 98% accuracy on our 2026 audio benchmark across 10 commercial TTS and voice-cloning families. Accuracy degrades ~8 percentage points on heavily compressed phone-codec audio.
Can I call this from my app?
Yes — the same model is available as an API via resemble.ai with 50 free scans/month to start. See resemble.ai/detect.
Can it detect my spouse’s voice on a scam call?
Yes, with two checks. The deepfake detector confirms the audio is synthetic; for speaker-identity attacks, adding speaker ID via Resemble’s voice-ID API confirms whose voice it was meant to impersonate.