Deepfake Detection for Call Centers and Contact Centers
How contact centers catch AI-cloned callers in real time — the integration patterns, thresholds, and organizational policies that actually work against deepfake vishing.
- $2.9B: annual US business losses attributed to vishing and CEO fraud
- +245%: increase in deepfake-assisted fraud attempts (2024 YoY)
- <300ms: Resemble AI detection latency on live audio
Call centers are the highest-volume surface for voice-deepfake attacks in 2026. Every large bank, telco, insurance carrier, and government helpline is fielding a growing share of cloned-voice callers attempting account takeover, password resets, fraudulent claim initiation, and executive impersonation.
The good news: the detection problem is tractable if you accept that detection is one layer inside a broader voice-channel defense stack.
The attack surface
Contact centers get hit in four primary ways:
- Account-takeover (ATO) via cloned voice. Attacker clones the customer's voice from social-media audio or a breach, calls customer service, passes voice-biometrics authentication, resets password, drains account.
- Executive impersonation. Attacker calls finance or treasury with cloned CEO/CFO audio, requests a wire transfer. See Arup and Ferrari.
- IVR fraud. Automated cloned-voice bots attempt credential brute-forcing or harvest information through interactive voice response trees.
- Social-engineering + cloning combinations. Multi-turn vishing calls where a human fraudster uses a cloned voice on specific high-trust phrases.
The detection pipeline
A mature call-center defense stack runs three parallel signals on every inbound call of sufficient risk:
- Voice biometrics — does the caller's voice match the enrolled customer? (Answers who.)
- Deepfake detection — is the audio synthetic regardless of whose voice it claims to be? (Answers is this real?)
- Call-metadata fraud scoring — spoofing indicators, device reputation, call-path anomalies. (Answers did this call originate where it claims?)
Resemble AI's DETECT-3B covers the middle layer. Pair it with a voice-biometrics vendor (your IVR platform likely has one, or Pindrop for purpose-built coverage) and call-metadata fraud scoring (Pindrop, Neustar, or your telco).
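To make the layering concrete, here is a minimal fusion sketch. All names, weights, and tier boundaries are illustrative assumptions, not any vendor's API; the key design point it encodes is that a strong biometric match must never lower the risk of synthetic audio, since "cloned voice passes biometrics" is exactly the ATO scenario.

```python
from dataclasses import dataclass

@dataclass
class CallSignals:
    """Hypothetical container for the three parallel per-call signals."""
    biometric_match: float  # 0..1, similarity to the enrolled customer's voice
    deepfake_score: float   # 0..1, probability the audio is synthetic
    metadata_risk: float    # 0..1, spoofing / device / call-path anomaly score

def call_risk(s: CallSignals) -> str:
    """Fuse the signals into a coarse risk tier.

    Synthetic-audio and metadata signals dominate: they answer
    "is this real?" and "did it originate where it claims?",
    which a biometric match cannot override.
    """
    if s.deepfake_score >= 0.9 or s.metadata_risk >= 0.9:
        return "high"
    if s.deepfake_score >= 0.7 or s.metadata_risk >= 0.7 or s.biometric_match < 0.5:
        return "elevated"
    return "normal"
```

A cloned voice that fools the biometric layer (high match, high deepfake score) still lands in the "high" tier, which is the whole point of running the layers in parallel rather than as a single pass/fail gate.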
Integration patterns
Two patterns predominate:
Pattern 1 — Pre-agent triage
Detection runs during IVR or queue hold. High-risk callers are flagged before they reach a human agent; the agent sees a "deepfake-risk: high" indicator on their screen when the call connects. This is the most common 2026 deployment.
- Latency: tolerant, since the detection runs during queue wait.
- Threshold: agent-visible warning at 0.7+, automatic escalation (supervisor loops in) at 0.9+.
- Integration: plugs into the contact-center platform (Genesys, Amazon Connect, NICE) as a side-channel signal, not a gate.
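The side-channel behavior of Pattern 1 can be sketched as a small mapping from detection score to the indicator the agent sees; the function and payload shape below are assumptions for illustration, not a platform API, but the 0.7/0.9 thresholds match the triage pattern above.

```python
def triage_flag(deepfake_score: float) -> dict:
    """Map a pre-agent deepfake score to the agent-screen indicator.

    Side-channel signal only: nothing here blocks or drops the call.
    """
    if deepfake_score >= 0.9:
        # Automatic escalation: a supervisor is looped into the call.
        return {"banner": "deepfake-risk: high", "escalate_supervisor": True}
    if deepfake_score >= 0.7:
        # Agent-visible warning only.
        return {"banner": "deepfake-risk: high", "escalate_supervisor": False}
    return {"banner": None, "escalate_supervisor": False}
```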
Pattern 2 — Mid-call streaming
Detection runs on a 3-second sliding window of live call audio, with per-segment flags surfaced to the agent in real time. Used when pre-agent triage isn't possible (e.g., direct-dial numbers that bypass IVR) or for high-value transactions.
- Latency: must be sub-second per window. DETECT-3B's 300ms median fits.
- Threshold: in-call "advise" prompts at 0.7+; hard-stop on authorization actions at 0.9+.
- Integration: streaming-audio API consumption from the SBC or media server. More engineering, but higher confidence on sustained attacks.
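The sliding-window mechanics of Pattern 2 look roughly like the sketch below. The sample rate, hop size, and chunk shapes are assumptions (raw PCM as delivered by an SBC or media server); each emitted window would be scored independently by the detector.

```python
SAMPLE_RATE = 16_000   # assumed PCM sample rate from the media server
WINDOW_SECONDS = 3     # per the 3-second sliding window described above

def sliding_windows(chunks, hop_seconds=1.0):
    """Yield overlapping 3-second windows from a stream of audio chunks.

    `chunks` is an iterable of sample lists; each yielded window is a
    fixed-size segment suitable for one per-segment detection call.
    """
    window = SAMPLE_RATE * WINDOW_SECONDS
    hop = int(SAMPLE_RATE * hop_seconds)
    buf = []
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= window:
            yield buf[:window]
            del buf[:hop]  # slide forward by the hop, keeping the overlap
```

With a 1-second hop, each second of call audio is scored three times, which is what lets sustained attacks accumulate confidence mid-call instead of resting on a single verdict.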
Threshold tuning
Real-world contact-center thresholds are not the ones in model benchmarks. Three rules of thumb:
- Advise before interrupt. Launch the detector in a mode where it decorates agent UI but doesn't automatically block calls. Build operational track record before automating decisions.
- Per-call-type thresholds. A password reset has different risk than an account balance inquiry. Thresholds should scale with downstream authority.
- Track agent acceptance. If agents start ignoring the flag (alarm fatigue), the threshold is too low. Re-tune quarterly.
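The per-call-type rule above tends to end up as a small config table. The call types and numbers here are placeholders, hedged examples of the shape, not recommended values; the invariant worth copying is that thresholds drop (triggering earlier) as downstream authority rises.

```python
# Illustrative only: tune against your own traffic, not model benchmarks.
THRESHOLDS = {
    "balance_inquiry": {"advise": 0.85, "escalate": 0.95},
    "address_change":  {"advise": 0.75, "escalate": 0.90},
    "password_reset":  {"advise": 0.70, "escalate": 0.85},
    "wire_transfer":   {"advise": 0.60, "escalate": 0.80},
}

def action_for(call_type: str, score: float) -> str:
    """Scale the response to the downstream authority of the call type."""
    t = THRESHOLDS[call_type]
    if score >= t["escalate"]:
        return "escalate"
    if score >= t["advise"]:
        return "advise"
    return "none"
```

Logging each agent's response to "advise" flags against this table is also the cheapest way to measure the alarm-fatigue signal for the quarterly re-tune.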
Organizational policies that matter
Technology alone isn't enough. The contact centers getting this right also run:
- Callback verification on any high-authority action, even when both voice biometrics and deepfake detection pass. See the WPP defense.
- Multi-party approval for transfers above firm-specific ceilings — nobody acts on voice alone.
- Explicit agent training on deepfake voice attacks. Most agents have heard of deepfake video; far fewer know about voice cloning.
- Incident playbooks for when a deepfake call does get through. Time-to-detect is the metric that matters once the preventive layer fails.