Perceptual Hash
A hash function that produces similar outputs for perceptually similar media (near-identical images, videos, or audio). Unlike cryptographic hashes, which change completely after any edit, perceptual hashes stay stable across resizing, recompression, and format conversion, and often across small crops or color adjustments. That stability enables fast "have we seen this before?" lookups at scale.
The output is a short code, and two files are compared by how far apart their codes are rather than by exact equality. Perceptual hashes are often loosely called pHash, which is also the name of one specific DCT-based algorithm. The best-known deployment is Microsoft's PhotoDNA, used to detect known child sexual abuse material (CSAM): when a new image is uploaded, its PhotoDNA hash is compared against a database of hashes of known material, and close matches flag the file even if it has been resized, re-encoded, or slightly edited.
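PhotoDNA's algorithm is not public, so the sketch below uses a different, openly described technique as a stand-in: difference hashing (dHash), which shrinks the image to a 9 by 8 grid and records, for each row, whether each cell is brighter than its right-hand neighbour. It assumes the image is already decoded into a 2D list of grayscale values (0 to 255); resize_box, dhash, and hamming are illustrative helper names, not a library API, and a real system would decode and resize with an image library.

```python
# Minimal dHash sketch (a stand-in, not PhotoDNA). Assumes `pixels` is a 2D list
# of grayscale values; decoding and color conversion are out of scope.

def resize_box(pixels, out_w, out_h):
    """Downsample a 2D grayscale grid by averaging rectangular blocks."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, size=8):
    """Return a size*size-bit int: bit is 1 where a cell is brighter than its right neighbour."""
    small = resize_box(pixels, size + 1, size)  # one extra column so each row yields `size` comparisons
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 64x64 image and a uniformly brightened copy (no values exceed 255).
img = [[(x * 7 + y * 13) % 200 for x in range(64)] for y in range(64)]
brighter = [[v + 10 for v in row] for row in img]
print(hamming(dhash(img), dhash(brighter)))  # 0: a uniform shift preserves every comparison
```

Because each bit encodes a relative comparison rather than an absolute value, global brightness changes leave the hash untouched, while hashes of unrelated images tend to differ in many bits.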
How it differs from cryptographic hashing
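- SHA-256 / MD5: change any bit of the file and you get a completely different hash. Useful for exact-match lookups and integrity checking (MD5 is no longer collision-resistant, so SHA-256 is preferred wherever an attacker is involved).
- Perceptual hash: change the pixels slightly and you get a nearly identical hash. Hashes are compared against a distance threshold (for bit-string hashes, the number of differing bits) rather than for equality, which is what makes "have we seen this before?" matching tolerant of edits.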
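To make the contrast concrete, this sketch edits one pixel of a synthetic 8 by 8 grayscale grid and hashes both versions two ways: SHA-256 from Python's hashlib, and a simple average hash (aHash: one bit per pixel, set when the pixel is at least the grid's mean brightness) standing in for a perceptual hash. The grid values, the helper names ahash and hamming, and the choice of aHash are illustrative assumptions; the grid is treated as an image that has already been downsampled to 8 by 8.

```python
import hashlib


def ahash(grid):
    """Average hash of a pre-downsampled 8x8 grayscale grid -> 64-bit int."""
    flat = [v for row in grid for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v >= mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


original = [[x * 30 + y * 5 for x in range(8)] for y in range(8)]  # values 0..245
edited = [row[:] for row in original]
edited[0][0] += 1  # change one pixel by a single brightness level

sha_a = hashlib.sha256(bytes(v for row in original for v in row)).hexdigest()
sha_b = hashlib.sha256(bytes(v for row in edited for v in row)).hexdigest()

print(sha_a == sha_b)                           # False: any byte change alters SHA-256
print(hamming(ahash(original), ahash(edited)))  # 0: the perceptual hash does not move
```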
Perceptual hash in deepfake detection
pHash plays a supporting role in the deepfake-defense stack:
- Known-fake databases. When a specific deepfake goes viral (a fabricated political clip, a non-consensual image), platforms can hash it and detect re-uploads at scale without running a full detection model on every file.
- Variant detection. Attackers often re-upload the same deepfake with minor cropping or filters. A perceptual hash catches variants that exact (cryptographic) matching would miss.
- Attribution support. A match links the file to a previously analyzed item, so a forensic team can reuse that earlier analysis instead of running detection end to end again.
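Matching an upload against a known-fake database is a nearest-neighbour search under Hamming distance. One classic structure for this is a BK-tree, which uses the triangle inequality to skip branches that cannot contain a match. The sketch below is illustrative only: the class, the labels, the hash values, and the 10-bit threshold are hypothetical, and production systems may use other index structures.

```python
class BKTree:
    """BK-tree over integer hashes, keyed by Hamming distance (a metric)."""

    def __init__(self):
        self.root = None  # each node is [hash, label, {edge_distance: child_node}]

    @staticmethod
    def _dist(a, b):
        return bin(a ^ b).count("1")

    def add(self, h, label):
        if self.root is None:
            self.root = [h, label, {}]
            return
        node = self.root
        while True:
            d = self._dist(h, node[0])
            child = node[2].get(d)
            if child is None:
                node[2][d] = [h, label, {}]
                return
            node = child

    def query(self, h, max_dist):
        """Return sorted (distance, label) pairs for stored hashes within max_dist of h."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = self._dist(h, node[0])
            if d <= max_dist:
                results.append((d, node[1]))
            # Triangle inequality: only children whose edge distance lies in
            # [d - max_dist, d + max_dist] can hold a match.
            for edge, child in node[2].items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


index = BKTree()
index.add(0xF0F0F0F0F0F0F0F0, "viral-clip-frame-17")  # hypothetical known-fake hashes
index.add(0x0F0F0F0F0F0F0F0F, "fabricated-photo-3")
upload = 0xF0F0F0F0F0F0F0F1  # one bit off the first entry, e.g. after re-encoding
print(index.query(upload, max_dist=10))  # [(1, 'viral-clip-frame-17')]
```

The threshold trades recall against false positives and would need tuning per hash function and use case.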
Limitations
- First-time fakes. pHash only finds content that's already known. It can't detect a never-seen-before deepfake — that's what deepfake detection is for.
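- Adversarial editing. Determined attackers can alter a file just enough to push its hash past the match threshold while it still looks essentially the same, for example by mirroring, rotating, cropping more aggressively, or adding small targeted pixel perturbations.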
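- Database coverage. Matching only works for content that has already been hashed and added to the database, so the system is only as good as that library and how quickly new fakes enter it.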
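The mirroring case is easy to demonstrate with the dHash comparison rule (bit is 1 if a cell is brighter than its right-hand neighbour). This sketch assumes an already-downsampled grid of 8 rows and 9 columns and a synthetic frame; dhash_grid and hamming are hypothetical helper names. A brightness tweak leaves the hash unchanged, but a horizontal flip inverts every comparison.

```python
def dhash_grid(grid):
    """dHash of a pre-downsampled grid with 8 rows and 9 columns -> 64-bit int."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical frame: brightness falls off left to right, with slight per-row variation.
frame = [[200 - 20 * x + 3 * y for x in range(9)] for y in range(8)]
tweaked = [[v + 5 for v in row] for row in frame]  # slight brightening
mirrored = [list(reversed(row)) for row in frame]  # horizontal flip

print(hamming(dhash_grid(frame), dhash_grid(tweaked)))   # 0: still a match
print(hamming(dhash_grid(frame), dhash_grid(mirrored)))  # 64: every bit flips
```

One mitigation is to also index hashes of flipped or rotated versions of each known fake; targeted pixel perturbations are much harder to anticipate this way.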
In practice, large platforms typically combine perceptual hashing with active deepfake detection and provenance checks (verifying where and how a file was created), so that each layer covers gaps in the others.
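The layering can be sketched as a triage function that runs the cheapest check first. Everything here is an assumption for illustration: the Verdict labels, both thresholds, the callables standing in for a provenance verifier and a detection model, and the choice to skip the detector when provenance verifies. Real policies differ by platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Verdict:
    action: str  # "block", "review", or "allow" (hypothetical policy labels)
    reason: str


def triage(
    media_hash: int,
    known_fakes: Dict[int, str],               # hypothetical: perceptual hash -> case label
    has_valid_provenance: Callable[[], bool],  # hypothetical provenance verifier
    run_detector: Callable[[], float],         # hypothetical model returning P(fake)
    max_dist: int = 10,                        # assumed Hamming-distance threshold
    detector_threshold: float = 0.9,           # assumed detector score threshold
) -> Verdict:
    # Layer 1 (cheap): compare against known fakes by Hamming distance.
    for known_hash, label in known_fakes.items():
        if bin(media_hash ^ known_hash).count("1") <= max_dist:
            return Verdict("block", f"near-duplicate of known fake {label}")
    # Layer 2: provenance. In this sketch, verified provenance skips the detector.
    if has_valid_provenance():
        return Verdict("allow", "verified provenance")
    # Layer 3 (expensive): run the detection model only on what is left.
    score = run_detector()
    if score >= detector_threshold:
        return Verdict("review", f"detector score {score:.2f}")
    return Verdict("allow", "no signal from any layer")
```

The linear scan over known_fakes keeps the sketch short; at scale that loop would be replaced by an index over the hash database.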