Provenance
The verifiable record of how a piece of media was created, edited, and distributed — who captured or generated it, with what tools, when, and what modifications were applied. In deepfake defense, strong provenance is evidence a file is authentic.
Provenance is the origin story of a file — a chain of evidence about who made it, with what tool, and what's been done to it since. In the context of synthetic-media detection, provenance is the positive defense: rather than asking "is this fake?", it asks "do we have attested evidence this is real?"
The layers of provenance
- Capture-time signatures. A camera, microphone, or AI tool signs the file at the moment of creation. See C2PA for the industry standard.
- Embedded statistical marks. Watermarking schemes embed a detectable signal in the content itself.
- Platform metadata. Upload timestamps, source account, geolocation — weaker than cryptographic signing but useful context.
- First-appearance searches. Reverse image or video searches showing when and where a file first appeared publicly.
- Forensic analysis. Camera-sensor noise patterns, JPEG compression artifacts, and other fingerprints that identify specific capture equipment.
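The capture-time layer can be sketched in a few lines. This is a deliberately simplified model: real C2PA uses asymmetric (public-key) certificates chained to a trust list, whereas the HMAC below is a symmetric stand-in, and the key and media bytes are invented for illustration.

```python
import hmac
import hashlib

# Hypothetical key held in the capture device's secure hardware.
# Real C2PA signing uses asymmetric certificates, not a shared secret.
DEVICE_KEY = b"key-in-secure-element"

def sign_capture(media: bytes) -> str:
    """Sign the media bytes at the moment of capture."""
    return hmac.new(DEVICE_KEY, media, hashlib.sha256).hexdigest()

def verify_capture(media: bytes, signature: str) -> bool:
    """Check the media against its capture-time signature."""
    expected = hmac.new(DEVICE_KEY, media, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

photo = b"...raw sensor data..."
sig = sign_capture(photo)
assert verify_capture(photo, sig)             # untouched file verifies
assert not verify_capture(photo + b"x", sig)  # any edit breaks the signature
```

The key property is the last line: the signature binds to the exact bytes, so any modification after capture is detectable.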
Why provenance is necessary but not sufficient
Provenance is extremely strong when present: a cryptographically signed photo from a camera such as the Leica M11-P is close to conclusive. But:
- Most content in the wild has no provenance signals.
- Provenance can be stripped (metadata removed, file re-encoded), and its absence does not make a file positively fake.
- Provenance attests to creation, not accuracy — a signed image can still depict something misleading.
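The stripping problem in the second point above comes down to provenance binding to bytes, not to depicted content. A minimal sketch, with invented placeholder bytes standing in for two encodings of the same scene:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"JPEG bytes at quality 95"
reencoded = b"JPEG bytes at quality 80"  # same scene, different bytes

# A provenance record binds to the exact bytes it signed...
record = {"sha256": fingerprint(original), "signer": "camera"}

# ...so re-encoding silently breaks the chain:
assert fingerprint(reencoded) != record["sha256"]
# Verification now fails, yet nothing proves the image is fake --
# the file has simply lost its provenance.
```

This is why a failed or missing check downgrades a file to "unverified", never to "synthetic".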
Provenance + detection
A complete verification workflow uses provenance where available and falls back to deepfake detection where it isn't:
- Check for C2PA or watermark → if valid, treat as high-trust.
- If absent, run detection → probabilistic score.
- If detection is borderline, look for external signals: source account history, first-appearance timing, reverse search.
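The fallback chain above can be expressed as a small triage function. The labels and score thresholds here are illustrative assumptions, not standardized values, and the inputs stand in for the outputs of a real C2PA validator, watermark decoder, and detection model.

```python
from typing import Optional

def triage(c2pa_valid: bool, watermark_found: bool,
           detection_score: Optional[float]) -> str:
    """Provenance-first triage: attested evidence outranks
    probabilistic detection when both are available."""
    if c2pa_valid or watermark_found:
        return "high-trust"              # step 1: valid provenance signal
    if detection_score is None:
        return "needs-external-signals"  # no detector output available
    # Step 2: fall back to detection. Thresholds are illustrative.
    if detection_score >= 0.8:
        return "likely-synthetic"
    if detection_score <= 0.2:
        return "likely-authentic"
    # Step 3: borderline -- check account history, first appearance,
    # reverse search before deciding.
    return "needs-external-signals"

assert triage(True, False, None) == "high-trust"
assert triage(False, False, 0.9) == "likely-synthetic"
assert triage(False, False, 0.5) == "needs-external-signals"
```

Note the asymmetry: provenance can short-circuit the pipeline to high trust, but a detection score alone never yields that label.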