Latent Space
The compressed, abstract representation inside a generative model where content is encoded before being decoded back into concrete output (pixels, audio samples). Operations in latent space — interpolation, conditioning, manipulation — underlie most generative AI capabilities.
Latent space is the "inner language" of a generative model. Instead of working directly in pixels (for images) or samples (for audio), generative models first encode their inputs into a compressed latent representation, operate on points in that space (modifying, combining, or generating), and then decode back to the original form.
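The encode-operate-decode pipeline can be sketched with a toy linear "autoencoder" in NumPy. The encoder and decoder here are a random matrix and its pseudo-inverse, not learned weights, and the dimensions are made up for illustration; a real model learns both mappings from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained autoencoder: a random linear encoder and its
# pseudo-inverse as decoder. Real models learn these mappings; these
# dimensions (64-d "image", 8-d latent) are illustrative assumptions.
DATA_DIM, LATENT_DIM = 64, 8
W_enc = rng.standard_normal((LATENT_DIM, DATA_DIM)) / np.sqrt(DATA_DIM)
W_dec = np.linalg.pinv(W_enc)         # decoder approximately inverts the encoder

def encode(x):
    return W_enc @ x                  # pixels -> compressed latent

def decode(z):
    return W_dec @ z                  # latent -> pixels

x = rng.standard_normal(DATA_DIM)     # a fake flattened "image"
z = encode(x)                         # 1. encode into latent space
z_edited = z + 0.5                    # 2. operate on the latent point
x_edited = decode(z_edited)           # 3. decode back to pixel space

print(z.shape, x_edited.shape)        # (8,) (64,)
```

Note that because the latent is far smaller than the input (8 vs. 64 dimensions), decoding is only an approximate reconstruction; that compression is exactly what makes the latent an abstract, manipulable representation.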
Why it matters for deepfakes
Latent-space operations are what make targeted manipulation possible:
- Attribute editing. Move a point in an image latent space along the "smile" direction to make a face smile without changing identity.
- Identity swap. Transfer the identity component of a latent representation while keeping expression and pose — the core trick behind face swaps.
- Style transfer. Separate content from style in the latent space and recombine.
- Conditional generation. Condition the latent on a text prompt ("a photo of Obama at a table") to steer the output.
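Attribute editing and latent blending from the list above reduce to simple vector arithmetic. The sketch below uses hypothetical 4-d latent codes and a hand-picked "smile" direction; in practice the codes come from a trained encoder and the direction is estimated, e.g. as the mean latent of smiling faces minus the mean latent of neutral ones.

```python
import numpy as np

# Hypothetical latent code for a face (a real one comes from a trained encoder).
z_face = np.array([0.2, -1.1, 0.7, 0.3])

# Assumed "smile" attribute direction; in practice this is estimated from
# labeled examples (mean smiling latent minus mean neutral latent).
smile_dir = np.array([0.0, 1.0, 0.0, 0.0])

def edit(z, direction, strength):
    """Attribute editing: move a latent point along an attribute direction."""
    return z + strength * direction

def lerp(z_a, z_b, t):
    """Interpolation: blend two latent points (t=0 -> z_a, t=1 -> z_b)."""
    return (1 - t) * z_a + t * z_b

z_smiling = edit(z_face, smile_dir, 1.5)   # same identity, more smile
z_blend = lerp(z_face, z_smiling, 0.5)     # halfway between the two
print(z_smiling, z_blend)
```

Identity swap works the same way once the latent is disentangled: keep the identity components from one code and splice in the expression/pose components from another before decoding.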
Why it matters for detection
Generative models leave traces of the latent-space operations they performed:
- Upsampling artifacts appear when a low-dimensional latent is decoded into a much higher-resolution image; the decoder's repeated upsampling layers can leave periodic, often checkerboard-like patterns.
- Interpolation seams show up in outputs that blended multiple latent points.
- Conditioning leakage — the conditioning signal sometimes bleeds through as structured patterns the detector can learn.
Detectors don't explicitly analyze latent space; they detect the fingerprints that latent-space operations leave in the final output.
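As a deliberately crude illustration of such a fingerprint, the sketch below fakes a decoder's final upsampling step with nearest-neighbor repetition and then measures how often adjacent pixels are identical. Real decoders use learned transposed convolutions and real detectors use learned features (often in the frequency domain), so both the "generator" and the "detector" here are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def upsample_nearest(img, factor=2):
    # Stand-in for a decoder's upsampling layers. Real decoders use learned
    # transposed convolutions, which leave subtler periodic traces.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def duplicate_pixel_ratio(img):
    # Crude "fingerprint" statistic: fraction of horizontally adjacent
    # pixel pairs that are exactly equal.
    return (img[:, 1:] == img[:, :-1]).mean()

# Proxy for a camera image: random 8-bit noise (adjacent pixels rarely match).
natural = rng.integers(0, 256, size=(32, 32))
# Proxy for a generated image: a small latent-resolution grid upsampled 2x.
generated = upsample_nearest(rng.integers(0, 256, size=(16, 16)))

print(duplicate_pixel_ratio(natural))    # near 0 for random noise
print(duplicate_pixel_ratio(generated))  # ~0.5: every other column is a copy
```

The point is not this particular statistic but the shape of the argument: the latent-to-pixel decoding step imposes structure on the output, and any detector that learns to measure that structure is indirectly detecting the latent-space operation.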