Synthetic data generation addresses much of the privacy compliance problem but creates a new one: a provenance black box. Generative Adversarial Networks (GANs) and diffusion models do not copy individual records; they sample from a distribution learned across the whole training set, and their learned parameters are not human-interpretable. As a result, it is generally not feasible to trace a given synthetic data point back to the real-world records that shaped it.