PIRSA:25040088

Creativity by Compositionality in Generative Diffusion Models

APA

Favero, A. (2025). Creativity by Compositionality in Generative Diffusion Models. Perimeter Institute. https://pirsa.org/25040088

MLA

Favero, Alessandro. Creativity by Compositionality in Generative Diffusion Models. Perimeter Institute, 9 Apr. 2025, https://pirsa.org/25040088.

BibTeX

          @misc{pirsa_PIRSA:25040088,
            doi = {10.48660/25040088},
            url = {https://pirsa.org/25040088},
            author = {Favero, Alessandro},
            keywords = {},
            language = {en},
            title = {Creativity by Compositionality in Generative Diffusion Models},
            publisher = {Perimeter Institute},
            year = {2025},
            month = {apr},
            note = {PIRSA:25040088, see \url{https://pirsa.org}}
          }


Alessandro Favero, École Polytechnique Fédérale de Lausanne

Talk number: PIRSA:25040088
Talk type: Conference

Abstract

Diffusion models have shown remarkable success in generating high-dimensional data such as images and language – a feat that is only possible if the data has strong underlying structure. Understanding deep generative models therefore requires understanding the structure of the data they learn from. In particular, natural data is often composed of features organized hierarchically. In this talk, we will model this structure using probabilistic context-free grammars – tree-like generative models from linguistics. I will present a theory of denoising diffusion on this data, which predicts a phase transition governing the reconstruction of features at different hierarchical levels, and I will show empirical evidence for this transition in both image and language diffusion models.

I will then discuss how diffusion models learn these grammars, revealing a quantitative relationship between data correlations and the training set size needed to learn how to hierarchically compose new data. In particular, we predict a polynomial scaling of sample complexity with data dimension, providing a mechanism by which diffusion models avoid the curse of dimensionality. Additionally, the theory predicts that models trained on limited data generate outputs that are locally coherent but lack global consistency, an effect confirmed empirically across modalities. These results offer a new perspective on how generative models learn to become creative and compose novel data by progressively uncovering the latent hierarchical structure.
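
For readers unfamiliar with probabilistic context-free grammars, the following minimal sketch (not the speaker's code; the grammar, symbols, and rule probabilities are illustrative assumptions) shows how a toy PCFG generates sequences whose local statistics are set by low-level rules and whose global arrangement is set by the rule at the root of the derivation tree – the kind of hierarchical structure the abstract refers to.

    # A toy probabilistic context-free grammar (PCFG). Nonterminals S, A, B expand
    # into right-hand sides chosen at random with the given probabilities; lowercase
    # symbols are terminals (the leaves of the derivation tree).
    import random

    RULES = {
        "S": [(["A", "B"], 0.5), (["B", "A"], 0.5)],
        "A": [(["a", "a"], 0.6), (["a", "b"], 0.4)],
        "B": [(["b", "b"], 0.7), (["b", "a"], 0.3)],
    }

    def sample(symbol, rng):
        """Recursively expand `symbol` until only terminal leaves remain."""
        if symbol not in RULES:  # terminal: return the leaf itself
            return [symbol]
        options = RULES[symbol]
        rhs = rng.choices([r for r, _ in options],
                          weights=[p for _, p in options], k=1)[0]
        leaves = []
        for child in rhs:
            leaves.extend(sample(child, rng))
        return leaves

    if __name__ == "__main__":
        rng = random.Random(0)
        # Low-level rules (A, B) set the local statistics of each sample;
        # the root rule (S) sets its global structure.
        for _ in range(3):
            print("".join(sample("S", rng)))

In practice such grammars would have many more levels and symbols, but sampling from them in this way is how hierarchically structured synthetic data of the kind described above can be instantiated and studied numerically.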