
Explainable AI in (Astro)physics
Luisa Lucie-Smith Universität Hamburg
PIRSA:25040098
Machine learning has significantly improved the way scientists model and interpret large datasets across a broad range of the physical sciences; yet, its "black box" nature often limits our ability to trust and understand its results. Interpretable and explainable AI is ultimately required to realize the potential of machine-assisted scientific discovery. I will review efforts toward explainable AI, focusing in particular on applications within the field of astrophysics. I will present an explainable deep learning framework which combines model compression and information theory to achieve explainability. I will demonstrate its relevance to cosmological large-scale structures, such as dark matter halos and galaxies, as well as the cosmic microwave background, revealing new physical insights derived from these explainable AI models.

Aspects of RG flows and Bayesian Updating
David Berman Queen Mary - University of London (QMUL)
PIRSA:25040108
We will examine the idea of Bayesian updating as an inverse diffusion-like process and its relation to the exact renormalisation group. In particular, we will look at the role of Fisher information, its metric, and possible physical interpretations.

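For orientation, the central object mentioned here has a standard definition (stated generically; this is background, not material taken from the talk): for a parametric family \(p(x\mid\theta)\), the Fisher information metric is

\[
g_{ij}(\theta) \;=\; \mathbb{E}_{x\sim p(x\mid\theta)}\!\left[\partial_{\theta_i}\log p(x\mid\theta)\;\partial_{\theta_j}\log p(x\mid\theta)\right],
\]

which is also the Hessian of the Kullback-Leibler divergence \(D_{\mathrm{KL}}\!\left(p(\cdot\mid\theta)\,\|\,p(\cdot\mid\theta')\right)\) at \(\theta'=\theta\). It equips the family of distributions visited by successive Bayesian updates with a notion of distance, which is the sense in which a metric enters the discussion above.
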
Renormalization Group Flows: from Optimal Transport to Diffusion Models
Jordan Cotler Harvard University
PIRSA:25040095
We show that Polchinski's equation for exact renormalization group flow is equivalent to the optimal transport gradient flow of a field-theoretic relative entropy. This gives a surprising information-theoretic formulation of the exact renormalization group, expressed in the language of optimal transport. We will provide reviews of both the exact renormalization group and the theory of optimal transport. Our techniques generalize to other RG flow equations beyond Polchinski's. Moreover, we establish a connection between this more general class of RG flows and stochastic Langevin PDEs, enabling us to construct ML-based adaptive bridge samplers for lattice field theories. Finally, we will discuss forthcoming work on related methods for variationally approximating ground states of quantum field theories.

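For readers new to either side of the correspondence, the two flows being related can be written schematically as follows (standard textbook forms; signs, factors, and cutoff conventions vary, and this is background rather than the talk's own notation). Polchinski's equation for the interacting action \(S_{\mathrm{int}}\) with cutoff propagator \(C_\Lambda\) is

\[
\partial_\Lambda S_{\mathrm{int}}[\phi]
\;=\; \frac{1}{2}\int\!\frac{d^dp}{(2\pi)^d}\;\partial_\Lambda C_\Lambda(p)
\left[
\frac{\delta^2 S_{\mathrm{int}}}{\delta\phi(p)\,\delta\phi(-p)}
\;-\;
\frac{\delta S_{\mathrm{int}}}{\delta\phi(p)}\,\frac{\delta S_{\mathrm{int}}}{\delta\phi(-p)}
\right],
\]

while on the optimal-transport side the Fokker-Planck equation \(\partial_t\rho = \nabla\!\cdot\!\big(\rho\,\nabla\log(\rho/\pi)\big)\) is the Wasserstein-2 gradient flow of the relative entropy \(D_{\mathrm{KL}}(\rho\,\|\,\pi)\) (the Jordan-Kinderlehrer-Otto result). The abstract's claim is that the former is a field-theoretic instance of the latter.
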
Statistical physics of learning with two-layer neural networks
Bruno Loureiro École Normale Supérieure - PSL
PIRSA:25040093
Feature learning - or the capacity of neural networks to adapt to the data during training - is often cited as one of the fundamental reasons behind their unreasonable effectiveness. Yet, making mathematical sense of this seemingly clear intuition is still a largely open question. In this talk, I will discuss a simple setting where we can precisely characterise how features are learned by a two-layer neural network during the very first few steps of training, and how these features are essential for the network to generalise efficiently when data is limited.

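As a toy illustration of this kind of setting (a sketch under assumed choices: Gaussian inputs, a single-index teacher, a frozen second layer, full-batch gradient steps; this is not the precise model or scaling analysed in the talk), the snippet below trains the first layer of a two-layer network for a few steps and tracks how its weights align with the teacher direction:

    # Illustrative sketch only; assumptions listed in the text above.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, width, lr, steps = 50, 500, 200, 0.5, 3

    w_star = rng.normal(size=d) / np.sqrt(d)          # teacher direction
    X = rng.normal(size=(n, d))
    y = np.tanh(X @ w_star)                           # single-index target

    W = rng.normal(size=(width, d)) / np.sqrt(d)      # first layer (trained)
    a = rng.normal(size=width) / np.sqrt(width)       # second layer (frozen)

    for t in range(steps):
        h = np.tanh(X @ W.T)                          # hidden features
        err = h @ a - y                               # prediction error
        # gradient of 0.5 * mean squared error with respect to W
        grad_W = ((err[:, None] * a) * (1 - h**2)).T @ X / n
        W -= lr * grad_W
        # cosine alignment of each hidden unit with the teacher direction
        overlap = np.abs(W @ w_star) / (np.linalg.norm(W, axis=1) * np.linalg.norm(w_star))
        print(f"step {t+1}: mean |cos(w_i, w*)| = {overlap.mean():.3f}")

The alignment printed at each step is one simple probe of what "features being learned in the first few steps" means in this context.
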
Architectural bias in a transport-based generative model: an asymptotic perspective
Hugo Cui Harvard University
PIRSA:25040092
We consider the problem of learning a generative model, parametrized by a two-layer auto-encoder and trained with online stochastic gradient descent, to sample from a high-dimensional data distribution with an underlying low-dimensional structure. We provide a tight asymptotic characterization of low-dimensional projections of the resulting generated density, and show how mode(l) collapse can arise. On the other hand, we discuss how, when the architectural bias is suited to the target density, these simple models can efficiently learn to sample from a binary Gaussian mixture target distribution.

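A minimal sketch in the same spirit (assumed details: a symmetric binary Gaussian mixture target, a tied-weight two-layer auto-encoder, one fresh sample per SGD step; the transport-based model and the asymptotic analysis in the talk are more specific than this):

    # Illustrative sketch only; assumptions listed in the text above.
    import numpy as np

    rng = np.random.default_rng(1)
    d, k, lr, steps = 100, 2, 0.05, 20000

    mu = rng.normal(size=d) / np.sqrt(d)              # mixture mean direction
    W = rng.normal(size=(k, d)) * 0.01                # tied encoder/decoder weights

    def sample():
        s = rng.choice([-1.0, 1.0])                   # binary cluster label
        return s * mu + 0.3 * rng.normal(size=d)

    for t in range(steps):
        x = sample()                                  # online: one fresh sample per step
        h = np.tanh(W @ x)                            # encode
        err = W.T @ h - x                             # reconstruction error
        # gradient of 0.5 * ||W^T tanh(Wx) - x||^2 w.r.t. W (tied weights: two terms)
        grad = np.outer(h, err) + np.outer((1 - h**2) * (W @ err), x)
        W -= lr * grad

    # crude diagnostic: do the learned rows align with the mixture direction?
    cos = np.abs(W @ mu) / (np.linalg.norm(W, axis=1) * np.linalg.norm(mu) + 1e-12)
    print("row alignment with mu:", np.round(cos, 3))

The final check asks whether the network has picked up the low-dimensional structure (the direction mu) that the architecture is meant to exploit.
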
Solvable models of scaling and emergence in deep learning
Cengiz Pehlevan Harvard University
PIRSA:25040091

Towards a “Theoretical Minimum” for Physicists in AI
Yonatan Kahn Princeton University
PIRSA:25040089
As progress in AI hurtles forward at a speed seldom seen in the history of science, theorists who wish to gain a first-principles understanding of AI can be overwhelmed by the enormous number of papers, notational choices, and assumptions in the literature. I will make a pitch for developing a “Theoretical Minimum” for theoretical physicists aiming to study AI, with the goal of getting members of our community up to speed as quickly as possible with a suite of standard results whose validity can be checked by numerical experiments requiring only modest compute. In particular, this will require close collaboration between statistical physics, condensed matter physics, and high-energy physics, three communities that all have important perspectives to bring to the table but whose notation must be harmonized in order to be accessible to new researchers. I will focus my discussion on (a) the various approaches to the infinite-width limit, which seems like the best entry point for theoretical physicists who first encounter neural networks, and (b) the need for benchmark datasets from physics that are complex enough to capture aspects of natural-language data but which are nonetheless “calculable” from first principles using tools of theoretical physics.

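As an example of the sort of standard result such a collection might start from (stated in one common parameterization, as background rather than as content of the talk): for a fully connected network with i.i.d. Gaussian weights of variance \(\sigma_w^2/\text{fan-in}\) and biases of variance \(\sigma_b^2\), the preactivations at initialization converge to a Gaussian process as the width goes to infinity, with covariance built up layer by layer,

\[
K^{(1)}(x,x') = \sigma_b^2 + \sigma_w^2\,\frac{x\cdot x'}{d},
\qquad
K^{(\ell+1)}(x,x') = \sigma_b^2 + \sigma_w^2\,
\mathbb{E}_{f\sim\mathcal{N}(0,\,K^{(\ell)})}\!\left[\phi\big(f(x)\big)\,\phi\big(f(x')\big)\right],
\]

where \(\phi\) is the activation; the related neural tangent kernel plays the analogous role for gradient-descent training in the same limit.
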
Creativity by Compositionality in Generative Diffusion Models
Alessandro Favero École Polytechnique Fédérale de Lausanne
PIRSA:25040088
Diffusion models have shown remarkable success in generating high-dimensional data such as images and language – a feat only possible if the data has strong underlying structure. Understanding deep generative models thus requires understanding the structure of the data they learn from. In particular, natural data is often composed of features organized hierarchically. In this talk, we will model this structure using probabilistic context-free grammars – tree-like generative models from linguistics. I will present a theory of denoising diffusion on this data, predicting a phase transition that governs the reconstruction of features at various hierarchical levels, and I will show empirical evidence for it in both image and language diffusion models. I will then discuss how diffusion models learn these grammars, revealing a quantitative relationship between data correlations and the training set size needed to learn how to hierarchically compose new data. In particular, we predict a polynomial scaling of sample complexity with data dimension, providing a mechanism by which diffusion models avoid the curse of dimensionality. Additionally, this theory predicts that models trained on limited data generate outputs that are locally coherent but lack global consistency, an effect empirically confirmed across modalities. These results offer a new perspective on how generative models learn to become creative and compose novel data by progressively uncovering the latent hierarchical structure.

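To make the data model concrete, here is a tiny probabilistic context-free grammar sampler (the rules and probabilities below are invented for illustration; the grammars used in the talk's theory and experiments are their own):

    # Toy PCFG sampler: strings are generated by expanding nonterminals top-down,
    # so each token sits in a latent tree, which is the hierarchical structure
    # being modelled. Rules and probabilities are made up for illustration.
    import random

    rules = {
        "S":   [(["NP", "VP"], 1.0)],
        "NP":  [(["Det", "N"], 0.7), (["Det", "Adj", "N"], 0.3)],
        "VP":  [(["V", "NP"], 0.6), (["V"], 0.4)],
        "Det": [(["the"], 0.5), (["a"], 0.5)],
        "Adj": [(["small"], 0.5), (["red"], 0.5)],
        "N":   [(["cat"], 0.5), (["ball"], 0.5)],
        "V":   [(["sees"], 0.5), (["takes"], 0.5)],
    }

    def expand(symbol, rng):
        """Recursively expand a symbol; terminals are symbols with no rule."""
        if symbol not in rules:
            return [symbol]
        expansions, weights = zip(*rules[symbol])
        rhs = rng.choices(expansions, weights=weights, k=1)[0]
        return [tok for child in rhs for tok in expand(child, rng)]

    rng = random.Random(0)
    for _ in range(5):
        print(" ".join(expand("S", rng)))

Each printed string comes with an implicit parse tree, and it is the reconstruction of features at the different levels of that tree that the phase-transition analysis above refers to.
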
Causal Inference Meets Quantum Physics
Robert Spekkens Perimeter Institute for Theoretical Physics
PIRSA:25040086
Can the effectiveness of a medical treatment be determined without the expense of a randomized controlled trial? Can the impact of a new policy be disentangled from other factors that happen to vary at the same time? Questions such as these are the purview of the field of causal inference, a general-purpose science of cause and effect, applicable in domains ranging from epidemiology to economics. Researchers in this field seek in particular to find techniques for extracting causal conclusions from statistical data. Meanwhile, one of the most significant results in the foundations of quantum theory—Bell's theorem—can also be understood as an attempt to disentangle correlation and causation. Recently, it has been recognized that Bell's result is an early foray into the field of causal inference and that the insights derived from 60 years of research on his theorem can supplement and improve upon state-of-the-art causal inference techniques. In the other direction, the conceptual framework developed by causal inference researchers provides a fruitful new perspective on what could possibly count as a satisfactory causal explanation of the quantum correlations observed in Bell experiments. Efforts to elaborate upon these connections have led to an exciting flow of techniques and insights across the disciplinary divide. This talk will highlight some of what is happening at the intersection of these two fields.

Scaling Limits for Learning: Dynamics and Statics
Blake Bordelon Harvard University
PIRSA:25040085
In this talk, I will discuss how physics can help improve our understanding of deep learning systems and guide improvements to their scaling strategies. I will first discuss mathematical results, based on mean-field techniques from statistical physics, for analyzing the feature-learning dynamics of neural networks as well as the posteriors of large Bayesian neural networks. This theory provides insights for developing initialization and optimization schemes for neural networks that admit well-defined infinite-width and infinite-depth limits and behave consistently across model scales, providing practical advantages. These limits also enable a theoretical characterization of the types of learned solutions reached by deep networks, and provide a starting point for characterizing generalization and neural scaling laws (see Cengiz Pehlevan's talk).

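As background on the kind of parameterization alluded to (a standard mean-field convention for a two-layer network, stated generically and not as the talk's own equations), one writes

\[
f(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} a_i\,\phi(w_i\cdot x),
\]

with order-one weights and a learning rate scaled up by a factor of \(N\); training then has a well-defined feature-learning limit as \(N\to\infty\) (the mean-field limit), in contrast to the \(1/\sqrt{N}\) scaling, whose infinite-width limit is the lazy/kernel regime. Parameterizations chosen so that such limits exist are what allow behaviour to remain consistent across model scales.
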
LitLLMs, LLMs for Literature Review: Are We There Yet?
Gaurav Sahu Mila - Quebec Artificial Intelligence Institute
PIRSA:25040076
Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially given the recent influx of research papers. In this talk, we will explore the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: 1. retrieving related works given a query abstract, and 2. writing a literature review based on the retrieved results. We will then analyze how effective LLMs are for both components. For retrieval, we will discuss a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we will study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. We will then discuss the two-step generation phase, which first outlines a plan for the review and then executes the steps in that plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs, in order to avoid test-set contamination in zero-shot evaluations. We will also see a quick demo of LitLLM in action towards the end.

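A schematic of the two-step retrieval-and-re-ranking stage described above (everything here is a hypothetical sketch: llm_complete stands in for whatever chat-completion call is used and search_papers for a query to an external knowledge base; none of these names come from LitLLM itself):

    # Hypothetical sketch of the retrieve-then-rerank stage described above.
    # `llm_complete(prompt)` and `search_papers(query, k)` are placeholders for a
    # chat-completion call and an external paper-search backend, respectively.
    from typing import Callable

    def retrieve_related_work(abstract: str,
                              llm_complete: Callable[[str], str],
                              search_papers: Callable[[str, int], list[dict]],
                              k: int = 20) -> list[dict]:
        # Step 1: ask the LLM to extract meaningful search keywords from the abstract.
        keywords = llm_complete(
            "Extract a short comma-separated list of search keywords "
            f"for finding work related to this abstract:\n\n{abstract}"
        )
        # Step 2: query the external knowledge base with those keywords.
        candidates = search_papers(keywords, k)

        # Prompt-based re-ranking with attribution: the LLM orders candidates by
        # relevance and is asked to justify each choice.
        listing = "\n".join(f"[{i}] {c['title']}: {c['abstract']}"
                            for i, c in enumerate(candidates))
        ranking = llm_complete(
            "Re-rank these candidate papers by relevance to the query abstract, "
            "returning their indices in order, each with a one-line justification.\n\n"
            f"Query abstract:\n{abstract}\n\nCandidates:\n{listing}"
        )
        order = [int(tok) for tok in ranking.split() if tok.isdigit()]
        return [candidates[i] for i in order if i < len(candidates)]

The generation phase (outline a plan, then execute it to write the review) would sit downstream of this function and is omitted from the sketch.
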