
Towards a “Theoretical Minimum” for Physicists in AI
Yonatan Kahn Princeton University
PIRSA:25040089

As progress in AI hurtles forward at a speed seldom seen in the history of science, theorists who wish to gain a first-principles understanding of AI can be overwhelmed by the enormous number of papers, notational choices, and assumptions in the literature. I will make a pitch for developing a “Theoretical Minimum” for theoretical physicists aiming to study AI, with the goal of getting members of our community up to speed as quickly as possible with a suite of standard results whose validity can be checked by numerical experiments requiring only modest compute. In particular, this will require close collaboration between statistical physics, condensed matter physics, and high-energy physics, three communities that all have important perspectives to bring to the table but whose notation must be harmonized in order to be accessible to new researchers. I will focus my discussion on (a) the various approaches to the infinite-width limit, which seems like the best entry point for theoretical physicists who first encounter neural networks, and (b) the need for benchmark datasets from physics complex enough to capture aspects of natural-language data but which are nonetheless “calculable” from first principles using tools of theoretical physics.
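As a concrete illustration of the infinite-width entry point mentioned above, here is a minimal numerical sketch (not taken from the talk) of one standard result: the output of a randomly initialized, suitably scaled network converges in distribution as the width grows, so its variance across initializations stabilizes. It assumes a tanh MLP in the usual 1/sqrt(fan-in) parameterization and needs only NumPy and modest compute.

    import numpy as np

    def random_mlp_output(x, width, depth, rng):
        # One forward pass of a randomly initialized tanh MLP with 1/sqrt(fan-in)
        # weight scaling; returns a scalar output.
        h = x
        for _ in range(depth):
            W = rng.standard_normal((width, h.shape[0])) / np.sqrt(h.shape[0])
            h = np.tanh(W @ h)
        w_out = rng.standard_normal(width) / np.sqrt(width)
        return w_out @ h

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)                     # one fixed input
    for width in [32, 128, 512, 2048]:
        samples = [random_mlp_output(x, width, 3, rng) for _ in range(500)]
        print(width, np.var(samples))               # variance stabilizes as width grows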
Creativity by Compositionality in Generative Diffusion Models
Alessandro Favero École Polytechnique Fédérale de Lausanne
PIRSA:25040088

Diffusion models have shown remarkable success in generating high-dimensional data such as images and language – a feat only possible if data has strong underlying structure. Understanding deep generative models thus requires understanding the structure of the data they learn from. In particular, natural data is often composed of features organized hierarchically. In this talk, we will model this structure using probabilistic context-free grammars – tree-like generative models from linguistics. I will present a theory of denoising diffusion on this data, predicting a phase transition that governs the reconstruction of features at various hierarchical levels. I will show empirical evidence for it in both image and language diffusion models. I will then discuss how diffusion models learn these grammars, revealing a quantitative relationship between data correlations and the training set size needed to learn how to hierarchically compose new data. In particular, we predict a polynomial scaling of sample complexity with data dimension, providing a mechanism by which diffusion models avoid the curse of dimensionality. Additionally, this theory predicts that models trained on limited data generate outputs that are locally coherent but lack global consistency, an effect empirically confirmed across modalities. These results offer a new perspective on how generative models learn to become creative and compose novel data by progressively uncovering the latent hierarchical structure.
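Since the abstract builds its theory on probabilistic context-free grammars, a minimal sketch of sampling from a toy PCFG may help fix ideas; the grammar, symbols, and probabilities below are invented for illustration and are not the ones used in the work.

    import random

    # A toy probabilistic context-free grammar: each nonterminal expands into one
    # of its right-hand sides with the listed probability (illustrative values).
    PCFG = {
        "S": [(["NP", "VP"], 1.0)],
        "NP": [(["noun"], 0.6), (["adj", "NP"], 0.4)],
        "VP": [(["verb"], 0.5), (["verb", "NP"], 0.5)],
    }

    def sample(symbol="S", max_depth=10):
        # Recursively expand nonterminals; terminals (and depth-capped symbols)
        # pass through unchanged.
        if symbol not in PCFG or max_depth == 0:
            return [symbol]
        rules = PCFG[symbol]
        rhs = random.choices([r for r, _ in rules], weights=[p for _, p in rules])[0]
        out = []
        for s in rhs:
            out.extend(sample(s, max_depth - 1))
        return out

    for _ in range(3):
        print(" ".join(sample()))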
Causal Inference Meets Quantum Physics
Robert Spekkens Perimeter Institute for Theoretical Physics
PIRSA:25040086

Can the effectiveness of a medical treatment be determined without the expense of a randomized controlled trial? Can the impact of a new policy be disentangled from other factors that happen to vary at the same time? Questions such as these are the purview of the field of causal inference, a general-purpose science of cause and effect, applicable in domains ranging from epidemiology to economics. Researchers in this field seek in particular to find techniques for extracting causal conclusions from statistical data. Meanwhile, one of the most significant results in the foundations of quantum theory—Bell's theorem—can also be understood as an attempt to disentangle correlation and causation. Recently, it has been recognized that Bell's result is an early foray into the field of causal inference and that the insights derived from 60 years of research on his theorem can supplement and improve upon state-of-the-art causal inference techniques. In the other direction, the conceptual framework developed by causal inference researchers provides a fruitful new perspective on what could possibly count as a satisfactory causal explanation of the quantum correlations observed in Bell experiments. Efforts to elaborate upon these connections have led to an exciting flow of techniques and insights across the disciplinary divide. This talk will highlight some of what is happening at the intersection of these two fields.
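For orientation, the tension between correlation and causation in Bell's theorem is often quantified by the CHSH quantity: any locally causal assignment of ±1 outcomes obeys |S| ≤ 2, while the singlet-state quantum correlations at the standard measurement angles reach 2√2. The short check below reproduces these textbook numbers; it is a generic illustration, not material from the talk.

    import itertools
    import numpy as np

    # Classical (locally causal) bound: enumerate all deterministic +/-1 strategies.
    classical_max = max(
        abs(a0 * b0 - a0 * b1 + a1 * b0 + a1 * b1)
        for a0, a1, b0, b1 in itertools.product([-1, 1], repeat=4)
    )

    # Singlet-state quantum correlations E(alpha, beta) = -cos(alpha - beta)
    # evaluated at the standard CHSH measurement angles.
    E = lambda alpha, beta: -np.cos(alpha - beta)
    alpha0, alpha1 = 0.0, np.pi / 2
    beta0, beta1 = np.pi / 4, 3 * np.pi / 4
    S_quantum = abs(E(alpha0, beta0) - E(alpha0, beta1)
                    + E(alpha1, beta0) + E(alpha1, beta1))

    print(classical_max)   # 2
    print(S_quantum)       # 2*sqrt(2) ~ 2.828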
Scaling Limits for Learning: Dynamics and Statics
Blake Bordelon Harvard University
PIRSA:25040085

In this talk, I will discuss how physics can help improve our understanding of deep learning systems and guide improvements to their scaling strategies. I will first discuss mathematical results based on mean-field techniques from statistical physics to analyze the feature-learning dynamics of neural networks as well as the posteriors of large Bayesian neural networks. This theory will provide insights for developing initialization and optimization schemes for neural networks that admit well-defined infinite-width and infinite-depth limits and behave consistently across model scales, providing practical advantages. These limits also enable a theoretical characterization of the types of learned solutions reached by deep networks, and provide a starting point for characterizing generalization and neural scaling laws (see Cengiz Pehlevan's talk).
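One flavor of the width-consistent parameterizations alluded to above can be sketched numerically: if the readout of a two-layer network is scaled by 1/width (a mean-field-style choice) and the readout learning rate is scaled up by width, the change in the network output after one gradient step is O(1) regardless of width. The scalings below follow a common convention in this literature and are not necessarily the exact ones from the talk.

    import numpy as np

    def mean_field_readout_step(width, lr_base, rng):
        # One SGD step on the readout of a mean-field-parameterized two-layer net:
        # f = (1/width) * w2 . tanh(W1 x). With learning rate ~ width, the output
        # change after the step is O(1), independent of width.
        d_in = 16
        x = rng.standard_normal(d_in)
        W1 = rng.standard_normal((width, d_in)) / np.sqrt(d_in)
        w2 = rng.standard_normal(width)
        h = np.tanh(W1 @ x)
        f = (w2 @ h) / width
        y = 1.0                                   # toy regression target
        lr = lr_base * width                      # mean-field learning-rate scaling
        w2_new = w2 - lr * (f - y) * h / width    # gradient step on 0.5*(f - y)^2
        f_new = (w2_new @ h) / width
        return f_new - f                          # output change after one step

    rng = np.random.default_rng(0)
    for width in [64, 256, 1024, 4096]:
        print(width, mean_field_readout_step(width, lr_base=1.0, rng=rng))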
Lecture - AdS/CFT, PHYS 777
David Kubiznak Charles University
LitLLMs, LLMs for Literature Review: Are We There Yet?
Gaurav Sahu Mila - Quebec Artificial Intelligence Institute
PIRSA:25040076

Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. In this talk, we will explore the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We will decompose the task into two components: 1. Retrieving related works given a query abstract, and 2. Writing a literature review based on the retrieved results. We will then analyze how effective LLMs are for both components. For retrieval, we will discuss a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we will study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. We will then discuss the two-step generation phase that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We will also see a quick demo of LitLLM in action towards the end.
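The retrieval stage described above can be summarized in a short schematic. The helpers `llm_complete` (an LLM API call returning text) and `search_index` (a query against an external paper knowledge base) are hypothetical stand-ins, not LitLLM's actual interfaces.

    # Schematic of the two-step retrieval plus re-ranking pipeline described above.
    # `llm_complete` and `search_index` are hypothetical callables supplied by the user.

    def retrieve_related_work(query_abstract, llm_complete, search_index, top_k=20):
        # Step 1: ask the LLM for search keywords grounded in the query abstract.
        keywords = llm_complete(
            "Extract concise search keywords for finding related papers:\n"
            + query_abstract
        )
        # Step 2: query the external knowledge base with those keywords.
        candidates = search_index(keywords, limit=top_k)

        # Prompting-based re-ranking with attribution: the LLM orders candidates by
        # relevance and cites the sentences that justify each ranking decision.
        ranking = llm_complete(
            "Query abstract:\n" + query_abstract + "\n\n"
            "Candidate papers:\n"
            + "\n".join(f"[{i}] {c['title']}: {c['abstract']}"
                        for i, c in enumerate(candidates))
            + "\n\nReturn the candidate indices from most to least relevant, "
              "citing the sentence that supports each choice."
        )
        return candidates, ranking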
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Yuri Chervonyi DeepMind
PIRSA:25040075

We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances. This, together with support for non-constructive problems, has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiad (IMO) 2000-2024 geometry problems from 66% to 88%. The search process of AlphaGeometry2 has also been greatly improved through the use of the Gemini architecture for better language modeling, and a novel knowledge-sharing mechanism that enables effective communication between search trees. Together with further enhancements to the symbolic engine and synthetic data generation, we have significantly boosted the overall solving rate of AlphaGeometry2 to 84% for all geometry problems over the last 25 years, compared to 54% previously. AlphaGeometry2 was also part of the system that achieved silver-medal standard at IMO 2024. Last but not least, we report progress towards using AlphaGeometry2 as part of a fully automated system that reliably solves geometry problems directly from natural-language input.
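A highly simplified sketch of what knowledge sharing between search trees can look like is given below: each tree deposits facts verified by its symbolic engine into a shared pool that all trees read from. The class and method names are invented for illustration and do not reflect the AlphaGeometry2 implementation.

    # Simplified sketch of knowledge sharing between parallel search trees: every
    # fact a tree's symbolic engine verifies is published to a shared pool, so the
    # other trees can build on it immediately. Names are illustrative only.

    class SharedFactPool:
        def __init__(self):
            self.facts = set()

        def publish(self, facts):
            self.facts |= set(facts)

        def snapshot(self):
            return frozenset(self.facts)

    class SearchTree:
        def __init__(self, problem, propose_step, verify, pool):
            self.state = {problem}            # facts derived so far in this tree
            self.propose_step = propose_step  # e.g. a model proposing a construction
            self.verify = verify              # symbolic engine: returns newly proved facts
            self.pool = pool

        def expand(self):
            context = self.state | self.pool.snapshot()   # read shared knowledge
            candidate = self.propose_step(context)
            new_facts = self.verify(context, candidate)
            self.state |= new_facts
            self.pool.publish(new_facts)                  # write back for other trees
            return new_facts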
Human Level AI by 2030
Jared Kaplan Johns Hopkins University - Department of Physics & Astronomy
PIRSA:25040074
Bound on the dynamical exponent of frustration-free Hamiltonians and Markov processes
Tomohiro Soejima Harvard University
Exactly solvable models have tremendously helped our understanding of condensed matter systems. A notable number of them are "frustration-free" in the sense that all local terms of the Hamiltonian can be minimized simultaneously. This class has been particularly successful at describing the physics of gapped phases of matter, such as symmetry-protected topological phases and topologically ordered phases. On the other hand, relatively little is understood about gapless frustration-free Hamiltonians and their ability to teach us about more generic systems. In this talk, we derive a constraint on the spectrum of frustration-free Hamiltonians: their dynamical exponent z, which captures the scaling of the energy gap with system size, is bounded from below by z ≥ 2. This proves that frustration-free Hamiltonians are incapable of describing conformal critical points with z = 1. Further, via the well-known mapping from Markov processes to frustration-free Hamiltonians, we show that the relaxation time of many Markov processes also scales with z ≥ 2, improving the previously known bound of z ≥ 7/4. The talk is based on joint work with Rintaro Masaoka and Haruki Watanabe.
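A standard example that saturates the bound makes the scaling concrete: the lazy symmetric random walk on a ring of L sites has a spectral gap that closes like 1/L², i.e. dynamical exponent z = 2. The short check below fits z from the gap of the transition matrix at several sizes; the example is textbook material, not taken from the talk.

    import numpy as np

    def relaxation_gap(L):
        # Spectral gap of the lazy symmetric random walk on a ring of L sites.
        T = np.zeros((L, L))
        for i in range(L):
            T[i, i] = 0.5
            T[i, (i + 1) % L] = 0.25
            T[i, (i - 1) % L] = 0.25
        eigs = np.sort(np.linalg.eigvalsh(T))[::-1]
        return 1.0 - eigs[1]          # gap between the two largest eigenvalues

    sizes = np.array([16, 32, 64, 128])
    gaps = np.array([relaxation_gap(L) for L in sizes])
    # Fit gap ~ L^{-z}: the slope of log(gap) vs log(L) gives z close to 2,
    # consistent with (and saturating) the z >= 2 bound discussed above.
    z = -np.polyfit(np.log(sizes), np.log(gaps), 1)[0]
    print(z)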