PIRSA:25040061

State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project

APA

Munchmeyer, M. (2025). State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project. Perimeter Institute. https://pirsa.org/25040061

MLA

Munchmeyer, Moritz. State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project. Perimeter Institute, Apr. 08, 2025, https://pirsa.org/25040061

BibTeX

          @misc{ pirsa_PIRSA:25040061,
            doi = {10.48660/25040061},
            url = {https://pirsa.org/25040061},
            author = {Munchmeyer, Moritz},
            keywords = {},
            language = {en},
            title = {State of AI Reasoning for Theoretical Physics - Insights from the TPBench Project},
            publisher = {Perimeter Institute},
            year = {2025},
            month = {apr},
            note = {PIRSA:25040061, see \url{https://pirsa.org}}
          }
          

Moritz Munchmeyer, University of Wisconsin–Madison

Talk number PIRSA:25040061
Talk Type Conference

Abstract

The newest reasoning-focused large language models are, for the first time, powerful enough to perform mathematical reasoning in theoretical physics at the graduate level. In the mathematics community, datasets such as FrontierMath are being used to drive progress and evaluate models, but theoretical physics has so far received less attention. In this talk I will present our dataset TPBench (arXiv:2502.15815, tpbench.org), which was constructed to benchmark and improve AI models specifically for theoretical physics. We find extremely rapid progress in model performance over the last few months, but also significant challenges at research-level difficulty. I will also briefly outline strategies to improve these models for theoretical physics.