PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving
Published in AI for Math Workshop @ ICML 2024
We present PutnamBench, a new multilingual evaluation benchmark for formal theorem-proving. PutnamBench consists of formalizations of problems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All problem statements come with formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. In total, PutnamBench comprises 1337 hand-written formalizations across the three proof assistants, and aims to benchmark the next generation of theorem-proving algorithms for competition mathematics. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We evaluate several established neural and symbolic theorem provers using PutnamBench. These approaches solve only a handful of the problems, establishing our benchmark as a difficult open challenge for research on formal theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
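To give a concrete sense of what a "formalization" means here, below is a minimal sketch of a statement-only Lean 4 (Mathlib) theorem in the style the benchmark targets: the competition problem is encoded as a theorem statement whose proof is left for a prover to supply. The problem, theorem name, and conventions shown are illustrative assumptions for this sketch, not an actual PutnamBench entry; see the repository for the real formalizations.

```lean
import Mathlib

/--
A hypothetical, statement-only formalization in the spirit of a
PutnamBench entry (an illustrative toy problem, not an actual
benchmark item): the sum of the first `n` odd numbers equals `n ^ 2`.
Only the statement is given; the proof is left as `sorry` for a
theorem-proving system to fill in.
-/
theorem putnam_example_sum_of_odds (n : ℕ) :
    Finset.sum (Finset.range n) (fun i => 2 * i + 1) = n ^ 2 := by
  sorry
```

As the abstract notes, each problem statement has counterparts in Isabelle, and a substantial subset has Coq formalizations as well.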
Recommended citation: George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. PutnamBench: A multilingual competition-mathematics benchmark for formal theorem-proving. In AI for Math Workshop @ ICML 2024, 2024. URL https://openreview.net/forum?id=vqW1VRFeVP.
Download Paper