CLEVER: A Curated Benchmark for Formally Verified Code Generation
Published in NeurIPS 2025 Datasets and Benchmarks Track, 2025
We introduce CLEVER, a high-quality, manually curated benchmark of 161 problems for end-to-end verified code generation in Lean.
Recommended citation: Amitayush Thakur, Jasper Lee, George Tsoukalas, Meghana Sistla, Matthew Zhao, Stefan Zetzsche, Greg Durrett, Yisong Yue, and Swarat Chaudhuri. CLEVER: A Curated Benchmark for Formally Verified Code Generation. In NeurIPS 2025 Datasets and Benchmarks Track, 2025. URL https://openreview.net/forum?id=IbOacMF5qd.
Download Paper
