LLM4Math
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper
• 2510.04721
• Published
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper
• 2505.02735
• Published
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper
• 2504.18428
• Published
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper
• 2502.10197
• Published
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
Paper
• 2407.03203
• Published
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Paper
• 2503.21934
• Published
Solving Inequality Proofs with Large Language Models
Paper
• 2506.07927
• Published
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Paper
• 2505.05758
• Published
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Paper
• 2405.12209
• Published
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
Paper
• 2505.23851
• Published
Theorem Prover as a Judge for Synthetic Data Generation
Paper
• 2502.13137
• Published
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Paper
• 2505.23754
• Published
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper
• 2405.14333
• Published
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
Paper
• 2511.11134
• Published