Spaces:
Running
Running
| # Your leaderboard name | |
| TITLE = """<h1 align="center" id="space-title">U-MATH / μ-MATH leaderboard</h1>""" | |
| # What does your leaderboard evaluate? | |
| INTRODUCTION_TEXT = """ | |
| These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems. | |
| U-MATH provides a set of 1,100 university-level mathematical problems, while µ-MATH complements it with a meta-evaluation framework focusing on solution judgment with 1084 LLM solutions. | |
| """ | |
| # Which evaluations are you running? how can people reproduce what you have? | |
| LLM_BENCHMARKS_TEXT = """ | |
| This repository contains the official leaderboard code for the U-MATH and $\mu$-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of Large Language Models (LLMs) on university-level problems. | |
| ### Overview | |
| U-MATH provides a set of 1,100 university-level mathematical problems, while µ-MATH complements it with a meta-evaluation framework focusing on solution judgment with 1084 LLM solutions. | |
| * 📊 [U-MATH benchmark at Huggingface](https://huggingface.co/datasets/toloka/umath) | |
| * 🔎 [μ-MATH benchmark at Huggingface](https://huggingface.co/datasets/toloka/mumath) | |
| * 🗞️ [Paper](https://arxiv.org/abs/2412.03205) | |
| * 👾 [Evaluation Code at GitHub](https://github.com/Toloka/u-math/) | |
| ### Licensing Information | |
| * The contents of the μ-MATH's machine-generated `model_output` column are subject to the underlying LLMs' licensing terms. | |
| * Contents of all the other dataset U-MATH and μ-MATH fields, as well as the code, are available under the MIT license. | |
| """ | |
| CITATION_TEXT = r"""@misc{chernyshev2024umath, | |
| title={U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs}, | |
| author={Konstantin Chernyshev and Vitaliy Polshkov and Ekaterina Artemova and Alex Myasnikov and Vlad Stepanov and Alexei Miasnikov and Sergei Tilga}, | |
| year={2024}, | |
| eprint={2412.03205}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2412.03205}, | |
| }""" | |