Fixing Open LLM Leaderboard with Math-Verify By hynky and 3 others • Feb 14 • 27
Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.18k
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard By alozowski and 3 others • Jan 9 • 21
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard By alielfilali01 and 4 others • Dec 4, 2024 • 34
Letting Large Models Debate: The First Multilingual LLM Debate Competition By xuanricheng and 11 others • Nov 20, 2024 • 30
Judge Arena: Benchmarking LLMs as Evaluators By kaikaidai and 7 others • Nov 19, 2024 • 56
Introducing the Open FinLLM Leaderboard By QianqianXie1994 and 12 others • Oct 4, 2024 • 76
BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks By terryyz and 8 others • Jun 18, 2024 • 46
Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages By Quent-01 and 9 others • May 24, 2024 • 25
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models By r34p3r1321 and 15 others • May 24, 2024 • 21
Introducing the Open Arabic LLM Leaderboard By alielfilali01 and 4 others • May 14, 2024 • 85
Introducing the Open Leaderboard for Hebrew LLMs! By Shaltiel and 3 others • May 5, 2024 • 38
Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face By mhillsmith and 2 others • May 3, 2024 • 13
Improving Prompt Consistency with Structured Generations By willkurt and 2 others • Apr 30, 2024 • 62
Introducing the Open Chain of Thought Leaderboard By ggbetz and 3 others • Apr 23, 2024 • 30
The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare By aaditya and 2 others • Apr 19, 2024 • 140
Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs By StringChaos and 6 others • Apr 16, 2024 • 15