🥇 MMLU-Pro Leaderboard
More advanced and challenging multi-task evaluation
Benchmarking LLMs on the stability of simulated populations
Embed ZeroEval for evaluation
View and compare LLM evaluations across various domains
Explore and submit models for benchmarking
Compact LLM Battle Arena: Frugal AI Face-Off!
VLMEvalKit eval results on video understanding benchmarks
Track, rank and evaluate open LLMs and chatbots
Blind vote on HF TTS models!