Leaderboards 🔥 - a sugatoray Collection

updated Mar 12

A collection of Leaderboards for LLMs ⚡️⚖️ 🤗

Running

4.3k

Chatbot Arena Leaderboard

🏆

Display chatbot leaderboard and stats
Running on CPU Upgrade

13k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Building

187

Yet Another LLM Leaderboard

🌖

Run a Streamlit web app
Runtime error

136

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Running

464

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware
Running on CPU Upgrade

91

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

223

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history
Runtime error

30

Contextual Leaderboard

🐨
Running on CPU Upgrade

5.44k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

54

Open CoT Leaderboard

🥇

Track, rank and evaluate open LLMs' CoT quality
Running

314

LLM Performance Leaderboard

🐨

View LLM Performance Leaderboard
Running

198

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

64

The timm Leaderboard

🏆

Display and analyze PyTorch Image Models leaderboard
Running

78

Open FinLLM Leaderboard

🥇

Browse and submit large language model evaluations
Running

106

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Build error

43

MEGA-Bench Leaderboard

🥇

A leaderboard for multimodal models
Running on CPU Upgrade

92

Open LLM Leaderboard Model Comparator

🏆

Compare Open LLM Leaderboard results
Running

132

Vidore Leaderboard

🥇

Explore and benchmark visual document retrieval models
Building

100

Judge Arena

💻

Compare AI models by voting on responses
Running on CPU Upgrade

715

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Paused

9

Keras Chatbot Battle

💬

Interact with multiple chatbots simultaneously
Sleeping

4

OmniEval

🥇
Sleeping

5

OmniEval

🥇

Official Leaderboard for OmniEval
open-llm-leaderboard/contents

Viewer • Updated Mar 20 • 4.58k • 15.5k • 15
Running on CPU Upgrade

390

GAIA Leaderboard

🦾

Submit models for evaluation and view leaderboard results
m-ric/agents_small_benchmark

Viewer • Updated Jan 19, 2024 • 100 • 50 • 10
Running on Zero

349

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Running

105

MTEB Arena

⚔

Teach, test, evaluate language models with MTEB Arena
Running on Zero

280

GenAI Arena

📈

Realtime Image/Video Gen AI Arena
Running on CPU Upgrade

277

Agent Leaderboard

💬

Ranking of LLMs for agentic tasks
Running on CPU Upgrade

724

Open ASR Leaderboard

🏆

Request evaluation for new speech models
Running

35

Open LMM Reasoning Leaderboard

🥇

A Leaderboard that demonstrates LMM reasoning capabilities
Running

122

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents
smolagents/benchmark-v1

Viewer • Updated Mar 4 • 132 • 932 • 10