Commit History
Upload from GitHub Actions: Results for 50 languages
3dfd880
verified
Upload from GitHub Actions: Evaluate on 40 languages
941d5c5
verified
Upload from nightly evaluation run
c3be561
verified
Upload from GitHub Actions: Add math benchmarks
549360a
verified
Upload from GitHub Actions: More results
52abc5b
verified
Upload from nightly evaluation run
4a34e67
verified
Upload from GitHub Actions: Update model ranking fetching
f840423
verified
Upload from GitHub Actions: Use FLORES+ via Hugging Face
913253a
verified
Upload from nightly evaluation run
9ee89ef
verified
Upload from nightly evaluation run
8a4050a
verified
Upload from GitHub Actions: New results
b311dd5
verified
Upload from nightly evaluation run
dcb356d
verified
Block gemini-2.5-pro-exp-03-25
092c06a
David Pomerenke
committed on
Only run tasks for which there is no result yet
2f9dee1
David Pomerenke
committed on
Run on 40 languages, additional models
260c1a3
David Pomerenke
committed on
Run evals
b0c61ed
David Pomerenke
committed on
Run on 15 languages
f8a3dad
David Pomerenke
committed on
Add model history plot
f52ec6e
David Pomerenke
committed on
Implement MMLU task
a683732
David Pomerenke
committed on
Add Global MMLU benchmark
ce2acb0
David Pomerenke
committed on
Translation both from and to
731eddd
David Pomerenke
committed on
Add OpenRouter metadata to models
9002fc2
David Pomerenke
committed on
Run on 100 languages, adjust display
8274634
David Pomerenke
committed on
Add Dockerfile
4d13673
David Pomerenke
committed on
Language selection checkboxes & filtering in backend
d91b022
David Pomerenke
committed on
Basic backend setup with FastAPI but without actual filtering
2c21cf7
David Pomerenke
committed on
spBLEU tokenizer, run on more languages
eaf2d97
David Pomerenke
committed on
Better map tooltip
92b2164
David Pomerenke
committed on
Process data for country map
723f963
David Pomerenke
committed on
Autonyms and cooler dataset search display
33469f2
David Pomerenke
committed on
More models
c5278dd
David Pomerenke
committed on
Basic language table
d1a7111
David Pomerenke
committed on
Refactor eval code into files
da6e1bc
David Pomerenke
committed on
Model table using React
ecf4195
David Pomerenke
committed on
Better results format (flatten + aggregate 3x), push results to hub
7a9c651
David Pomerenke
committed on
Run on 50 languages
80a0827
David Pomerenke
committed on
Rerun
0638620
David Pomerenke
committed on
Separate overall scores for T2T / S2T
e9a19be
David Pomerenke
committed on
Put all languages into results.json, replace pyglottolog
040dc35
David Pomerenke
committed on
Add ASR ChrF scores
4973af4
David Pomerenke
committed on
More evals
8633921
David Pomerenke
committed on
Better separation of ttt/stt in results format
e223525
David Pomerenke
committed on
Evaluate transcription
3d9cde9
David Pomerenke
committed on
Basic FLEURS transcription setup
1ab3999
David Pomerenke
committed on
Add language families
08735bb
David Pomerenke
committed on
Metrics selector & refactoring
4f572a5
David Pomerenke
committed on
Add masked language modeling (MLM) task
e92634d
David Pomerenke
committed on
For classification use number + few-shot
1b634f3
David Pomerenke
committed on
Show classification and overall score in app
1167b2d
David Pomerenke
committed on