evals-for-every-language / results.json

Commit History

Upload from GitHub Actions: More models and languages
a73f888
verified

davidpomerenke committed on

Upload from GitHub Actions: Results for 50 languages
3dfd880
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate on 40 languages
941d5c5
verified

davidpomerenke committed on

Upload from nightly evaluation run
c3be561
verified

davidpomerenke committed on

Upload from GitHub Actions: Add math benchmarks
549360a
verified

davidpomerenke committed on

Upload from GitHub Actions: More results
52abc5b
verified

davidpomerenke committed on

Upload from nightly evaluation run
4a34e67
verified

davidpomerenke committed on

Upload from GitHub Actions: Update model ranking fetching
f840423
verified

davidpomerenke committed on

Upload from GitHub Actions: Use FLORES+ via Hugging Face
913253a
verified

davidpomerenke committed on

Upload from nightly evaluation run
9ee89ef
verified

davidpomerenke committed on

Upload from nightly evaluation run
8a4050a
verified

davidpomerenke committed on

Upload from GitHub Actions: New results
b311dd5
verified

davidpomerenke committed on

Upload from nightly evaluation run
dcb356d
verified

davidpomerenke committed on

Block gemini-2.5-pro-exp-03-25
092c06a

David Pomerenke committed on

Only run tasks for which there is no result yet
2f9dee1

David Pomerenke committed on

Run on 40 languages, additional models
260c1a3

David Pomerenke committed on

Run evals
b0c61ed

David Pomerenke committed on

Run on 15 languages
f8a3dad

David Pomerenke committed on

Add model history plot
f52ec6e

David Pomerenke committed on

Implement MMLU task
a683732

David Pomerenke committed on

Add Global MMLU benchmark
ce2acb0

David Pomerenke committed on

Translation both from and to
731eddd

David Pomerenke committed on

Add OpenRouter metadata to models
9002fc2

David Pomerenke committed on

Run on 100 languages, adjust display
8274634

David Pomerenke committed on

Add Dockerfile
4d13673

David Pomerenke committed on

Language selection checkboxes & filtering in backend
d91b022

David Pomerenke committed on

Basic backend setup with FastAPI but without actual filtering
2c21cf7

David Pomerenke committed on

spBLEU tokenizer, run on more languages
eaf2d97

David Pomerenke committed on

Better map tooltip
92b2164

David Pomerenke committed on

Process data for country map
723f963

David Pomerenke committed on

Autonyms and cooler dataset search display
33469f2

David Pomerenke committed on

More models
c5278dd

David Pomerenke committed on

Basic language table
d1a7111

David Pomerenke committed on

Refactor eval code into files
da6e1bc

David Pomerenke committed on

Model table using React
ecf4195

David Pomerenke committed on

Better results format (flatten + aggregate 3x), push results to hub
7a9c651

David Pomerenke committed on

Run on 50 languages
80a0827

David Pomerenke committed on

Rerun
0638620

David Pomerenke committed on

Separate overall scores for T2T / S2T
e9a19be

David Pomerenke committed on

Put all languages into results.json, replace pyglottolog
040dc35

David Pomerenke committed on

Add ASR ChrF scores
4973af4

David Pomerenke committed on

More evals
8633921

David Pomerenke committed on

Better separation of ttt/stt in results format
e223525

David Pomerenke committed on

Evaluate transcription
3d9cde9

David Pomerenke committed on

Basic FLEURS transcription setup
1ab3999

David Pomerenke committed on

Add language families
08735bb

David Pomerenke committed on

Metrics selector & refactoring
4f572a5

David Pomerenke committed on

Add masked language modeling (MLM) task
e92634d

David Pomerenke committed on

For classification use number + few-shot
1b634f3

David Pomerenke committed on

Show classification and overall score in app
1167b2d

David Pomerenke committed on