twinkle-ai's Collections
📋 Eval Logs
Updated 29 days ago
Benchmark log generated with Twinkle Eval, recording the model's outputs for each prompt.
twinkle-ai/llama-4-eval-logs-and-scores
Viewer • Updated Apr 9 • 750 • 72 • 2