Eval Logs (Collection): Benchmark log generated with Twinkle Eval, recording the model's outputs for each prompt. • 1 item • Updated 28 days ago • 2
Formosa-1 Series (Collection): A collection of Formosa-1 (F1) reasoning models and datasets focused on Traditional Chinese instruction-following and logic. • 4 items • Updated 28 days ago • 3
Traditional Chinese Reasoning Datasets (Collection): A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated 28 days ago • 8