PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning Paper • 2502.12054 • Published Feb 17 • 6
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published 27 days ago • 64
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published 29 days ago • 25
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models Paper • 2309.02706 • Published Sep 6, 2023 • 2