TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages Paper • 2502.11020 • Published Feb 16 • 6
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26 • 86
Falcon Edge series Collection A series of powerful, universal and fine-tunable small Language Models • 7 items • Updated May 21 • 22
view article Article CircleGuardBench: New Standard for Evaluating AI Moderation Models By whitecircle-ai and 7 others • May 7 • 53
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7 • 133