VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published 4 days ago • 24
Qwen2.5-Coder Collection • Code-specific model series based on Qwen2.5 • 40 items • Updated Apr 28 • 319
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark Paper • 2503.20786 • Published Mar 26 • 2
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark Paper • 2505.16968 • Published 18 days ago • 39
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published 10 days ago • 75
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem Paper • 2505.21887 • Published 13 days ago • 14
CASS Collection • Large-scale dataset and model suite for cross-architecture GPU code transpilation between CUDA and HIP at both source and assembly levels • 2 items • Updated 26 days ago • 5
SALT: Singular Value Adaptation with Low-Rank Transformation Paper • 2503.16055 • Published Mar 20 • 8
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding Paper • 2502.14949 • Published Feb 20 • 8
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 149
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Paper • 2412.07769 • Published Dec 10, 2024 • 29
Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics Paper • 2411.15872 • Published Nov 24, 2024 • 5
From CISC to RISC: language-model guided assembly transpilation Paper • 2411.16341 • Published Nov 25, 2024 • 15