Collections
Collections including paper arxiv:2501.16411
- PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
  Paper • 2501.16411 • Published • 18
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 26
- Return of the Encoder: Maximizing Parameter Efficiency for SLMs
  Paper • 2501.16273 • Published • 5

- Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
  Paper • 2407.07053 • Published • 44
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
  Paper • 2407.12772 • Published • 34
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
  Paper • 2407.11691 • Published • 14
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
  Paper • 2408.02718 • Published • 61

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 192
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 35