MMMU

non-profit

https://mmmu-benchmark.github.io/

AI & ML interests

Multimodal Model Evaluation

Recent Activity

zhangysk

authored 6 papers 11 days ago

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10 • 1

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Paper • 2505.14640 • Published May 20 • 14

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

Paper • 2505.23922 • Published about 1 month ago

P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

Paper • 2505.17104 • Published May 21

TaskCraft: Automated Generation of Agentic Tasks

Paper • 2506.10055 • Published 17 days ago • 31

Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published 13 days ago • 58

wren93

authored a paper 23 days ago

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 51

wenhu

authored 2 papers 24 days ago

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published 24 days ago • 24

Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Paper • 2506.03295 • Published 25 days ago • 17

yuanshengni

authored a paper 24 days ago

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published 24 days ago • 24

yuanshengni

authored a paper about 1 month ago

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

Paper • 2505.15929 • Published May 21 • 48

DongfuJiang

authored a paper about 1 month ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 18

wenhu

authored a paper about 1 month ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 18

DongfuJiang

authored a paper about 1 month ago

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22 • 40

zhangysk

authored a paper about 1 month ago

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20 • 22

DongfuJiang

authored a paper about 1 month ago

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20 • 22

wren93

authored a paper about 1 month ago

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Paper • 2505.14640 • Published May 20 • 14

gneubig

authored a paper about 1 month ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published May 15 • 25

zhangysk

authored 2 papers about 2 months ago

A Comprehensive Survey on Long Context Language Modeling

Paper • 2503.17407 • Published Mar 20 • 49

AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection

Paper • 2505.07293 • Published May 12 • 26