ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1, 2025 • 43
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29, 2025 • 59
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23, 2025 • 26
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Paper • 2410.17250 • Published Oct 22, 2024 • 15
Long Context Alignment with Short Instructions and Synthesized Positions Paper • 2405.03939 • Published May 7, 2024
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 48
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 49
Teach Multimodal LLMs to Comprehend Electrocardiographic Images Paper • 2410.19008 • Published Oct 21, 2024 • 24
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 45
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 76
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks Paper • 2410.06526 • Published Oct 9, 2024 • 1
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 32
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA Paper • 2405.20421 • Published May 30, 2024
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024 • 32
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures Paper • 2406.06565 • Published Jun 3, 2024 • 10
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 48