-
UltraIF: Advancing Instruction Following from the Wild
Paper • 2502.04153 • Published • 24 -
Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 34 -
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
Paper • 2502.04328 • Published • 30
Collections
Discover the best community collections!
Collections including paper arxiv:2502.04313
-
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Paper • 2502.00698 • Published • 24 -
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 24 -
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Paper • 2502.01100 • Published • 17 -
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published • 14
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 29 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
-
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 41 -
Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 34 -
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Paper • 2504.10823 • Published • 15
-
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper • 2501.02955 • Published • 45 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Paper • 2501.12380 • Published • 86 -
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 29
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 56 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 26 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 73 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78
-
UltraIF: Advancing Instruction Following from the Wild
Paper • 2502.04153 • Published • 24 -
Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 34 -
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
Paper • 2502.04328 • Published • 30
-
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 41 -
Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 34 -
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Paper • 2504.10823 • Published • 15
-
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Paper • 2502.00698 • Published • 24 -
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 24 -
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Paper • 2502.01100 • Published • 17 -
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published • 14
-
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper • 2501.02955 • Published • 45 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Paper • 2501.12380 • Published • 86 -
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 29
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 29 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 56 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 26 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 73 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78