
Collections


Collections including paper arxiv:2502.04313

UltraIF: Advancing Instruction Following from the Wild

Paper • 2502.04153 • Published Feb 6, 2025 • 24
Great Models Think Alike and this Undermines AI Oversight

Paper • 2502.04313 • Published Feb 6, 2025 • 34
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6, 2025 • 30

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published Feb 2, 2025 • 24
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

Paper • 2502.01142 • Published Feb 3, 2025 • 24
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 17
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

Paper • 2502.01081 • Published Feb 3, 2025 • 14

LM Capabilities and Scaling

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15, 2024 • 29
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9, 2024 • 23
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 38
Are large language models superhuman chemists?

Paper • 2404.01475 • Published Apr 1, 2024 • 19

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3, 2025 • 41
Great Models Think Alike and this Undermines AI Oversight

Paper • 2502.04313 • Published Feb 6, 2025 • 34
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives

Paper • 2504.10823 • Published Apr 15, 2025 • 15

2025 LLM Papers on Hugging Face with Japanese Memos

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published Jan 6, 2025 • 45
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 107
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21, 2025 • 86
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published Jan 16, 2025 • 29

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14, 2024 • 56
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Paper • 2403.12968 • Published Mar 19, 2024 • 26
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 73
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 78