Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation Paper • 2506.05062 • Published 6 days ago • 11
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection Paper • 2505.19103 • Published 17 days ago • 13
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Paper • 2505.17813 • Published 19 days ago • 55
CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature Paper • 2505.20779 • Published 15 days ago • 15
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Paper • 2504.01137 • Published Apr 1 • 21
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3 • 29
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models Paper • 2503.18033 • Published Mar 23 • 25
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling Paper • 2503.09601 • Published Mar 12 • 15
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13 • 36
JuStRank: Benchmarking LLM Judges for System Ranking Paper • 2412.09569 • Published Dec 12, 2024 • 20
Hidden in the Noise: Two-Stage Robust Watermarking for Images Paper • 2412.04653 • Published Dec 5, 2024 • 31
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification Paper • 2410.05057 • Published Oct 7, 2024 • 7
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23, 2024 • 13
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Paper • 2407.00402 • Published Jun 29, 2024 • 23