Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation Paper • 2506.05062 • Published 6 days ago • 11
StressTest: Can YOUR Speech LM Handle the Stress? Paper • 2505.22765 • Published 13 days ago • 17
CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature Paper • 2505.20779 • Published 15 days ago • 15
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Paper • 2505.17813 • Published 19 days ago • 55
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection Paper • 2505.19103 • Published 17 days ago • 13
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published Apr 24 • 56