EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions Paper • 2505.23473 • Published 12 days ago • 1
Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models Paper • 2506.07645 • Published 1 day ago • 1
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories Paper • 2506.04807 • Published 6 days ago • 2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions Paper • 2506.07527 • Published 1 day ago • 3
Dreamland: Controllable World Creation with Simulator and Generative Models Paper • 2506.08006 • Published 1 day ago • 6
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Paper • 2506.08012 • Published 1 day ago • 7
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation Paper • 2506.05062 • Published 5 days ago • 11
CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models Paper • 2506.07463 • Published 2 days ago • 8
Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Paper • 2506.06006 • Published 5 days ago • 10
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published 1 day ago • 12
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition Paper • 2506.07553 • Published 1 day ago • 12
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published 4 days ago • 20
SpatialLM: Training Large Language Models for Structured Indoor Modeling Paper • 2506.07491 • Published 1 day ago • 29
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published 1 day ago • 37
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published 4 days ago • 60
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published 3 days ago • 80
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration Paper • 2506.05579 • Published 5 days ago • 3