LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild Paper • 2402.09997 • Published Feb 15, 2024 • 1
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models Paper • 2402.13109 • Published Feb 20, 2024
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17, 2024 • 19
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models Paper • 2305.15033 • Published May 24, 2023
AI PERSONA: Towards Life-long Personalization of LLMs Paper • 2412.13103 • Published Dec 17, 2024 • 2
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 105
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding Paper • 2505.23922 • Published May 29, 2025
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published Jun 3, 2025 • 36
BERT Loses Patience: Fast and Robust Inference with Early Exit Paper • 2006.04152 • Published Jun 7, 2020
PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization Paper • 2506.12915 • Published Jun 2025 • 21
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7, 2025 • 44
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23, 2025 • 27
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11, 2025 • 11
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published Oct 17, 2024 • 19
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness Paper • 2410.07035 • Published Oct 9, 2024 • 17