Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows Paper • 2411.07763 • Published Nov 12, 2024 • 1
When Attention Sink Emerges in Language Models: An Empirical View Paper • 2410.10781 • Published Oct 14, 2024
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 16
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2, 2025 • 57
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild Paper • 2503.18892 • Published 23 days ago • 29
SkyLadder: Better and Faster Pretraining via Context Window Scheduling Paper • 2503.15450 • Published 28 days ago • 11
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 3, 2025 • 8
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 103
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2, 2025 • 53
Iterative Forward Tuning Boosts In-Context Learning in Language Models Paper • 2305.13016 • Published May 22, 2023 • 1
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts Paper • 2305.14839 • Published May 24, 2023 • 1
One Shot Learning as Instruction Data Prospector for Large Language Models Paper • 2312.10302 • Published Dec 16, 2023 • 3
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 47
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning Paper • 2301.13808 • Published Jan 31, 2023
Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing Paper • 2301.07507 • Published Jan 18, 2023
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement Paper • 2409.12122 • Published Sep 18, 2024 • 3
ExecRepoBench: Multi-level Executable Code Completion Evaluation Paper • 2412.11990 • Published Dec 16, 2024