SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 28 • 1
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention Paper • 2507.01004 • Published Jul 1, 2025 • 10 • 1
SkyLadder: Better and Faster Pretraining via Context Window Scheduling Paper • 2503.15450 • Published Mar 19, 2025 • 12 • 2
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published Oct 9, 2024 • 8 • 2
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 57 • 6
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 40 • 7