Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published 9 days ago • 76
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision Paper • 2403.09472 • Published Mar 14, 2024 • 1
Forward-Backward Reasoning in Large Language Models for Mathematical Verification Paper • 2308.07758 • Published Aug 15, 2023 • 4
DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality Paper • 2303.14585 • Published Mar 25, 2023
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published 9 days ago • 76
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 42
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator Paper • 2412.12094 • Published Dec 16, 2024 • 10
Qwen2-Math Collection Math-specific model series based on Qwen2 • 8 items • Updated Nov 28, 2024 • 48
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21, 2024 • 70
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 54