Inference-Time Scaling for Generalist Reward Modeling • Paper • 2504.02495 • Published Apr 3 • 55
DNA-R1 • Collection • Reasoning model distilled from DeepSeek-R1, enhanced with GRPO using supplementary reasoning datasets. • 1 item • Updated 12 days ago • 2
🧠Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 24 items • Updated 23 days ago • 150
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • 2312.00752 • Published Dec 1, 2023 • 143