
AlgoDistill


AI & ML interests

jailbreaking

Organizations

Open Reinforcement Fine-tuning

upvoted 6 papers 7 months ago

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Paper • 2502.20395 • Published Feb 27 • 46

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 83

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 204

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20 • 91

S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 105

upvoted 4 papers 8 months ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 125

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published Jan 31 • 39

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Paper • 2501.16764 • Published Jan 28 • 22

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 122