AlgoDistill

AI & ML interests

jailbreaking

upvoted 6 papers 7 months ago

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Paper • 2502.20395 • Published Feb 27 • 46

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 83

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 204

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20 • 91

S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 105

upvoted 4 papers 8 months ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 125

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published Jan 31 • 39

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Paper • 2501.16764 • Published Jan 28 • 22

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 122