OctoThinker-Llama-8B Family - a sii-research Collection

updated Jul 6

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

sii-research/OctoThinker-8B-Long-Base
Text Generation • 8B • Updated Jul 6 • 9 • 1

sii-research/OctoThinker-8B-Hybrid-Base
Text Generation • 8B • Updated Jul 6 • 10

sii-research/OctoThinker-8B-Short-Base
Text Generation • 8B • Updated Jul 6 • 13

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512 • Published Jun 25 • 46
