dumbequation/Qwen2.5-3B-reasoning-medical-symptoms-GRPO-quant
3B
Models I've trained to reason like DeepSeek R1 using online learning with Group Relative Policy Optimization (GRPO), the method introduced in DeepSeekMath.
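For readers new to GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The snippet below is only a minimal sketch of that group-relative advantage step, not the training code used for these models. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). The population standard deviation and
    the eps value are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions sampled from the same prompt
# (for example, 1.0 when the final answer is correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.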