RPT-DeepSeek-R1-0528-Qwen3-8B / model-00002-of-00004.safetensors

Commit History

GRPO fine-tuned DeepSeek-R1-Qwen3-8B for next token prediction according to paper https://huggingface.co/papers/2506.08007
251e747
verified

ykarout committed on