RayTsai
Upload GRPO fine-tuned model (40% Chinese dataset, 40hrs training)
6c12da5 verified