
kyatham Karthik

Karthik64

AI & ML interests

None yet

Recent Activity

replied to anakin87's post 7 days ago
๐—œ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ฒ๐—ฑ ๐—ฎ ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜๐—ผ ๐˜€๐—ฐ๐—ต๐—ฒ๐—ฑ๐˜‚๐—น๐—ฒ ๐—ฒ๐˜ƒ๐—ฒ๐—ป๐˜๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—š๐—ฅ๐—ฃ๐—ข! ๐Ÿ‘‘ ๐Ÿ—“๏ธ โœ๏ธ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo I experimented with GRPO lately. I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning. After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game... I wanted a different challenge, like ๐˜๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ถ๐—ป๐—ด ๐—ฎ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜๐—ผ ๐—ฐ๐—ฟ๐—ฒ๐—ฎ๐˜๐—ฒ ๐—ฎ ๐˜€๐—ฐ๐—ต๐—ฒ๐—ฑ๐˜‚๐—น๐—ฒ ๐—ณ๐—ฟ๐—ผ๐—บ ๐—ฎ ๐—น๐—ถ๐˜€๐˜ ๐—ผ๐—ณ ๐—ฒ๐˜ƒ๐—ฒ๐—ป๐˜๐˜€ ๐—ฎ๐—ป๐—ฑ ๐—ฝ๐—ฟ๐—ถ๐—ผ๐—ฟ๐—ถ๐˜๐—ถ๐—ฒ๐˜€. Choosing an original problem forced me to: ๐Ÿค” Think about the problem setting ๐Ÿงฌ Generate data ๐Ÿค Choose the right base model ๐Ÿ† Design reward functions (and experiencing reward hacking) ๐Ÿ”„ Run multiple rounds of training, hoping that my model would learn something. A fun and rewarding ๐Ÿ˜„ experience. I learned a lot of things, that I want to share with you. ๐Ÿ‘‡ โœ๏ธ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo ๐Ÿ’ป Code: https://github.com/anakin87/qwen-scheduler-grpo ๐Ÿค— Hugging Face collection (dataset and model): https://huggingface.co/collections/anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837

Organizations

None yet

models 0

None public yet

datasets 0

None public yet