@telcom on Hugging Face: "MAD-GRPO: https://huggingface.co/blog/telcom/mad-grpo In R1-Zero-Like Training…"



posted an update Jan 17



MAD-GRPO: https://huggingface.co/blog/telcom/mad-grpo
In R1-Zero-Like Training*, Dr.GRPO addresses GRPO's biases by dropping the std normalization, but that fix often comes with a hidden side effect: length-weighted updates that can nudge the model toward verbosity.
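A minimal Python sketch of the two group-advantage estimates, and of how a response's total update weight scales with its length. It assumes a single group of sampled responses and token losses that are summed and divided by a fixed constant (the Dr.GRPO paper also drops GRPO's per-response 1/|o_i| averaging). The policy ratio and clipping are left out, and the function names are hypothetical.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style group advantage: subtract the group mean, divide by the group std."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr.GRPO-style group advantage: subtract the group mean only (no std division)."""
    mu = statistics.fmean(rewards)
    return [r - mu for r in rewards]


def per_response_weight(advantages, lengths, normalizer):
    """Total update weight per response when token losses are summed and divided
    by a constant normalizer (assumption). Every token in a response carries the
    same advantage, so the response's total weight grows linearly with its length."""
    return [a * n / normalizer for a, n in zip(advantages, lengths)]


rewards = [1.0, 1.0, 0.0, 0.0]   # two correct, two incorrect responses
lengths = [100, 400, 100, 400]   # token counts per response
weights = per_response_weight(dr_grpo_advantages(rewards), lengths, normalizer=400)
print(weights)
```

Here the 400-token correct response receives four times the update weight of the 100-token one (0.5 vs 0.125), even though both earn the same reward.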
MAD-GRPO instead normalizes with a robust scale (MAD + epsilon) together with per-token normalization, providing stability without the verbosity bias.
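A hedged sketch of the robust-scale idea. It assumes that "MAD" means the median absolute deviation around the group median, and that GRPO's mean centering is kept while the std is replaced by MAD + epsilon. The blog's exact centering, epsilon value, and per-token aggregation may differ and are not shown. The function names are hypothetical.

```python
import statistics


def mad_scale(rewards):
    """Median absolute deviation of the group's rewards around their median."""
    med = statistics.median(rewards)
    return statistics.median(abs(r - med) for r in rewards)


def mad_advantages(rewards, eps=1e-6):
    """Hypothetical MAD-scaled advantage: keep GRPO's mean centering
    (advantages still sum to zero within the group) but divide by
    MAD + eps instead of the standard deviation."""
    mu = statistics.fmean(rewards)
    scale = mad_scale(rewards)
    return [(r - mu) / (scale + eps) for r in rewards]


outlier_group = [0, 1, 2, 3, 100]
print(statistics.pstdev(outlier_group), mad_scale(outlier_group))
print(mad_advantages([1.0, 1.0, 0.0, 0.0]))
```

With rewards [0, 1, 2, 3, 100], the population std is about 39.4 while the MAD is 1, so a single outlier no longer compresses the spread among the other responses. One caveat of this sketch: when more than half of a group shares the same reward (common with binary rewards), the MAD is 0 and epsilon alone sets the scale, so its value matters.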

*https://huggingface.co/papers/2503.20783


telcom Javad Taghia