M Saad Salman
MSS444
AI & ML interests
None yet
Recent Activity
Upvoted a paper about 20 hours ago
Rethinking Thinking Tokens: LLMs as Improvement Operators
Upvoted a paper about 20 hours ago
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
Upvoted a paper about 20 hours ago
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Organizations
None yet