- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
  Paper • 2309.10020 • Published • 41
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 82
M1n9X
m1n9x
AI & ML interests
None yet
Recent Activity
- updated a model 2 days ago: m1n9x/Qwen2.5_3B-GRPO-medical-reasoning
- published a model 6 days ago: m1n9x/Qwen2.5_3B-GRPO-medical-reasoning
- liked a Space 18 days ago: nanotron/ultrascale-playbook
Organizations
Collections
1
Datasets
None public yet