Do you have numbers showing that method 2 works, and works better? @prithivMLmods
Xiaoyun Wu
CUIGuy
AI & ML interests
conversational user interface
Recent Activity
commented on an article about 1 month ago: Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies
Organizations
CUIGuy's activity
commented on
Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies
about 1 month ago
commented on
Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies
about 1 month ago
Method 2 includes vLLM but has it disabled, while method 1 does not include vLLM at all. Am I missing something? So is method 2 the one to use if I enable GRPO?
commented on
Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies
about 1 month ago
Which one is vLLM-based? How can one tell? Could you mention it in the article?
Also, are you aware of any work on using GRPO to improve MMLU (or some task within it) with models like Qwen 2.5 7B or even smaller?
@prithivMLmods
Thanks.
commented on
Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies
about 1 month ago
Why are there two methods?
When will there be a GGML version?
8
#3 opened over 1 year ago
by
CUIGuy
Cannot install rotary
1
#4 opened over 1 year ago
by
CUIGuy
Why can't we make this fully HF-ready?
8
#11 opened over 1 year ago
by
CUIGuy