phuongntc/llama32_1b_grpo_manual_multievalsumviet2_penalty • Text Generation • Updated about 13 hours ago
phuongntc/llama32_1b_ppo_multievalsumviet2_penalty_improved • Text Generation • Updated about 20 hours ago