phuongntc/llama32_1b_grpo_manual_multievalsumviet2_penalty • Text Generation • Updated about 15 hours ago
phuongntc/llama32_1b_ppo_multievalsumviet2_penalty_improved • Text Generation • Updated about 22 hours ago