KevinG/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive-llm-judge-42 Text Generation • 8B • Updated 6 days ago • 22
KevinG/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive_least_similar-llm-judge-42 Text Generation • 8B • Updated 6 days ago • 21