Paper: Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arXiv:2503.07572)
The community is now called Somos NLP ➡️ https://huggingface.co/somosnlp
By using the instruct version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tuning the base version (Llama-3-8B) on this dataset, you can improve even the instruction-tuned version. A sketch of this loop follows below.
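
A minimal sketch of that loop, not the exact recipe behind the post: it assumes gated access to the meta-llama checkpoints, enough GPU memory for an 8B model, and recent `transformers` / `trl` releases. The seed topics, prompt, and training settings are illustrative placeholders.

```python
# Sketch only: generate synthetic instruction data with the instruct model,
# then run a short SFT pass on the base model. All names below are assumptions.
from datasets import Dataset
from transformers import pipeline

# 1) Use the instruct-tuned model to synthesize instruction/response pairs.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

seed_topics = ["summarizing a news article", "debugging a Python error", "writing a SQL query"]
rows = []
for topic in seed_topics:
    prompt = f"Write one concrete instruction about {topic}, then answer it."
    out = generator([{"role": "user", "content": prompt}], max_new_tokens=256)
    # With chat-formatted input, `generated_text` is the whole conversation;
    # keep only the newly generated assistant turn as a training example.
    rows.append({"text": out[0]["generated_text"][-1]["content"]})

synthetic_ds = Dataset.from_list(rows)

# 2) Fine-tune the *base* model on the synthetic dataset with a short SFT run.
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",
    train_dataset=synthetic_ds,
    args=SFTConfig(output_dir="llama3-8b-synthetic-sft", max_steps=100),
)
trainer.train()
```

In practice you would generate far more than a handful of examples and filter them for quality before training; the snippet only shows the shape of the pipeline.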
… ollama models (initially phi and llama3) automatically and upload them to the Hugging Face Hub!
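
The upload step itself can be done with `huggingface_hub`; this is a generic sketch of that step only, not the tool's actual code. It assumes a local folder with the converted model files and a valid token; the repo and path names are made up.

```python
# Hypothetical example: push a locally converted model folder to the Hub.
from huggingface_hub import HfApi

api = HfApi()  # reads the token from HF_TOKEN or the cached login
repo_id = "your-username/phi-converted"        # placeholder target repo
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="./converted-phi", repo_id=repo_id)
```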