How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Paper • arXiv:2505.24273