arxiv:2506.03295

Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Published on Jun 3
· Submitted by ubowang on Jun 5

Abstract

Critique Fine-Tuning on a single problem can efficiently enhance the reasoning capabilities of large language models with significant performance gains and reduced computational cost compared to reinforcement learning.

AI-generated summary

We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models' reasoning capabilities. However, RL is not only expensive but also unstable; even one-shot RL requires hundreds of GPU hours. This raises a critical question: is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama family models, ranging from 1.5B to 14B parameters, on the CFT data and observe significant performance gains across diverse reasoning tasks. For example, with just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to, or even surpass, the results from RL with 20x less compute. Ablation studies reveal the robustness of one-shot CFT across different prompt problems. These results highlight one-shot CFT as a simple, general, and compute-efficient approach to unleashing the reasoning capabilities of modern LLMs.
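The data-construction step described above (pair each model-generated solution to the single problem with a teacher critique, then fine-tune on those pairs) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `build_cft_examples` and the toy strings are hypothetical, and in the real setup the solutions come from sampled model generations and the critiques from teacher LLMs.

```python
def build_cft_examples(problem, solutions, critiques):
    """Pair each candidate solution with its teacher critique.

    Each training example prompts the model to critique a
    (problem, solution) pair; the fine-tuning target is the
    teacher's critique text.
    """
    examples = []
    for solution, critique in zip(solutions, critiques):
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Candidate solution:\n{solution}\n\n"
            "Critique the solution above, pointing out any errors."
        )
        examples.append({"prompt": prompt, "target": critique})
    return examples

# Toy usage with placeholder strings (not real paper data):
data = build_cft_examples(
    "Compute 2 + 2.",
    ["2 + 2 = 5", "2 + 2 = 4"],
    ["Incorrect: 2 + 2 equals 4, not 5.", "Correct."],
)
print(len(data))  # prints 2: one example per candidate solution
```

Diversity among the sampled solutions is what makes a single problem yield many distinct critique targets, which is the core of the one-shot setup.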

Community

Paper author and submitter:

We found that supervised fine-tuning on ONE problem can achieve a similar performance gain to RL on ONE problem with 20x less compute! In the paper, we show that Critique Fine-Tuning on one problem can boost the average accuracy on six mathematical benchmarks (MATH-500, AMC, OlympiadBench, etc.) by 5-15% across different-sized models. We further test on logic reasoning tasks from BBEH, such as causal reasoning and disambiguation, and observe a similar performance gain of about 15%. We therefore believe CFT is a more efficient approach to unleashing the hidden reasoning capabilities of pre-trained LLMs!

Project Website: https://tiger-ai-lab.github.io/One-Shot-CFT/
Github: https://github.com/TIGER-AI-Lab/One-Shot-CFT
HF Models: https://huggingface.co/collections/TIGER-Lab/one-shot-cft-683fbb4d2bcf698dbea8fb21
Dataset: https://huggingface.co/datasets/TIGER-Lab/One-Shot-CFT-Data


Models citing this paper: 7
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 2