arxiv:2506.03295

Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Published on Jun 3
· Submitted by ubowang on Jun 5

Abstract

Critique Fine-Tuning on a single problem can efficiently enhance the reasoning capabilities of large language models with significant performance gains and reduced computational cost compared to reinforcement learning.

AI-generated summary

We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models' reasoning capabilities. However, RL is not only expensive but also unstable; even one-shot RL requires hundreds of GPU hours. This raises a critical question: is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama family models, ranging from 1.5B to 14B parameters, on the CFT data and observe significant performance gains across diverse reasoning tasks. For example, with just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to, or even surpass, the results from RL with 20x less compute. Ablation studies reveal the robustness of one-shot CFT across different prompt problems. These results highlight one-shot CFT as a simple, general, and compute-efficient approach to unleashing the reasoning capabilities of modern LLMs.
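The data-construction step described above (pair each model-generated solution to the single problem with a teacher critique, then fine-tune on those pairs) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `build_cft_examples` and the toy strings are hypothetical, and in the real setup the solutions come from sampled model generations and the critiques from teacher LLMs.

```python
def build_cft_examples(problem, solutions, critiques):
    """Pair each candidate solution with its teacher critique.

    Each training example prompts the model to critique a
    (problem, solution) pair; the fine-tuning target is the
    teacher's critique text.
    """
    examples = []
    for solution, critique in zip(solutions, critiques):
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Candidate solution:\n{solution}\n\n"
            "Critique the solution above, pointing out any errors."
        )
        examples.append({"prompt": prompt, "target": critique})
    return examples

# Toy usage with placeholder strings (not real paper data):
data = build_cft_examples(
    "Compute 2 + 2.",
    ["2 + 2 = 5", "2 + 2 = 4"],
    ["Incorrect: 2 + 2 equals 4, not 5.", "Correct."],
)
print(len(data))  # prints 2: one example per candidate solution
```

Diversity among the sampled solutions is what makes a single problem yield many distinct critique targets, which is the core of the one-shot setup.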

Community

Paper author and submitter:

We found that supervised fine-tuning on ONE problem can achieve a similar performance gain to RL on ONE problem with 20x less compute! In the paper, we show that Critique Fine-Tuning on one problem can boost the average accuracy on six mathematical benchmarks (MATH-500, AMC, OlympiadBench, etc.) by 5-15% across different-sized models. We further test on logic reasoning tasks from BBEH, such as causal reasoning and disambiguation, and observe a similar performance gain of about 15%. We therefore believe CFT is a more efficient approach to unleashing the hidden reasoning capabilities of pre-trained LLMs!

Project Website: https://tiger-ai-lab.github.io/One-Shot-CFT/
Github: https://github.com/TIGER-AI-Lab/One-Shot-CFT
HF Models: https://huggingface.co/collections/TIGER-Lab/one-shot-cft-683fbb4d2bcf698dbea8fb21
Dataset: https://huggingface.co/datasets/TIGER-Lab/One-Shot-CFT-Data


Models citing this paper: 7
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 2