Is it the SFT model or the CogPO model in your paper?
I have read the relevant paper: https://arxiv.org/abs/2504.09802
In Figure 2 of the paper, this work is explained as consisting of two parts:
First, the CRV system is used to screen and synthesize a corpus for SFT (3.2.3 Verifier; the first stage of CRV distillation training, via SFT).
Then DPO training with dynamic β adjustment is used (3.3.1 Preliminaries, called CogPO).
However, another source (https://xie.infoq.cn/article/e2c16ddd50acb5482502a4298) describes the model published here as follows: "Finally, the Qwen2.5 series small base models are supervised fine-tuned (SFT) on the aligned fast-thinking CoT to obtain the DistilQwen2.5-DS3-0324 series models."
So my question is: is the model published here the SFT model or the CogPO model from your paper?