Satori-7B-SFT is the SFT checkpoint used to train our RL model Satori-7B-Round2. Satori-7B-SFT is trained only with a small-scale format tuning (FT) stage that helps the base LLM (Qwen/Qwen2.5-7B) internalize the Chain-of-Action-Thought (COAT) reasoning format.
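As a quick sanity check of the COAT format, the meta-action tokens can be inspected directly from the tokenizer. The snippet below is a minimal sketch, assuming the three meta-action tokens shown in the usage example below are registered as special tokens in the released tokenizer.

from transformers import AutoTokenizer

# Illustrative check (assumption: the COAT meta-action tokens are part of the
# released tokenizer's vocabulary as special tokens).
tokenizer = AutoTokenizer.from_pretrained("Satori-reasoning/Satori-7B-SFT")
for tok in ["<|continue|>", "<|reflect|>", "<|explore|>"]:
    print(tok, "->", tokenizer.convert_tokens_to_ids(tok))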

Usage


from vllm import LLM, SamplingParams

def generate(question_list, model_path):
    # Load the model with vLLM; increase tensor_parallel_size for multi-GPU inference.
    llm = LLM(
        model=model_path,
        trust_remote_code=True,
        tensor_parallel_size=1,
    )
    # Greedy decoding with a 4096-token budget per problem.
    sampling_params = SamplingParams(
        max_tokens=4096,
        temperature=0.0,
        n=1,
        skip_special_tokens=True,  # hide special tokens such as "<|continue|>", "<|reflect|>", and "<|explore|>"
    )
    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
    # One list of completions per input question (n=1, so a single completion each).
    completions = [[output.text for output in output_item.outputs] for output_item in outputs]
    return completions

def prepare_prompt(question):
    # Wrap the question in the chat-style prompt expected by the model.
    prompt = f"<|im_start|>user\nSolve the following math problem efficiently and clearly.\nPlease reason step by step, and put your final answer within \\boxed{{}}.\nProblem: {question}<|im_end|>\n<|im_start|>assistant\n"
    return prompt
    
def run():
    model_path = "Satori-reasoning/Satori-7B-SFT"
    all_problems = [
        "which number is larger? 9.11 or 9.9?",
    ]
    completions = generate(
        [prepare_prompt(problem_data) for problem_data in all_problems],
        model_path
    )
    
    for completion in completions:
        print(completion[0])

if __name__ == "__main__":
    run()
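
To see the COAT meta-action tokens in the generated text instead of hiding them, the only change needed is skip_special_tokens=False in the sampling parameters; everything else in the usage script stays the same.

from vllm import SamplingParams

# Variant of the sampling parameters above that keeps the COAT meta-action
# tokens ("<|continue|>", "<|reflect|>", "<|explore|>") visible in the output.
sampling_params = SamplingParams(
    max_tokens=4096,
    temperature=0.0,
    n=1,
    skip_special_tokens=False,
)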

Resources

We provide our training datasets on Hugging Face under the Satori-reasoning organization.

Please refer to our blog and research paper for more technical details of Satori.

For code, see https://github.com/Satori-reasoning/Satori.

Citation

If you find our model and data helpful, please cite our paper:

@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search}, 
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508}, 
}