Spec-T1-Base-7B
Most advanced reinforcement learning (RL) models in recent open-source research are built on large base models, such as those with 32 billion parameters, particularly for code reasoning. It has generally been considered difficult to achieve simultaneous gains in mathematical and coding ability within a small model. We believe, however, that the reasoning potential of an RL-trained model is fundamentally tied to the inherent capabilities of its base model: to fully harness that potential, both pre-training and post-training must be optimized for reasoning tasks.
We present Spec-T1-Base-7B, a model trained from scratch and engineered for strong reasoning performance. In our RL experiments, Spec-T1-Base-7B outperforms much larger 32B models. We also developed Spec-T1-RL-7B, an RL-trained model derived from the base model, which achieves strong results on mathematics and code reasoning, comparable to OpenAI's o1.
We open-source the Spec-T1-7B series, including checkpoints for Spec-T1-Base-7B and Spec-T1-RL-7B, to offer the broader community insights into building high-performance reasoning language models.
🌟 Highlights
Pre-Training: Designed for Reasoning Excellence
- We developed the DataGen Pipeline, a novel synthetic data generation framework that produces high-quality, reasoning-focused datasets for mathematics and coding. DataGen combines iterative problem synthesis, context-aware augmentation, and quality-driven refinement to generate diverse, high-density reasoning patterns, backed by multi-level data filtering in the preprocessing pipeline (a toy sketch of such a loop follows this list).
- The pre-training process used a three-phase data mixture strategy, with Spec-T1-Base-7B trained on approximately 5 trillion tokens over 5.5 months. Training ran on a small cluster of 3–4 Google Tensor Processing Unit (TPU) v4 chips provided through the TPU Cloud, with the pipeline optimized for efficient processing of reasoning-focused data.
- PolyStep Forecast (PSF) was incorporated as an auxiliary training objective, improving model quality and inference speed while making effective use of the pre-training budget together with the StreamPulse Accelerator's computational optimizations (a sketch of a PSF-style auxiliary loss also follows this list).
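DataGen's internals are not shown here; purely as an illustration of the iterative synthesize-augment-filter loop described above, here is a toy Python sketch in which every function (`synthesize_variant`, `augment_with_context`, `passes_quality_filters`) is a hypothetical placeholder, not part of the actual pipeline.

```python
# Toy sketch of an iterative synthesis + filtering loop in the spirit of the
# DataGen description. All names and heuristics below are placeholders.

def synthesize_variant(problem: str) -> str:
    # Placeholder: a real pipeline would prompt an LLM to rewrite the problem.
    return f"Variant: {problem}"

def augment_with_context(problem: str) -> str:
    # Placeholder: attach background or constraints relevant to the problem.
    return f"[math/code context]\n{problem}"

def passes_quality_filters(problem: str) -> bool:
    # Placeholder for multi-level filtering (dedup, length, verifiability, ...).
    return len(problem) > 20

def datagen_round(corpus: list[str]) -> list[str]:
    """One round of problem synthesis, augmentation, and quality-driven refinement."""
    candidates = [augment_with_context(synthesize_variant(p)) for p in corpus]
    return corpus + [c for c in candidates if passes_quality_filters(c)]

corpus = ["Prove that the sum of the first n odd numbers is n^2."]
for _ in range(3):  # iterative refinement over several rounds
    corpus = datagen_round(corpus)
print(len(corpus), "problems after 3 rounds")
```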
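The exact form of the PSF objective is not spelled out above. The snippet below is a minimal PyTorch sketch under the assumption that PSF resembles a standard multi-token-prediction auxiliary loss, where extra head i predicts the target i positions further ahead than the standard LM head; the names `psf_heads` and the weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def psf_auxiliary_loss(hidden_states, psf_heads, labels, alpha=0.3):
    """Hypothetical k-step-ahead auxiliary loss in the spirit of PSF.

    hidden_states: (batch, seq, dim) backbone outputs
    psf_heads:     list of projection heads; head i predicts the target i
                   positions beyond the standard next-token target
    labels:        (batch, seq) target token ids
    alpha:         assumed weight of the auxiliary term in the total loss
    """
    loss = hidden_states.new_zeros(())
    for i, head in enumerate(psf_heads, start=1):
        logits = head(hidden_states[:, :-i, :])   # predict i steps ahead
        targets = labels[:, i:]                   # targets shifted by i
        loss = loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
    return alpha * loss / len(psf_heads)
```

The auxiliary term is averaged over heads and down-weighted by `alpha` so it shapes representations without competing with the primary next-token loss.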
Post-Training Strategy: Advancing Reasoning Capabilities
- For RL training, we curated 130,000 mathematics and code problems, verified by rule-based evaluators. Each problem was rigorously cleaned and assessed for difficulty to ensure quality. Only rule-based accuracy rewards were used to prevent reward manipulation.
- To address sparse rewards on complex code tasks, we introduced a Tiered Precision Scoring system for code problems: granular scores for test cases of varying complexity provide a denser signal for policy optimization (see the sketch after this list).
- We implemented a Balanced Cycle Sampling strategy for simpler problems to improve rollout efficiency and stabilize policy updates in the later stages of RL training.
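The exact scoring rule behind Tiered Precision Scoring is not given above. The following minimal sketch assumes test cases are grouped into difficulty tiers with hand-picked weights (the `TIER_WEIGHTS` values are assumptions) and returns a dense partial-credit score instead of an all-or-nothing reward.

```python
# Hypothetical partial-credit reward in the spirit of Tiered Precision Scoring.
# Tier weights are assumptions; higher tiers mean harder test cases.
TIER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 4.0}

def tiered_precision_reward(results):
    """results: list of (passed: bool, tier: int) for each test case.

    Returns a dense score in [0, 1], so partially correct programs still
    receive a learning signal instead of a sparse all-or-nothing reward.
    """
    total = sum(TIER_WEIGHTS[tier] for _, tier in results)
    earned = sum(TIER_WEIGHTS[tier] for passed, tier in results if passed)
    return earned / total if total else 0.0

# Example: passes two easy tests and one medium test, fails the hard test.
print(tiered_precision_reward([(True, 1), (True, 1), (True, 2), (False, 3)]))  # 0.5
```

Weighting harder tiers more keeps the reward aligned with full correctness while still rewarding partial progress on difficult problems.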
RL Infrastructure
- We developed the StreamPulse Accelerator to speed up RL training and validation, integrating continuous rollout, asynchronous reward computation, and early termination. This yielded 2.29× faster training and 1.96× faster validation (a toy sketch of the asynchronous reward loop follows this list).
- Support for PSF was integrated into the Nexlify Inference Framework, enhancing the inference engine’s robustness for RL workflows.
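The StreamPulse Accelerator itself is not described in code above. As a toy illustration of continuous rollout with asynchronous reward computation, the asyncio sketch below starts scoring each rollout as soon as it completes rather than after the whole batch finishes; all function bodies are stand-ins.

```python
import asyncio
import random

async def generate_rollout(prompt: str) -> str:
    # Stand-in for policy generation; real rollouts come from the inference engine.
    await asyncio.sleep(random.uniform(0.1, 0.5))
    return f"completion for: {prompt}"

async def compute_reward(rollout: str) -> float:
    # Stand-in for rule-based scoring (e.g., running test cases).
    await asyncio.sleep(0.1)
    return float(len(rollout) % 2)

async def continuous_rollout(prompts):
    # Launch all rollouts, then kick off reward computation for each one the
    # moment it finishes, overlapping scoring with the remaining generation.
    tasks = [asyncio.create_task(generate_rollout(p)) for p in prompts]
    reward_tasks = []
    for finished in asyncio.as_completed(tasks):
        rollout = await finished
        reward_tasks.append(asyncio.create_task(compute_reward(rollout)))
    return await asyncio.gather(*reward_tasks)

print(asyncio.run(continuous_rollout(["p1", "p2", "p3"])))
```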
Model Details
The PSF layers of Spec-T1-Base-7B are tuned during pre-training and frozen during RL. With one PSF layer used for speculative decoding, the acceptance rate is approximately 90%. A simplified illustration of the acceptance check is sketched below.
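The 90% figure refers to how often draft tokens proposed by the PSF layer are accepted by the full model. As a simplified illustration (greedy acceptance only; production speculative decoding also handles sampled acceptance), a prefix-match check might look like the following; `count_accepted` and its demo are illustrative, not the actual implementation.

```python
import torch

def count_accepted(draft_tokens: torch.Tensor, target_logits: torch.Tensor) -> int:
    """Greedy speculative-decoding acceptance check (simplified illustration).

    draft_tokens:  (k,) token ids proposed by the PSF draft head
    target_logits: (k, vocab) full-model logits at the same positions
    Returns the length of the accepted prefix: drafts are kept only while
    they match the full model's own greedy choice.
    """
    greedy = target_logits.argmax(dim=-1)                 # full model's picks
    matches = (draft_tokens == greedy).long()
    return int(matches.cumprod(dim=0).sum())              # matching prefix length

# Demo: drafts agree with the full model except at position 2.
logits = torch.randn(4, 32000)
drafts = logits.argmax(dim=-1).clone()
drafts[2] = -1  # force a mismatch
print(count_accepted(drafts, logits))  # -> 2
```

With an acceptance rate near 90%, most drafted tokens are kept each step, which is where the inference speedup comes from.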
Models are available at https://huggingface.co/SVECTOR-CORPORATION
| Model | Description | Download (HuggingFace) |
|---|---|---|
| Spec-T1-Base-7B | Base model with strong reasoning capabilities | 🤗 SVECTOR-CORPORATION/Spec-T1-Base-7B |
| Spec-T1-RL-7B | RL model trained from the base model, matching OpenAI o1 performance | 🤗 SVECTOR-CORPORATION/Spec-T1-RL-7B |
Evaluation Results*
| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | Spec-T1-Base-7B |
|---|---|---|---|---|---|
| **General** | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 59.1 | 49.1 | 38.2 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 85.5 | 77.0 | 58.2 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 68.8 | 53.5 | 68.9 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 78.3 | 60.5 | 61.0 |
| **Mathematics** | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 93.9 | 92.8 | 75.1 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 69.7 | 55.5 | 38.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 48.2 | 38.8 | 15.5 |
| **Code** | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.1 | 37.6 | 45.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 31.9 | 23.9 | 39.3 |
\* The reported evaluation scores for Spec-T1-Base-7B are approximations obtained in a controlled environment by the SVECTOR team, without involvement from external entities. All evaluations were performed with `temperature=0.6`. AIME 2024 and AIME 2025 scores are averaged over 25 repetitions; MATH-500 and SuperGPQA scores over 2 repetitions; and LiveCodeBench v5 (20240801–20250201), LiveCodeBench v6 (20250201–20250501), GPQA Diamond, and IF-Eval scores over 10 repetitions. The Spec-T1-RL-7B model achieved superior performance on mathematics and code reasoning tasks in a single evaluation run, reflecting the robustness of its RL optimization.
Example Script
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SVECTOR-CORPORATION/Spec-T1-Base-7B"

# Load the model and tokenizer (trust_remote_code is needed for the custom model code)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain why the sum of the first n odd numbers equals n^2."

# Move inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample with the same temperature used in the reported evaluations
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Citation
```bibtex
@misc{svector2025spect1,
  title  = {Spec-T1},
  author = {{SVECTOR Team}},
  year   = {2025}
}
```
Contact
For inquiries, please contact us at [email protected].