---
license: apache-2.0
---

## Achieving Superior Performance over Qwen3-32B and QwQ-32B Using Only 800 Strategically Curated Samples

### Model description

NTele-R1-32B-V1 is the successor to [NTele-R1-32B-Preview](https://huggingface.co/ZTE-AIM/NTele-R1-32B-Preview); see that page for more background. Building on the same base, we achieved substantial improvements using a much smaller corpus **of mathematics and code (only 800 samples: 400 math and 400 code)** and surpassed the advanced industry models **Qwen3-32B and QwQ-32B** on most benchmarks.

| Model | Release Date | AIME2024 | AIME2025 | MATH500 | GPQA-Diamond | LCB (24.08-25.02) |
|-------|--------------|----------|----------|---------|--------------|-------------------|
| DeepSeek-R1-Distill-Qwen-32B | 2025-01-20 | 64.17 | 55.21 | 89.8 | 62.1 | 50.26 |
| QwQ-32B | 2025-03-06 | 76.25 | 67.30 | 94.6 | 63.6 | 60.94 |
| Qwen3-32B (think) | 2025-04-29 | 78.75 | 73.33 | 95.0 | **69.7** | 53.24 |
| NTele-R1-32B-V1 (ours) | 2025-05-10 | **82.5** | **74.49** | **95.2** | 67.17 | **63.69** |
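
For quick testing, here is a minimal inference sketch using Hugging Face `transformers`. The repository id and the sampling settings (temperature 0.6, top-p 0.95, as commonly recommended for R1-distill-style models) are assumptions, not values confirmed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; check the model page for the published checkpoint name.
model_id = "ZTE-AIM/NTele-R1-32B-V1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings follow the usual R1-distill recommendation (assumption).
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```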
### Data

[\[🤗 Codemath400\]](https://huggingface.co/datasets/ZTE-AIM/NTele-R1-Data)

You can access our [dataset](https://huggingface.co/datasets/ZTE-AIM/NTele-R1-Data) to obtain the 800 training samples, and visit [NTele-R1-32B-Preview](https://huggingface.co/ZTE-AIM/NTele-R1-32B-Preview) to learn about the data synthesis and screening process.
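
A quick way to pull the corpus is with the `datasets` library; the split name and record schema below are assumptions to verify against the dataset card:

```python
from datasets import load_dataset

# Split name is an assumption; inspect the dataset card if "train" is not present.
ds = load_dataset("ZTE-AIM/NTele-R1-Data", split="train")

print(len(ds))           # expected: 800 (400 math + 400 code)
print(ds.column_names)   # inspect the schema before training
```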
### Evaluation

We evaluate models with [SkyThought](https://github.com/NovaSky-AI/SkyThought).
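
SkyThought handles generation and scoring end to end; for intuition, here is a toy sketch of the final-answer matching typically used for AIME/MATH500-style pass@1 scoring. This is illustrative only and is not SkyThought's actual implementation:

```python
import re

def extract_boxed(text: str) -> str | None:
    """Return the last \\boxed{...} value in a response, the usual final-answer format."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy(responses: list[str], references: list[str]) -> float:
    """Fraction of problems whose extracted answer exactly matches the reference."""
    hits = sum(extract_boxed(r) == ref.strip() for r, ref in zip(responses, references))
    return hits / len(references)

# Toy usage: one correct response out of one problem -> 1.0
print(accuracy([r"... so the answer is \boxed{5050}."], ["5050"]))
```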
### Training Details

NTele-R1-32B-V1 was trained from DeepSeek-R1-Distill-Qwen-32B on 8×H800 GPUs.
#### Training hyperparameters

- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 48
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
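
The card does not name the training framework; the sketch below shows how these settings map onto Hugging Face `TrainingArguments` (the `output_dir` and the `bf16` flag are assumptions):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="ntele-r1-32b-v1",    # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # 1 per GPU x 8 GPUs x 6 accumulation steps = 48 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    seed=42,
    bf16=True,                       # assumption: bf16 mixed precision on H800
)
```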