
Model Card for OpenAI GSM8K Dataset Enhanced with Reasoning

This model is fine-tuned to answer questions from the OpenAI GSM8K dataset, enhanced with chain-of-thought reasoning generated by DeepSeek R1.

A publicly available Colab notebook is shared for testing the model.


Model Details

Model Description

This is a transformer-based question-answering model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained on a dataset derived from the OpenAI GSM8K benchmark, enhanced with chain-of-thought reasoning to encourage intermediate logical steps. The dataset pairs math word problems with structured answers, using <think>...</think> and <answer>...</answer> tags.

  • Developed by: Yiqiao Yin
  • Model type: Causal Language Model (fine-tuned for Q&A with reasoning)
  • Language(s): English
  • License: MIT
  • Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Training Configuration

  • 🖥️ Hardware: Trained on a RunPod instance with:
    • 🔥 6 × NVIDIA H100 PCIe GPUs
    • 🧠 144 vCPUs
    • 🧮 1132 GB system RAM
    • 💽 20 GB disk per GPU
  • 🐳 Container Image: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
  • ⏱️ Total Training Time: 2 hours
  • 💸 Cost: ~$14/hour × 2 hours = $28 USD
  • ⚙️ Zero Redundancy Optimization: DeepSpeed Stage 1
  • 🎯 Precision: FP16 mixed-precision training
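The DeepSpeed settings above can be expressed as a minimal configuration sketch. The exact values used in training are not published; the batch-size fields below are illustrative `"auto"` placeholders, not the actual training values.

```python
# Minimal DeepSpeed configuration sketch matching the settings above.
# Batch-size fields are illustrative placeholders ("auto"), not the
# values actually used during training.
ds_config = {
    "zero_optimization": {"stage": 1},        # ZeRO Stage 1: shard optimizer states
    "fp16": {"enabled": True},                # FP16 mixed-precision training
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```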

Performance

  • Mean token-level accuracy: 97%
  • Evaluation is based on in-training token-match accuracy over the formatted <think>...</think><answer>...</answer> structure.
  • The model demonstrates strong reasoning capability on multi-step arithmetic and logic problems.
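As a rough sketch of how token-level match accuracy can be computed (the exact evaluation code is not published, so this is an assumption about the metric), comparing predicted and label token ids position by position while skipping ignored positions:

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    # Fraction of positions where the predicted token id matches the label,
    # skipping positions marked with ignore_index (e.g. padding).
    pairs = [(p, l) for p, l in zip(pred_ids, label_ids) if l != ignore_index]
    if not pairs:
        return 0.0
    return sum(p == l for p, l in pairs) / len(pairs)
```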

Inference Format

To generate accurate completions, prompt the model in the following structure:

<question>Question: If Sally has 3 apples and buys 2 more, how many does she have in total?</question>

Note that the closing </question> tag cues the model to begin its response with <think>, a pattern learned from the training data.
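A minimal helper for producing this prompt shape (the function name is illustrative, not part of any released API):

```python
def build_prompt(question: str) -> str:
    # Wrap the raw question in the tags the model saw during training;
    # the closing </question> tag cues the model to emit <think> next.
    return f"<question>Question: {question}</question>"

prompt = build_prompt("If Sally has 3 apples and buys 2 more, how many does she have in total?")
```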

The model will continue reasoning within <think>...</think> and provide a final answer inside <answer>...</answer>.
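The completion can then be parsed back out of the tags; a simple regex-based sketch (the helper name is illustrative):

```python
import re

def parse_completion(text: str):
    # Extract the reasoning span and the final answer from a tagged completion.
    # Returns (think, answer); either is None if its tag pair is missing.
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )
```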


Intended Use

This model is intended for educational and research purposes in:

  • Chain-of-thought prompting
  • Math reasoning and logical inference
  • Question-answering with intermediate steps

Limitations

  • Trained on structured synthetic data — real-world generalization may vary
  • Best performance achieved when following the exact inference format
  • Does not support multilingual inputs

Citation

If you use this model, please cite:

@misc{yin2024gsm8k,
  author = {Yiqiao Yin},
  title = {TBD},
  year = 2025,
  note = {TBD}
}

Model Card Contact

Author: Yiqiao Yin. Connect with me on LinkedIn.

Model size: 1.78B parameters (Safetensors, FP16)