
Model Card for OpenAI GSM8K Dataset Enhanced with Reasoning

This model is fine-tuned to answer questions from the OpenAI GSM8K dataset, enhanced with chain-of-thought reasoning generated by DeepSeek R1.

A publicly available Colab notebook is shared for testing the model.


Model Details

Model Description

This is a transformer-based question-answering model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained on a dataset derived from the OpenAI GSM8K benchmark, enhanced with chain-of-thought reasoning to encourage intermediate logical steps. The dataset pairs math word problems with structured answers, using <think>...</think> and <answer>...</answer> tags.

  • Developed by: Yiqiao Yin
  • Model type: Causal Language Model (fine-tuned for Q&A with reasoning)
  • Language(s): English
  • License: MIT
  • Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Training Configuration

  • 🖥️ Hardware: Trained on a RunPod instance with:
    • 🔥 6 × NVIDIA H100 PCIe GPUs
    • 🧠 144 vCPUs
    • 🧮 1132 GB system RAM
    • 💽 20 GB disk per GPU
  • 🐳 Container Image: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
  • ⏱️ Total Training Time: 2 hours
  • 💸 Cost: ~$14/hour × 2 hours = $28 USD
  • ⚙️ Zero Redundancy Optimization: DeepSpeed Stage 1
  • 🎯 Precision: FP16 mixed-precision training
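The DeepSpeed settings above can be expressed as a minimal configuration sketch. The exact values used in training are not published; the batch-size fields below are illustrative `"auto"` placeholders, not the actual training values.

```python
# Minimal DeepSpeed configuration sketch matching the settings above.
# Batch-size fields are illustrative placeholders ("auto"), not the
# values actually used during training.
ds_config = {
    "zero_optimization": {"stage": 1},        # ZeRO Stage 1: shard optimizer states
    "fp16": {"enabled": True},                # FP16 mixed-precision training
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```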

Performance

  • Mean token-level accuracy: 97%
  • Evaluation is based on in-training token-match accuracy over the formatted <think>...</think><answer>...</answer> structure.
  • The model demonstrates strong reasoning capability on multi-step arithmetic and logic problems.
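As a rough sketch of how token-level match accuracy can be computed (the exact evaluation code is not published, so this is an assumption about the metric), comparing predicted and label token ids position by position while skipping ignored positions:

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    # Fraction of positions where the predicted token id matches the label,
    # skipping positions marked with ignore_index (e.g. padding).
    pairs = [(p, l) for p, l in zip(pred_ids, label_ids) if l != ignore_index]
    if not pairs:
        return 0.0
    return sum(p == l for p, l in pairs) / len(pairs)
```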

Inference Format

To generate accurate completions, prompt the model in the following structure:

<question>Question: If Sally has 3 apples and buys 2 more, how many does she have in total?</question>

Note that the closing </question> tag cues the model to begin its response with <think>, a pattern learned from the training data.
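A minimal helper for producing this prompt shape (the function name is illustrative, not part of any released API):

```python
def build_prompt(question: str) -> str:
    # Wrap the raw question in the tags the model saw during training;
    # the closing </question> tag cues the model to emit <think> next.
    return f"<question>Question: {question}</question>"

prompt = build_prompt("If Sally has 3 apples and buys 2 more, how many does she have in total?")
```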

The model will continue reasoning within <think>...</think> and provide a final answer inside <answer>...</answer>.
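The completion can then be parsed back out of the tags; a simple regex-based sketch (the helper name is illustrative):

```python
import re

def parse_completion(text: str):
    # Extract the reasoning span and the final answer from a tagged completion.
    # Returns (think, answer); either is None if its tag pair is missing.
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )
```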


Intended Use

This model is intended for educational and research purposes in:

  • Chain-of-thought prompting
  • Math reasoning and logical inference
  • Question-answering with intermediate steps

Limitations

  • Trained on structured synthetic data — real-world generalization may vary
  • Best performance achieved when following the exact inference format
  • Does not support multilingual inputs

Citation

If you use this model, please cite:

@misc{yin2024gsm8k,
  author = {Yiqiao Yin},
  title = {TBD},
  year = 2025,
  note = {TBD}
}

Model Card Contact

Author: Yiqiao Yin. Connect with me on LinkedIn.

Model size: 1.78B parameters (Safetensors, FP16)