# Model Card for OpenAI GSM8K Dataset Enhanced with Reasoning
This model is fine-tuned to answer questions from the OpenAI GSM8K dataset, enhanced with reasoning traces provided by DeepSeek R1.

A publicly available Colab notebook is shared here for testing the model.
## Model Details

### Model Description
This is a transformer-based question-answering model fine-tuned from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. It was trained on a dataset derived from the OpenAI GSM8K benchmark, enhanced with chain-of-thought reasoning to encourage intermediate logical steps. The dataset pairs math word problems with structured answers, using `<think>...</think>` and `<answer>...</answer>` tags.
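For illustration, a formatted training pair might look like the following (a hypothetical sample in the dataset's format, not an actual record):

```text
<question>Question: A baker makes 24 muffins and sells 9. How many are left?</question>
<think>The baker starts with 24 muffins. Selling 9 leaves 24 - 9 = 15.</think>
<answer>15</answer>
```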
- **Developed by:** Yiqiao Yin
- **Model type:** Causal Language Model (fine-tuned for Q&A with reasoning)
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
## Training Configuration
- 🖥️ **Hardware:** Trained on a RunPod instance with:
  - 🔥 6 × NVIDIA H100 PCIe GPUs
  - 🧠 144 vCPUs
  - 🧮 1132 GB system RAM
  - 💽 20 GB disk per GPU
- 🐳 **Container Image:** `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04`
- ⏱️ **Total Training Time:** 2 hours
- 💸 **Cost:** ~$14/hour × 2 hours = $28 USD
- ⚙️ **Zero Redundancy Optimization:** DeepSpeed ZeRO Stage 1 (a config sketch follows this list)
- 🎯 **Precision:** FP16 mixed-precision training
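The card does not include the underlying DeepSpeed configuration; below is a minimal sketch of what a ZeRO Stage 1 + FP16 setup looks like when driving the Hugging Face `Trainer`. The file name and the `"auto"` placeholders are assumptions, not the author's actual settings.

```python
import json

# Hypothetical DeepSpeed config matching the card: ZeRO Stage 1 + FP16.
ds_config = {
    "zero_optimization": {"stage": 1},         # shard optimizer states across GPUs
    "fp16": {"enabled": True},                 # mixed-precision training
    "train_micro_batch_size_per_gpu": "auto",  # let the HF Trainer fill these in
    "gradient_accumulation_steps": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Then pass it via transformers.TrainingArguments(deepspeed="ds_config.json", fp16=True, ...)
```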
## Performance
- Mean token-level accuracy: 97%
- Evaluation is based on in-training token-match accuracy over the formatted `<think>...</think><answer>...</answer>` structure (a sketch of this metric follows the list).
- The model demonstrates strong reasoning capability on multi-step arithmetic and logic problems.
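The card does not specify how this metric is implemented. A minimal sketch of token-level accuracy, assuming teacher-forced logits and the Hugging Face convention of masking ignored positions with `-100`, might look like:

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of non-masked target tokens predicted exactly (greedy argmax).

    logits: (batch, seq_len, vocab) model outputs under teacher forcing.
    labels: (batch, seq_len) target ids; -100 marks positions to ignore.
    """
    # Shift so the logits at position t predict the token at position t + 1,
    # matching the causal-LM objective.
    preds = logits[:, :-1, :].argmax(dim=-1)
    targets = labels[:, 1:]
    mask = targets != -100                 # skip padding / prompt tokens
    correct = (preds == targets) & mask
    return (correct.sum() / mask.sum()).item()
```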
## Inference Format
To generate accurate completions, prompt the model with the following structure:

```text
<question>Question: If Sally has 3 apples and buys 2 more, how many does she have in total?</question>
```
Be aware that the closing `</question>` token cues the model to begin its response with `<think>`, a behavior learned from the training data. The model will continue reasoning within `<think>...</think>` and then provide a final answer inside `<answer>...</answer>`.
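A minimal inference sketch with the `transformers` library follows. The repository id is a hypothetical placeholder, and the generation settings are illustrative rather than the author's:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual fine-tuned checkpoint.
MODEL_ID = "your-username/gsm8k-reasoning-1.5b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# The trained format: the closing </question> tag cues the model to emit <think>.
prompt = (
    "<question>Question: If Sally has 3 apples and buys 2 more, "
    "how many does she have in total?</question>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # leave room for the <think> trace plus the <answer>
    do_sample=False,      # greedy decoding; sampling is an alternative choice
)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

The final numeric result can then be parsed out of the `<answer>...</answer>` span in the completion.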
## Intended Use
This model is intended for educational and research purposes in:
- Chain-of-thought prompting
- Math reasoning and logical inference
- Question-answering with intermediate steps
## Limitations
- Trained on structured synthetic data — real-world generalization may vary
- Best performance achieved when following the exact inference format
- Does not support multilingual inputs
## Citation
If you use this model, please cite:
```bibtex
@misc{yin2025gsm8k,
  author = {Yiqiao Yin},
  title  = {TBD},
  year   = {2025},
  note   = {TBD}
}
```
## Model Card Contact

Author: Yiqiao Yin. Connect with me on LinkedIn.