|
--- |
|
library_name: transformers |
|
tags: [] |
|
--- |
|
|
|
# Model Card for nano-aha-moment-3b |
|
|
|
See: https://github.com/McGill-NLP/nano-aha-moment |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This is a 3B-parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, specifically the Countdown game. The model is based on Qwen2.5-3B and was fine-tuned with GRPO (Group Relative Policy Optimization) using the nano-aha-moment codebase.
|
|
|
- **Developed by:** McGill-NLP Lab |
|
- **Model type:** Causal Language Model |
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
|
- **Finetuned from model:** Qwen/Qwen2.5-3B |
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://github.com/McGill-NLP/nano-aha-moment |
|
- **Demo:** Available in the repository's checkpoint playground notebook |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model is designed to solve mathematical reasoning tasks, specifically the Countdown game, in which it must combine a given set of numbers with basic arithmetic operations to reach a target value. The model shows its reasoning process inside `<think>` tags and gives its final equation inside `<answer>` tags.
|
|
|
You can interactively test the model's reasoning capabilities using the [checkpoint playground notebook](https://github.com/McGill-NLP/nano-aha-moment/blob/main/notebooks/checkpoint_playground.ipynb) in the repository. |
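A minimal inference sketch with 🤗 Transformers is shown below; the Hub id, prompt wording, and sampling settings are illustrative assumptions rather than the exact training-time template, so consult the repository for the canonical setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/nano-aha-moment-3b"  # assumed Hub id; adjust to the actual checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative Countdown-style prompt; the training-time template may differ.
prompt = (
    "Using the numbers [19, 36, 55, 7], create an equation that equals 65. "
    "You may use +, -, *, / and each number at most once. "
    "Show your reasoning in <think> </think> tags and return the final equation "
    "in <answer> </answer> tags."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=1.0, do_sample=True)

# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```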
|
|
|
### Out-of-Scope Use |
|
|
|
The model is specifically trained for mathematical reasoning tasks and may not perform well on general language tasks or other domains outside its training scope. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The model has been trained on a specific mathematical reasoning task and may have limitations in: |
|
1. General language understanding and generation |
|
2. Handling complex mathematical problems outside the Countdown game format |
|
3. Maintaining consistent reasoning across different problem types |
|
|
|
### Recommendations |
|
|
|
Users should: |
|
1. Use the model specifically for the Countdown game task it was trained on |
|
2. Be aware of the model's focus on mathematical reasoning |
|
3. Consider the model's limitations when applying it to other tasks |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on the Countdown-Tasks-3to4 dataset, which contains problem statements for the Countdown game where the goal is to reach a target number using a set of available numbers and basic arithmetic operations. |
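A minimal loading sketch with 🤗 Datasets follows; the Hub path `Jiayi-Pan/Countdown-Tasks-3to4` is an assumption (a commonly used public copy of this dataset) and may differ from the exact copy used for training.

```python
from datasets import load_dataset

# Assumed Hub path for the Countdown-Tasks-3to4 dataset; verify against the repository.
dataset = load_dataset("Jiayi-Pan/Countdown-Tasks-3to4", split="train")

example = dataset[0]
print(example)  # typically contains the available numbers and the target value
```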
|
|
|
### Training Procedure |
|
|
|
#### Preprocessing |
|
|
|
The training data was preprocessed to include the following (an illustrative sketch follows this list):
|
- System message for reasoning guidance |
|
- Structured prompt template for the Countdown game |
|
- Special tags for reasoning steps and answers |
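A minimal sketch of how such a prompt might be assembled, assuming a chat-style message list; the system message and template wording here are illustrative assumptions, and the exact strings used for training are defined in the nano-aha-moment repository.

```python
# Illustrative prompt construction; the exact system message and template
# wording used during training live in the nano-aha-moment repository.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. You first think about the reasoning process "
    "and then provide the user with the answer."
)

def build_countdown_prompt(numbers: list[int], target: int) -> list[dict]:
    """Build a chat-style message list for one Countdown problem."""
    user_message = (
        f"Using the numbers {numbers}, create an equation that equals {target}. "
        "You can use basic arithmetic operations (+, -, *, /) and each number can "
        "only be used once. Show your work in <think> </think> tags and return the "
        "final equation in <answer> </answer> tags."
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_message},
    ]

messages = build_countdown_prompt([19, 36, 55, 7], 65)
```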
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** bf16 mixed precision |
|
- **Learning rate:** 1e-6 |
|
- **Batch size:** 64 episodes per iteration |
|
- **Optimizer:** AdamW |
|
- **KL coefficient:** 0.001 |
|
- **Temperature:** 1.0 |
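A minimal sketch of how the reported values might map onto a PyTorch optimizer and sampling settings; the variable names and the placeholder module are illustrative assumptions, not the training script's actual configuration.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the Qwen2.5-3B policy model.
policy = nn.Linear(8, 8)

# AdamW with the reported learning rate.
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

kl_coefficient = 0.001        # weight of the KL penalty against the reference model
episodes_per_iteration = 64   # rollouts collected before each policy update
sampling_temperature = 1.0    # temperature used when sampling episodes
```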
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
The model follows the Qwen2.5-3B architecture. Training and inference rely on the following components (an inference sketch follows this list):
|
- Flash Attention 2 for efficient attention computation |
|
- DeepSpeed ZeRO Stage 2 for memory optimization |
|
- vLLM for efficient inference |
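A minimal vLLM inference sketch is shown below; the Hub id, prompt, and sampling settings are illustrative assumptions rather than the repository's exact configuration.

```python
from vllm import LLM, SamplingParams

# Assumed Hub id; adjust to the actual checkpoint location.
llm = LLM(model="McGill-NLP/nano-aha-moment-3b", dtype="bfloat16")

# Sampling settings chosen for illustration; the training run used temperature 1.0.
params = SamplingParams(temperature=1.0, max_tokens=1024)

outputs = llm.generate(
    ["Using the numbers [19, 36, 55, 7], create an equation that equals 65."],
    params,
)
print(outputs[0].outputs[0].text)
```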
|
|
|
### Compute Infrastructure |
|
|
|
#### Software |
|
|
|
- PyTorch 2.5.1 |
|
- Transformers 4.48.3 |
|
- DeepSpeed 0.16.4 |
|
- vLLM 0.7.3 |
|
- Flash Attention 2.7.2 |
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
```bibtex |
|
@misc{Kazemnejad2025:NanoAhaMoment, |
|
author = {Amirhossein Kazemnejad and Milad Aghajohari and Alessandro Sordoni and Aaron Courville and Siva Reddy}, |
|
title = {Nano Aha! Moment: Single File "RL for LLM" Library}, |
|
year = {2025}, |
|
howpublished = {\url{https://github.com/McGill-NLP/nano-aha-moment}}, |
|
note = {GitHub repository} |
|
} |
|
``` |
|
|
|
## Model Card Authors |
|
|
|
McGill-NLP Lab |
|
|
|
## Model Card Contact |
|
|
|
For questions about this model card, please contact the McGill-NLP Lab. |