# ATLAS-8B-Instruct
ATLAS-8B-Instruct is a specialized teaching model developed by Arc Intelligence. It is the result of the first phase—Supervised Fine-Tuning (SFT)—of the ATLAS Framework.
This model serves as the crucial foundation for the final reinforcement learning teacher, ATLAS-8B-Thinking. It has been trained on the Arc-ATLAS-Teach-v0 dataset to learn the formats and structures of effective pedagogy, including how to generate high-quality reasoning traces, explanations, and solution demonstrations.
Think of this model as having memorized the curriculum; it knows what good teaching looks like. It is the essential starting point before the RL phase teaches it how to adapt that teaching to individual students.
## Model's Role in the ATLAS Framework
The ATLAS training pipeline is a two-stage process:
- **Phase 1: Supervised Fine-Tuning (SFT)** → This is the phase that produces ATLAS-8B-Instruct. It learns the core knowledge and teaching formats from a static dataset.
- **Phase 2: Reinforcement Learning (RL)** → This phase takes ATLAS-8B-Instruct as its starting point and trains it to become an adaptive teacher, resulting in the final ATLAS-8B-Thinking model.
This checkpoint is released for researchers who wish to replicate our work, build upon the SFT foundation, or experiment with the second-stage RL training.
## How to Use
ATLAS-8B-Instruct is not a general-purpose chat model. It is designed to generate teaching content based on the structured format used in our dataset.
### Basic Generation Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-8B-Instruct")

# Example prompt following the SFT format
prompt = """Question: A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?
Provide a step-by-step explanation to solve this problem."""

# Use model.device so the inputs land wherever device_map placed the weights
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
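If you build prompts programmatically, a small helper can keep them aligned with the SFT format shown above. This is a minimal sketch: `build_teaching_prompt` is a hypothetical name, and the template is inferred from the example prompt rather than taken from an official ATLAS API.

```python
def build_teaching_prompt(question: str) -> str:
    """Format a question in the SFT prompt style used above.

    Hypothetical helper: the template is inferred from the example
    prompt in this card, not an official API of the ATLAS repository.
    """
    return (
        f"Question: {question}\n"
        "Provide a step-by-step explanation to solve this problem."
    )

prompt = build_teaching_prompt(
    "A farmer has 52 trees planted in a row over a length of 1850 meters. "
    "What is the distance between each tree?"
)
print(prompt)
```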
## Continuing to RL Training
This model is the direct input for the second phase of the ATLAS training pipeline. To use this model as the base for RL training, follow the instructions in the main repository.
```bash
# In the ATLAS repository, the RL script is configured
# to load an SFT checkpoint like this one.

# Run Phase 2: Reinforcement Learning (RL)
scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
```
## Training Details
- Base Model: Qwen/Qwen3-8B
- Training Stage: Supervised Fine-Tuning (SFT) only
- Dataset: Arc-Intelligence/Arc-ATLAS-Teach-v0
- Context Length: 8192 tokens
- Hardware: 4x H100 GPUs
- Precision: BF16
- Framework: DeepSpeed ZeRO-3
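For reference, a ZeRO-3 run like the one described above is typically driven by a DeepSpeed JSON config along these lines. This is a minimal sketch of a standard ZeRO-3 setup, not the exact configuration used to train this model; the "auto" values defer to the training framework.

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```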
## Limitations
- Pre-RL Checkpoint: This model has not undergone the reinforcement learning optimization that teaches adaptive teaching. The full performance gains reported in our paper are only realized after the RL phase.
- Domain Scope: Primarily trained on the mathematical and reasoning problems present in the Arc-ATLAS-Teach-v0 dataset.
- Not for Chat: The model is not intended for conversational use and performs best with prompts that match the SFT data format.
## Citation
If you use the ATLAS framework or our models in your research, please cite our work:
```bibtex
@misc{barnes2025atlas,
  title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
  author={Jarrod Barnes and Aman Jaglan},
  year={2025},
  publisher={Arc Intelligence},
  note={Technical Report},
  url={https://github.com/Arc-Computer/ATLAS}
}
```
## Project Resources
- GitHub Repository: https://github.com/Arc-Computer/ATLAS
- Final RL Model: ATLAS-8B-Thinking
- Training Dataset: Arc-ATLAS-Teach-v0