# ATLAS-8B-Instruct
ATLAS-8B-Instruct is a specialized teaching model developed by Arc Intelligence. It is the result of the first phase—Supervised Fine-Tuning (SFT)—of the ATLAS Framework.
This model serves as the crucial foundation for the final reinforcement learning teacher, ATLAS-8B-Thinking. It has been trained on the Arc-ATLAS-Teach-v0 dataset to learn the formats and structures of effective pedagogy, including how to generate high-quality reasoning traces, explanations, and solution demonstrations.
Think of this model as having memorized the curriculum; it knows what good teaching looks like. It is the essential starting point before the RL phase teaches it how to adapt that teaching to individual students.
## Model's Role in the ATLAS Framework
The ATLAS training pipeline is a two-stage process:
- **Phase 1: Supervised Fine-Tuning (SFT)** → This is the phase that produces ATLAS-8B-Instruct. It learns the core knowledge and teaching formats from a static dataset.
- **Phase 2: Reinforcement Learning (RL)** → This phase takes ATLAS-8B-Instruct as its starting point and trains it to become an adaptive teacher, resulting in the final ATLAS-8B-Thinking model.
This checkpoint is released for researchers who wish to replicate our work, build upon the SFT foundation, or experiment with the second-stage RL training.
## How to Use
ATLAS-8B-Instruct is not a general-purpose chat model. It is designed to generate teaching content based on the structured format used in our dataset.
### Basic Generation Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-8B-Instruct")

# Example prompt following the SFT format
prompt = """Question: A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?
Provide a step-by-step explanation to solve this problem."""

# Use model.device so the inputs land wherever device_map placed the weights
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
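If you build prompts programmatically, a small helper can keep them aligned with the SFT format shown above. This is a minimal sketch: `build_teaching_prompt` is a hypothetical name, and the template is inferred from the example prompt rather than taken from an official ATLAS API.

```python
def build_teaching_prompt(question: str) -> str:
    """Format a question in the SFT prompt style used above.

    Hypothetical helper: the template is inferred from the example
    prompt in this card, not an official API of the ATLAS repository.
    """
    return (
        f"Question: {question}\n"
        "Provide a step-by-step explanation to solve this problem."
    )

prompt = build_teaching_prompt(
    "A farmer has 52 trees planted in a row over a length of 1850 meters. "
    "What is the distance between each tree?"
)
print(prompt)
```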
## Continuing to RL Training
This model is the direct input for the second phase of the ATLAS training pipeline. To use this model as the base for RL training, follow the instructions in the main repository.
```bash
# In the ATLAS repository, the RL script is configured
# to load an SFT checkpoint like this one.

# Run Phase 2: Reinforcement Learning (RL)
scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
```
## Training Details
- Base Model: Qwen/Qwen3-8B
- Training Stage: Supervised Fine-Tuning (SFT) only
- Dataset: Arc-Intelligence/Arc-ATLAS-Teach-v0
- Context Length: 8192 tokens
- Hardware: 4x H100 GPUs
- Precision: BF16
- Framework: DeepSpeed ZeRO-3
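For reference, a ZeRO-3 run like the one described above is typically driven by a DeepSpeed JSON config along these lines. This is a minimal sketch of a standard ZeRO-3 setup, not the exact configuration used to train this model; the "auto" values defer to the training framework.

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```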
## Limitations
- Pre-RL Checkpoint: This model has not undergone the reinforcement learning optimization that teaches adaptive teaching. The full performance gains reported in our paper are only realized after the RL phase.
- Domain Scope: Primarily trained on the mathematical and reasoning problems present in the Arc-ATLAS-Teach-v0 dataset.
- Not for Chat: The model is not intended for conversational use and performs best with prompts that match the SFT data format.
## Citation
If you use the ATLAS framework or our models in your research, please cite our work:
```bibtex
@misc{barnes2025atlas,
  title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
  author={Jarrod Barnes and Aman Jaglan},
  year={2025},
  publisher={Arc Intelligence},
  note={Technical Report},
  url={https://github.com/Arc-Computer/ATLAS}
}
```
## Project Resources
- GitHub Repository: https://github.com/Arc-Computer/ATLAS
- Final RL Model: ATLAS-8B-Thinking
- Training Dataset: Arc-ATLAS-Teach-v0