Llama-3.1-8B-tulu3-mixture-math-reasoning-full-muon

This is a fine-tuned version of Meta-Llama-3.1-8B, trained on a custom mixture of math-reasoning datasets following the Tulu3 approach.

Model Details

  • Base Model: Meta-Llama-3.1-8B
  • Architecture: LlamaForCausalLM
  • Parameters: ~8B
  • Training: Full-parameter supervised fine-tuning (no LoRA/QLoRA adapters)
  • Checkpoint: 2611
  • Training Configuration:
    • Effective batch size: 128 (see the arithmetic sketch after this list)
    • Learning rate: 5e-05
    • Method: Full-parameter tuning with the Muon optimizer
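
The effective batch size of 128 is normally the product of the per-device batch size, the number of gradient-accumulation steps, and the number of GPUs. The exact split used for this run is not documented here; the values in the sketch below are assumptions chosen only to illustrate the arithmetic.

# Illustrative only: one way to arrive at an effective batch size of 128.
# The actual per-device batch size, accumulation steps, and GPU count
# used for this run are not stated in this card.
per_device_batch_size = 4        # assumed
gradient_accumulation_steps = 4  # assumed
num_gpus = 8                     # assumed

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128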

Model Configuration

  • Vocabulary Size: 128,256
  • Hidden Size: 4096
  • Number of Layers: 32
  • Number of Attention Heads: 32
  • Max Position Embeddings: 131,072
  • RoPE Scaling: Llama3 with factor 8.0
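
These values can be read back from the repository's config.json without downloading the weights, for example with transformers' AutoConfig (a minimal sketch; the attribute names follow the standard LlamaConfig).

from transformers import AutoConfig

config = AutoConfig.from_pretrained("pmahdavi/Llama-3.1-8B-tulu3-mixture-math-reasoning-full-muon")

print(config.vocab_size)               # 128256
print(config.hidden_size)              # 4096
print(config.num_hidden_layers)        # 32
print(config.num_attention_heads)      # 32
print(config.max_position_embeddings)  # 131072
print(config.rope_scaling)             # {'rope_type': 'llama3', 'factor': 8.0, ...}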

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pmahdavi/Llama-3.1-8B-tulu3-mixture-math-reasoning-full-muon"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage: move the inputs to the model's device before generating
prompt = "Solve for x: 2x + 5 = 11."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
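
Because Tulu3-style fine-tunes are instruction-tuned, prompting through the tokenizer's chat template usually works better than raw text. The sketch below assumes the repository ships a chat template in its tokenizer config; if it does not, fall back to the raw-prompt style above.

# Sketch: chat-template prompting (assumes a chat template is bundled with the tokenizer).
messages = [
    {"role": "user", "content": "Solve for x: 2x + 5 = 11. Show your reasoning."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))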

Training Details

This model was fine-tuned using LLaMA-Factory with:

  • Mixed precision training (bfloat16)
  • Gradient checkpointing
  • Custom mixture of math reasoning datasets
  • Tulu3 methodology for instruction following
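
The run itself was driven through LLaMA-Factory, whose configuration files are not reproduced here. As a rough orientation, the settings listed above map onto Hugging Face TrainingArguments approximately as sketched below; this is an illustration under assumed values, not the actual LLaMA-Factory configuration, and the Muon optimizer itself is not a built-in transformers optimizer.

from transformers import TrainingArguments

# Rough transformers-level equivalent of the settings listed above
# (illustrative only; output_dir and the batch-size split are assumptions).
args = TrainingArguments(
    output_dir="llama31-8b-tulu3-math-muon",  # hypothetical path
    bf16=True,                       # mixed precision training (bfloat16)
    gradient_checkpointing=True,     # recompute activations to save memory
    learning_rate=5e-5,              # matches the learning rate listed above
    per_device_train_batch_size=4,   # assumed; see the batch-size sketch earlier
    gradient_accumulation_steps=4,   # assumed
)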

Limitations

  • This model is designed for mathematical reasoning tasks
  • May not perform as well on general conversation or other domains
  • Inherits the limitations of the base Llama 3.1 model

Citation

If you use this model, please cite the original Llama 3.1 paper and the Tulu3 methodology.
