YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co/docs/hub/model-cards#model-card-metadata)
RLT Qwen 7B Reasoning Teacher
Reinforcement Learning Teacher model based on Qwen2.5-7B-Instruct-AWQ.
Training Results
- Training Steps: 30/30 completed
- Final Loss: -21.89
- Training Time: 43 minutes
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained('hiroshij/rlt-qwen-7b-reasoning-teacher')
model = AutoModelForCausalLM.from_pretrained('hiroshij/rlt-qwen-7b-reasoning-teacher')
- Downloads last month
- 6
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support