Qwen-GRPO-Training Model

This is a fine-tuned version of the Qwen model, trained with GRPO (Group Relative Policy Optimization) on two datasets: NuminaMath-TIR (for R1 Zero training) and Bespoke-Stratos-17k (for R1 training). It is designed for high-performance causal language modeling tasks.

Model Details

  • Model Type: Causal Language Model (CausalLM)
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Datasets Used: NuminaMath-TIR, Bespoke-Stratos-17k
  • Model Size: 494M parameters (Safetensors, BF16)
  • Training Objective: Fine-tuned for general language understanding and specialized knowledge in mathematics, engineering, and technical domains.

Usage

To use this model, you can load it using the transformers library:

Installation

Make sure you have the necessary libraries installed:

pip install transformers torch

Example Code

Here’s a quick example to load and use the model for text generation:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace with your Hugging Face repo ID
model_id = "joe-xhedi/Qwen-GRPO-training"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    padding_side="right"
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to(device)

# Example usage with a plain text prompt
inputs = tokenizer("Your prompt here", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)  # without this, generation stops at the short default max_length
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
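
Qwen-family tokenizers typically ship a chat template. If this checkpoint's tokenizer does as well, a 'messages' list can be formatted for chat-style prompting as in the minimal sketch below (the message content is illustrative):

# Sketch: chat-style prompting, assuming the tokenizer defines a chat template
messages = [
    {"role": "user", "content": "Solve: what is 12 * 17?"}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token IDs
    add_generation_prompt=True   # append the assistant turn marker
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))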

Model Parameters

  • Tokenizer: Loaded from the same repository; if no pad token is defined, it falls back to the EOS token (see the batched-generation sketch below).
  • Model: The fine-tuned causal language model, loaded in bfloat16 for text generation.
  • Device: The code above places the model on the GPU when one is available, otherwise on the CPU.
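
Because a pad token is guaranteed above, batched generation also works. Note that decoder-only models generally need left padding at inference time, even though the tokenizer was created with padding_side="right" (a convention for training). A minimal sketch, reusing the model and tokenizer loaded earlier:

# Sketch: batched generation; decoder-only models need left padding here
tokenizer.padding_side = "left"
prompts = ["What is GRPO?", "State the quadratic formula."]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
outputs = model.generate(**batch, max_new_tokens=128)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))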

Notes

  • The trust_remote_code=True flag allows the tokenizer and model to execute code shipped with the repository. Only enable it for repositories you trust.
  • torch_dtype=torch.bfloat16 halves memory use relative to float32; see the dtype-fallback sketch below for hardware that lacks bfloat16 support.
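
bfloat16 is only natively supported on relatively recent GPUs (Ampere and newer). On other hardware, a dtype fallback such as the following sketch (not part of the original card) keeps the same loading code working:

# Sketch: pick a dtype the current hardware actually supports
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
elif torch.cuda.is_available():
    dtype = torch.float16   # older GPUs: fall back to half precision
else:
    dtype = torch.float32   # CPU fallback

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype
).to(device)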
