Qwen-GRPO-Training Model

This is a fine-tuned version of the Qwen model, trained with GRPO (Group Relative Policy Optimization) on two datasets: NuminaMath-TIR (for R1 Zero training) and Bespoke-Stratos-17k (for R1 training). It is designed for high-performance causal language modeling tasks.

Model Details

  • Model Type: Causal Language Model (CausalLM)
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Datasets Used: NuminaMath-TIR, Bespoke-Stratos-17k
  • Model Size: 494M parameters (Safetensors, BF16)
  • Training Objective: Fine-tuned for general language understanding and specialized knowledge in mathematics, engineering, and technical domains.

Usage

To use this model, you can load it using the transformers library:

Installation

Make sure you have the necessary libraries installed:

pip install transformers torch

Example Code

Here’s a quick example to load and use the model for text generation:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace with your Hugging Face repo ID
model_id = "joe-xhedi/Qwen-GRPO-training"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    padding_side="right"
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to(device)

# Example usage with a plain text prompt
inputs = tokenizer("Your prompt here", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)  # without this, generation stops at the short default max_length
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
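
Qwen-family tokenizers typically ship a chat template. If this checkpoint's tokenizer does as well, a 'messages' list can be formatted for chat-style prompting as in the minimal sketch below (the message content is illustrative):

# Sketch: chat-style prompting, assuming the tokenizer defines a chat template
messages = [
    {"role": "user", "content": "Solve: what is 12 * 17?"}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token IDs
    add_generation_prompt=True   # append the assistant turn marker
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))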

Model Parameters

  • Tokenizer: Loaded from the same repository; if no pad token is defined, it falls back to the EOS token (see the batched-generation sketch below).
  • Model: The fine-tuned causal language model, loaded in bfloat16 for text generation.
  • Device: The code above places the model on the GPU when one is available, otherwise on the CPU.
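
Because a pad token is guaranteed above, batched generation also works. Note that decoder-only models generally need left padding at inference time, even though the tokenizer was created with padding_side="right" (a convention for training). A minimal sketch, reusing the model and tokenizer loaded earlier:

# Sketch: batched generation; decoder-only models need left padding here
tokenizer.padding_side = "left"
prompts = ["What is GRPO?", "State the quadratic formula."]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
outputs = model.generate(**batch, max_new_tokens=128)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))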

Notes

  • The trust_remote_code=True flag allows the tokenizer and model to execute code shipped with the repository. Only enable it for repositories you trust.
  • torch_dtype=torch.bfloat16 halves memory use relative to float32; see the dtype-fallback sketch below for hardware that lacks bfloat16 support.
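
bfloat16 is only natively supported on relatively recent GPUs (Ampere and newer). On other hardware, a dtype fallback such as the following sketch (not part of the original card) keeps the same loading code working:

# Sketch: pick a dtype the current hardware actually supports
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
elif torch.cuda.is_available():
    dtype = torch.float16   # older GPUs: fall back to half precision
else:
    dtype = torch.float32   # CPU fallback

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype
).to(device)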
