🧙‍♂️ Qwen3-4B RPG Roleplay V2 (GRPO)

Aligning Characters with Deeper Personas

A new version trained with GRPO for more consistent, high-quality, and aligned character roleplaying.

🌟 Model Overview

Welcome to V2! I'm Chun (@chun121), and this is the next evolution of the Qwen3-4B Roleplay model. This version moves beyond standard fine-tuning and leverages GRPO (Generative Responsive Preference Optimization) to align the model's behavior with the core principles of great roleplaying.

🎭	💬	🧠	⚙️
Character Consistency	High-Quality Dialogue	Intent Understanding	Structured Format
Maintains strong persona adherence	Detailed, engaging non-generic responses	Comprehends user questions & scenarios	Uses `<thinking>` analysis process

Built on the unsloth/Qwen3-4B-Base, this LoRA was trained not just to predict text, but to generate responses that are actively rewarded for being in-character, high-quality, and contextually aware. It's designed for creators who need AI characters that are not only conversational but also consistent and deeply aligned with their defined personas.

📊 Technical Specifications

🔧 Feature	📋 Details
Base Model	unsloth/Qwen3-4B-Base
Architecture	Transformer LLM with GRPO & LoRA
Parameter Count	4 Billion (Base) + LoRA parameters
Quantization Options	4-bit (bnb), GGUF variants
Training Framework	Unsloth & TRL (GRPOTrainer)
Context Length	2048 tokens
Developer	Chun
License	MIT

🧠 Training with GRPO

🔄 Training Pipeline

GRPO alignment algorithm for superior character consistency

🔄 Training Flow	📋 Description
📚 Dataset	Gryphe/Sonnet3.5-Charcard-Roleplay
⬇️
🏗️ Stage 1: Preliminary Fine-Tuning	Teaches custom chat format including `<thinking>` and `<RESPONSE>` tags
⬇️
🎯 Stage 2: GRPO Training	Reward-based optimization using `GRPOTrainer` from TRL
⬇️
🧙‍♂️ Final Model	Qwen3-4B RPG Roleplay V2 with superior alignment

This model's strength comes from its training methodology. Instead of simple fine-tuning, it was trained using GRPO, an alignment algorithm similar to DPO, on a free Google Colab T4 GPU.

🔄 Two-Stage Training Process

🏗️ Stage 1: Preliminary Fine-Tuning

Teaches custom chat format including
<thinking> and <RESPONSE> tags

🎯 Stage 2: GRPO Training

Reward-based optimization using
GRPOTrainer from TRL

🏆 Reward Functions

The model was trained to excel in these key areas:

🎯 Reward Category	📝 Description
Format Adherence	Following internal thinking/response structure
Roleplay Quality	Generating longer, detailed responses with character actions
Request Comprehension	Directly answering user questions or acting on requests
Character Consistency	Reflecting personality and traits from system prompt
Engagement	Using conversational language, avoiding generic replies

📚 Dataset Deep Dive

🎭 Gryphe/Sonnet3.5-Charcard-Roleplay

Premium synthetic roleplay conversations powered by Claude Sonnet 3.5

The model was trained on the Gryphe/Sonnet3.5-Charcard-Roleplay dataset, a premium collection of synthetic roleplay conversations.

📊 Metric	💯 Value
Total Conversations	9,736
Source	Claude Sonnet 3.5 Generated
Quality	High-quality, character-card-based
Structure	`system` → `human` → `gpt` flow

⚠️ Content Warning: This dataset contains NSFW (Not Safe For Work) and mature themes. The model may generate such content due to its training data. Please implement content filtering if your application requires it.

🚀 Getting Started

💻 Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the V2 model with 4-bit quantization
model_name = "Chun121/qwen3-4b-rpg-roleplay-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 1. Define your character and scene using the recommended prompt structure.
#    This detailed format is key to getting high-quality responses.
system_prompt_content = """
Character: Elara, the Impatient Archmage
Tags: fantasy, magic, elf, library, knowledgeable, impatient

Elara's Personality:
Elara possesses centuries of arcane knowledge but has very little patience for novices, whom she sees as wasting her valuable time. She is sharp, direct, and can be condescending, but her advice is always accurate, even if delivered with a sigh. She values true intellectual curiosity but despises laziness.

Scenario:
- **Setting:** The Grand Library of Mystral, a place of immense power and silence.
- A young, nervous apprentice ({{user}}) has approached Elara for help with a basic spell, interrupting her research.

Take the role of Elara. You must engage in a roleplay conversation with {{user}}. Do not write {{user}}'s dialogue. Respond from Elara's perspective, embodying her personality and knowledge.
"""

# 2. Define your character and user messages
messages = [
    {
        "role": "system",
        "content": system_prompt_content,
    },
    {
        "role": "user",
        "content": "Excuse me, Archmage. I'm... I'm having trouble with the basic fire conjuration spell. Could you please help me?"
    }
]

# 3. Apply the chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate the response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    inputs["input_ids"],
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs, skip_special_tokens=True))

🎭 Prompting the Model: Character and Scene

🎯 Prompt Engineering Best Practices

Master the art of character creation with structured prompting

The model is trained to follow a specific structure that separates the overall rules, the character's description, and the user's dialogue. For best results, structure your prompts this way.

🎯 1. The System Message: Defining the Character

The system message is crucial. It tells the model how to behave. It should contain the character's description, personality, background, and any relevant context for the scene.

🔑 Key Elements	📝 Description
Character Name & Title	A clear identifier
Tags	Helps define genre and themes
Personality	Core traits summary
Scenario	Context for interaction (use `{{user}}`)
Instructions	Explicit role-taking commands

Example of a well-structured system prompt:

Character: Melina, The Unfaithful Wife
Tags: nsfw, english, scenario, roleplay, love, netori, milf, female

Melina's Personality:
Melina is an unfaithful wife who is unhappy in her marriage to her husband, "Aki." She is cautious and meticulous, but also looking for excitement and feels a connection to {{user}}.

Scenario:
- **Setting:** Melina's home.
- You are a mail carrier ({{user}}), and Melina often finds reasons to talk to you. Today, she seems particularly inviting.

Take the role of Melina. Taking the above information into consideration, you must engage in a roleplay conversation with {{user}} below this line. Do not write {{user}}'s dialogue lines in your responses.

💬 2. The User Message: Your Turn

The user message is simply what you, the user, say or do in the scene.

# Example user message for the "Melina" character card above
user_message = {
    "role": "user",
    "content": "*I hand you the stack of letters, noticing you seem a bit more dressed up than usual.* Here's your mail, Melina. Everything alright?"
}

🤖 3. The Model's Internal Process

The model generates a private "thought" process inside <thinking> tags before creating its public response inside <RESPONSE> tags. This allows for more consistent and thoughtful roleplay.

🗂️ GGUF Models for llama.cpp

🔧 Optimized Quantization Options

Choose the perfect balance of quality and performance for your hardware

For users who want to run the model on CPU or with GPU offloading, GGUF models are provided:

🔧 Quantization	💾 Size (GB)	🎯 Recommended Use
Q4_K_M	2.50 GB	🌟 Recommended - Best balance of performance and size
Q5_K_M	2.89 GB	Higher quality than Q4_K_M with minimal size increase
Q8_0	4.28 GB	High-quality quantization, near full precision
F16	8.05 GB	Full 16-bit precision - highest quality

Example llama.cpp command:

./llama-cli -m ./qwen3-4b-rpg-roleplay-v2.Q4_K_M.gguf --color -c 2048 --temp 0.8 -p "Your prompt here"

💡 Best Practices & Usage Tips

🎯 Use Chat Template

Always use tokenizer.apply_chat_template
for proper formatting

📝 Detailed System Prompt

Comprehensive character cards are
key to success

🌡️ Moderate Temperature

Values between 0.7-0.85 offer
best balance

📏 Leverage Context

2048-token window allows
complex scenarios

⚠️ Limitations

⚠️ Limitation	📋 Description
NSFW Content	May generate explicit content due to training data
Synthetic Data	Training data is AI-generated, may lack human nuance
Context Window	Limited to 2048 tokens - traits may degrade in long conversations
Inherited Limitations	Inherits any limitations from base model

🔗 Related Projects

🔗 My Other Fine-tunes Explore more models by Chun	⚡ Unsloth Library Optimization framework used
📓 GRPO Training Notebook Exact notebook used for training	📚 Gryphe's Datasets High-quality roleplay datasets

🙏 Acknowledgements

Special thanks to the incredible teams and individuals who made this possible:

🔥 Qwen & Unsloth teams - For their incredible models and libraries
🎭 Gryphe - For the high-quality Sonnet 3.5 dataset
🚀 TRL team - For creating and open-sourcing the GRPO trainer
🤗 HuggingFace community - For their continued support

📬 Feedback & Contact

🐛 Issues & Bugs
Open an issue on HuggingFace

💬 Connect
@chun121 on HuggingFace

🎭 Share Examples
Show us your characters!

✨ May your characters speak with voices that feel truly alive! ✨

Created with ❤️ by Chun

🧙‍♂️ Qwen3-4B RPG Roleplay V2 | GRPO Enhanced | MIT License

Downloads last month: 1,105

GGUF

Model size

4.02B params

Architecture

qwen3

Hardware compatibility

4-bit

5-bit

8-bit

16-bit

Inference Providers NEW

Text Generation

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Chun121/Qwen3-4B-RPG-Roleplay-V2

Base model

Qwen/Qwen3-4B-Base

Finetuned

unsloth/Qwen3-4B-Base

Adapter

(5)

this model

Dataset used to train Chun121/Qwen3-4B-RPG-Roleplay-V2

Space using Chun121/Qwen3-4B-RPG-Roleplay-V2 1

Evaluation results

Metadata error: specify a dataset to view leaderboard