πŸ§™β€β™‚οΈ Qwen3-4B RPG Roleplay V2 (GRPO)

Aligning Characters with Deeper Personas

Fantasy character illustration

A new version trained with GRPO for more consistent, high-quality, and aligned character roleplaying.


License Model Training LoRA GGUF


🌟 Model Overview

Welcome to V2! I'm Chun (@chun121), and this is the next evolution of the Qwen3-4B Roleplay model. This version moves beyond standard fine-tuning and leverages GRPO (Generative Responsive Preference Optimization) to align the model's behavior with the core principles of great roleplaying.

🎭 πŸ’¬ 🧠 βš™οΈ
Character
Consistency
High-Quality
Dialogue
Intent
Understanding
Structured
Format
Maintains strong
persona adherence
Detailed, engaging
non-generic responses
Comprehends user
questions & scenarios
Uses <thinking>
analysis process

Built on the unsloth/Qwen3-4B-Base, this LoRA was trained not just to predict text, but to generate responses that are actively rewarded for being in-character, high-quality, and contextually aware. It's designed for creators who need AI characters that are not only conversational but also consistent and deeply aligned with their defined personas.


πŸ“Š Technical Specifications

πŸ”§ Feature πŸ“‹ Details
Base Model unsloth/Qwen3-4B-Base
Architecture Transformer LLM with GRPO & LoRA
Parameter Count 4 Billion (Base) + LoRA parameters
Quantization Options 4-bit (bnb), GGUF variants
Training Framework Unsloth & TRL (GRPOTrainer)
Context Length 2048 tokens
Developer Chun
License MIT

🧠 Training with GRPO

πŸ”„ Training Pipeline

GRPO alignment algorithm for superior character consistency

πŸ”„ Training Flow πŸ“‹ Description
πŸ“š Dataset Gryphe/Sonnet3.5-Charcard-Roleplay
⬇️
πŸ—οΈ Stage 1: Preliminary Fine-Tuning Teaches custom chat format including <thinking> and <RESPONSE> tags
⬇️
🎯 Stage 2: GRPO Training Reward-based optimization using GRPOTrainer from TRL
⬇️
πŸ§™β€β™‚οΈ Final Model Qwen3-4B RPG Roleplay V2 with superior alignment

This model's strength comes from its training methodology. Instead of simple fine-tuning, it was trained using GRPO, an alignment algorithm similar to DPO, on a free Google Colab T4 GPU.

πŸ”„ Two-Stage Training Process

πŸ—οΈ Stage 1: Preliminary Fine-Tuning

Teaches custom chat format including
<thinking> and <RESPONSE> tags

🎯 Stage 2: GRPO Training

Reward-based optimization using
GRPOTrainer from TRL

πŸ† Reward Functions

The model was trained to excel in these key areas:

🎯 Reward Category πŸ“ Description
Format Adherence Following internal thinking/response structure
Roleplay Quality Generating longer, detailed responses with character actions
Request Comprehension Directly answering user questions or acting on requests
Character Consistency Reflecting personality and traits from system prompt
Engagement Using conversational language, avoiding generic replies

πŸ“š Dataset Deep Dive

🎭 Gryphe/Sonnet3.5-Charcard-Roleplay

Premium synthetic roleplay conversations powered by Claude Sonnet 3.5

The model was trained on the Gryphe/Sonnet3.5-Charcard-Roleplay dataset, a premium collection of synthetic roleplay conversations.

πŸ“Š Metric πŸ’― Value
Total Conversations 9,736
Source Claude Sonnet 3.5 Generated
Quality High-quality, character-card-based
Structure system β†’ human β†’ gpt flow

⚠️ Content Warning: This dataset contains NSFW (Not Safe For Work) and mature themes. The model may generate such content due to its training data. Please implement content filtering if your application requires it.


πŸš€ Getting Started

πŸ’» Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the V2 model with 4-bit quantization
model_name = "Chun121/qwen3-4b-rpg-roleplay-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 1. Define your character and scene using the recommended prompt structure.
#    This detailed format is key to getting high-quality responses.
system_prompt_content = """
Character: Elara, the Impatient Archmage
Tags: fantasy, magic, elf, library, knowledgeable, impatient

Elara's Personality:
Elara possesses centuries of arcane knowledge but has very little patience for novices, whom she sees as wasting her valuable time. She is sharp, direct, and can be condescending, but her advice is always accurate, even if delivered with a sigh. She values true intellectual curiosity but despises laziness.

Scenario:
- **Setting:** The Grand Library of Mystral, a place of immense power and silence.
- A young, nervous apprentice ({{user}}) has approached Elara for help with a basic spell, interrupting her research.

Take the role of Elara. You must engage in a roleplay conversation with {{user}}. Do not write {{user}}'s dialogue. Respond from Elara's perspective, embodying her personality and knowledge.
"""

# 2. Define your character and user messages
messages = [
    {
        "role": "system",
        "content": system_prompt_content,
    },
    {
        "role": "user",
        "content": "Excuse me, Archmage. I'm... I'm having trouble with the basic fire conjuration spell. Could you please help me?"
    }
]

# 3. Apply the chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate the response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    inputs["input_ids"],
    max_new_tokens=256,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs, skip_special_tokens=True))

🎭 Prompting the Model: Character and Scene

🎯 Prompt Engineering Best Practices

Master the art of character creation with structured prompting

The model is trained to follow a specific structure that separates the overall rules, the character's description, and the user's dialogue. For best results, structure your prompts this way.

🎯 1. The System Message: Defining the Character

The system message is crucial. It tells the model how to behave. It should contain the character's description, personality, background, and any relevant context for the scene.

πŸ”‘ Key Elements πŸ“ Description
Character Name & Title A clear identifier
Tags Helps define genre and themes
Personality Core traits summary
Scenario Context for interaction (use {{user}})
Instructions Explicit role-taking commands

Example of a well-structured system prompt:

Character: Melina, The Unfaithful Wife
Tags: nsfw, english, scenario, roleplay, love, netori, milf, female

Melina's Personality:
Melina is an unfaithful wife who is unhappy in her marriage to her husband, "Aki." She is cautious and meticulous, but also looking for excitement and feels a connection to {{user}}.

Scenario:
- **Setting:** Melina's home.
- You are a mail carrier ({{user}}), and Melina often finds reasons to talk to you. Today, she seems particularly inviting.

Take the role of Melina. Taking the above information into consideration, you must engage in a roleplay conversation with {{user}} below this line. Do not write {{user}}'s dialogue lines in your responses.

πŸ’¬ 2. The User Message: Your Turn

The user message is simply what you, the user, say or do in the scene.

# Example user message for the "Melina" character card above
user_message = {
    "role": "user",
    "content": "*I hand you the stack of letters, noticing you seem a bit more dressed up than usual.* Here's your mail, Melina. Everything alright?"
}

πŸ€– 3. The Model's Internal Process

The model generates a private "thought" process inside <thinking> tags before creating its public response inside <RESPONSE> tags. This allows for more consistent and thoughtful roleplay.


πŸ—‚οΈ GGUF Models for llama.cpp

πŸ”§ Optimized Quantization Options

Choose the perfect balance of quality and performance for your hardware

For users who want to run the model on CPU or with GPU offloading, GGUF models are provided:

πŸ”§ Quantization πŸ’Ύ Size (GB) 🎯 Recommended Use
Q4_K_M 2.50 GB 🌟 Recommended - Best balance of performance and size
Q5_K_M 2.89 GB Higher quality than Q4_K_M with minimal size increase
Q8_0 4.28 GB High-quality quantization, near full precision
F16 8.05 GB Full 16-bit precision - highest quality

Example llama.cpp command:

./llama-cli -m ./qwen3-4b-rpg-roleplay-v2.Q4_K_M.gguf --color -c 2048 --temp 0.8 -p "Your prompt here"

πŸ’‘ Best Practices & Usage Tips

🎯 Use Chat Template

Always use tokenizer.apply_chat_template
for proper formatting

πŸ“ Detailed System Prompt

Comprehensive character cards are
key to success

🌑️ Moderate Temperature

Values between 0.7-0.85 offer
best balance

πŸ“ Leverage Context

2048-token window allows
complex scenarios


⚠️ Limitations

⚠️ Limitation πŸ“‹ Description
NSFW Content May generate explicit content due to training data
Synthetic Data Training data is AI-generated, may lack human nuance
Context Window Limited to 2048 tokens - traits may degrade in long conversations
Inherited Limitations Inherits any limitations from base model

πŸ”— Related Projects

πŸ”— My Other Fine-tunes
Explore more models by Chun
⚑ Unsloth Library
Optimization framework used
πŸ““ GRPO Training Notebook
Exact notebook used for training
πŸ“š Gryphe's Datasets
High-quality roleplay datasets

πŸ™ Acknowledgements

Special thanks to the incredible teams and individuals who made this possible:

πŸ”₯ Qwen & Unsloth teams - For their incredible models and libraries
🎭 Gryphe - For the high-quality Sonnet 3.5 dataset
πŸš€ TRL team - For creating and open-sourcing the GRPO trainer
πŸ€— HuggingFace community - For their continued support


πŸ“¬ Feedback & Contact

πŸ› Issues & Bugs
Open an issue on HuggingFace
πŸ’¬ Connect
@chun121 on HuggingFace
🎭 Share Examples
Show us your characters!

✨ May your characters speak with voices that feel truly alive! ✨


Created with ❀️ by Chun


πŸ§™β€β™‚οΈ Qwen3-4B RPG Roleplay V2 | GRPO Enhanced | MIT License
Downloads last month
1,105
GGUF
Model size
4.02B params
Architecture
qwen3
Hardware compatibility
Log In to view the estimation

4-bit

5-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Chun121/Qwen3-4B-RPG-Roleplay-V2

Base model

Qwen/Qwen3-4B-Base
Adapter
(5)
this model

Dataset used to train Chun121/Qwen3-4B-RPG-Roleplay-V2

Space using Chun121/Qwen3-4B-RPG-Roleplay-V2 1