OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
Abstract
Customizable role-playing in large language models (LLMs), also known as character generalization, is gaining increasing attention for its versatility and cost-efficiency in developing and deploying role-playing dialogue agents. This study explores a large-scale data synthesis approach to equip LLMs with character generalization capabilities. We begin by synthesizing large-scale character profiles using personas from Persona Hub and then explore two strategies: response rewriting and response generation, to create character-aligned instructional responses. To validate the effectiveness of our synthetic instruction tuning data for character generalization, we perform supervised fine-tuning (SFT) using the LLaMA-3 8B model. Our best-performing model strengthens the original LLaMA-3 8B Instruct model and achieves performance comparable to GPT-4o models on role-playing dialogue. We release our synthetic characters and instruction-tuning dialogues to support public research.
Community
We study customizable role-playing LLMs with novel synthetic character and dialogue data. We released our data here: https://huggingface.co/datasets/xywang1/OpenCharacter
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Aligning Instruction Tuning with Pre-training (2025)
- CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds (2024)
- OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios (2025)
- HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios (2024)
- SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent (2024)
- Curriculum-style Data Augmentation for LLM-based Metaphor Detection (2024)
- Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper