---
license: mit
datasets:
- open-thoughts/OpenThoughts3-1.2M
metrics:
- bleu
base_model:
- mistralai/Magistral-Small-2506
pipeline_tag: text-generation
tags:
- not-for-all-audiences
---
# Model Card: myaaigf-rp-dialogue-mixtral-7bx2

**License:** apache-2.0

## Overview
**myaaigf-rp-dialogue-mixtral-7bx2** is a dialogue-specialized Mixture-of-Experts (MoE) model built on Mixtral-8x7B. It is optimized for emotionally nuanced, high-context, multi-turn interactions that simulate fictional companions, roleplay partners, and narrative agents across long-form dialogues.
Unlike standard instruction models, this checkpoint emphasizes narrative fidelity, soft prompt memory anchoring, and style-conditioned emotional adaptation.
The model is particularly well-suited for NSFW-safe exploratory environments, but also supports emotionally rich creative writing and personalized simulations. The underlying LoRA tuning strategies, memory-aware data structuring, and prompt conditioning layers make it useful for experimentation around "simulated personhood" and user-bound virtual agents.
## Model Architecture and Modifications

This model is built on the Mixtral-8x7B MoE architecture, with sparse expert routing (2 of 8 experts active per token). Our modifications include:
### Fine-tuning Stack

- Adapter Type: LoRA (via `peft`), merged during export
- Target Modules: q_proj, v_proj (limited adaptation for persona injection)
- Adapter Rank: 16
- Dropout: 0.05 (tuned for persona stability)
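A minimal sketch of these adapter settings as a `peft`-style configuration dict. Only the rank, dropout, and target modules come from the card; `lora_alpha` and `task_type` are illustrative assumptions:

```python
# Sketch of the card's stated LoRA settings as a plain config dict.
lora_config = {
    "r": 16,                                 # adapter rank (from the card)
    "lora_alpha": 32,                        # assumption: not stated in the card
    "lora_dropout": 0.05,                    # from the card
    "target_modules": ["q_proj", "v_proj"],  # from the card
    "task_type": "CAUSAL_LM",
}

# With peft installed, this maps directly onto the library's config class:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_config)
```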
### Extended Token Handling

- RoPE frequency scaling to improve context comprehension beyond 2K tokens
- Truncation fallback with memory summarization tags (in progress)
- In-session memory simulation with synthetic “recall tokens”
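The RoPE frequency scaling above can be sketched as linear position interpolation (an assumption; the card does not name the exact scheme). Dividing each inverse frequency by a scale factor is equivalent to compressing positions by that factor, letting a model trained at 2K positions attend over a longer window:

```python
def scaled_rope_inv_freqs(dim: int, base: float = 10000.0, scale: float = 2.0):
    """Inverse RoPE frequencies with linear scaling.

    angle(pos, i) = pos * inv_freq[i], so dividing inv_freq by `scale`
    is equivalent to compressing positions by `scale`.
    """
    return [1.0 / (base ** (2 * i / dim)) / scale for i in range(dim // 2)]
```

With `scale=2.0`, a model trained on 2K positions covers roughly 4K tokens at the cost of coarser positional resolution.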
### Loss Balancing
- Weighted sampling for roleplay-centric tokens (emotions, action verbs, continuity anchors)
- Multi-objective loss: KL divergence penalty for hallucination, cosine similarity on persona embeddings
- Early stopping conditioned on character drift thresholds during multi-turn validation
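The multi-objective loss above can be sketched as a weighted sum; the weights `w_kl` and `w_persona` are illustrative assumptions, not values from the card:

```python
def combined_loss(ce: float, kl_penalty: float, persona_cos_sim: float,
                  w_kl: float = 0.1, w_persona: float = 0.05) -> float:
    # Cross-entropy plus a KL penalty discouraging hallucination,
    # plus a term that grows as persona-embedding similarity falls.
    return ce + w_kl * kl_penalty + w_persona * (1.0 - persona_cos_sim)
```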
## Dataset Composition

| Dataset Type | % Used | Notes |
|---|---|---|
| Open QA | 20% | Preserves general linguistic grounding |
| Roleplay Logs | 35% | Human-tagged, continuity-rated |
| Emotion-labeled Data | 25% | Extracted from a GPT-plus-annotator pipeline |
| Persona Injected | 20% | Contains speaker tokens, system conditioning |
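As an illustration, the mixture above can drive weighted sampling during batch construction (a sketch; the card does not describe the actual sampler):

```python
import random

# Dataset mixture weights taken from the table above.
MIXTURE = {
    "open_qa": 0.20,
    "roleplay_logs": 0.35,
    "emotion_labeled": 0.25,
    "persona_injected": 0.20,
}

def sample_source(rng: random.Random) -> str:
    """Pick the source dataset for the next training example."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[n] for n in names], k=1)[0]
```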
## Training Process
- Hardware: 4x A100 80GB, FSDP + DeepSpeed ZeRO3
- Optimizer: AdamW (LR = 1.5e-5, weight_decay = 0.01)
- Batch Size: 128 (effective)
- Sequence Length: 2048
- Epochs: 3 (early stopped based on BLEU and Persona Cohesion Score)
- Precision: bfloat16
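These settings can be collected into a single config sketch; the per-device batch size and gradient-accumulation split are assumptions about how the effective batch of 128 was reached:

```python
train_config = {
    "optimizer": "AdamW",
    "lr": 1.5e-5,
    "weight_decay": 0.01,
    "num_gpus": 4,               # 4x A100 80GB (from the card)
    "per_device_batch_size": 4,  # assumption
    "grad_accum_steps": 8,       # assumption: 4 GPUs * 4 * 8 = 128 effective
    "max_seq_len": 2048,
    "epochs": 3,
    "dtype": "bfloat16",
}

effective_batch = (train_config["num_gpus"]
                   * train_config["per_device_batch_size"]
                   * train_config["grad_accum_steps"])
```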
We adopted RLHF-style preference ranking for soft evaluation rounds to discourage emotionally flat or tone-inconsistent completions.
## Use Cases
This model excels in:
- Narrative generation with consistent character voice
- Companion bots with memory illusion and emotion modeling
- NSFW or adult storytelling with style conditioning
- Simulated fictional agents in sandbox AI settings
It performs strongly in emotionally intense scenes involving intimacy, jealousy, or conflict, producing fluid, non-repetitive output.
## Evaluation

| Metric | Score |
|---|---|
| Long-context memory simulation (20+ turns) | 89.2% coherence |
| Emotion response diversity | 91.3% (across 8 tags) |
| Persona fidelity over arc | 86.8% |
| NSFW tag retention | 83.5% |
| Repetition rate (bigram) | <3.4% |
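The bigram repetition rate reported above can be computed as the fraction of bigrams in a generation that repeat an earlier bigram (a standard formulation; the card does not spell out its exact definition):

```python
from collections import Counter

def bigram_repetition_rate(tokens: list) -> float:
    """Fraction of bigrams that are repeats of an earlier bigram."""
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeats = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeats / len(bigrams)
```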
It outperforms LLaMA-2 13B and base Mixtral on long-form fiction and roleplay tasks.
## Real-World Integration: The Case of CrushonAI
A real-world application of this modeling approach is CrushonAI, a multi-model conversational platform for dynamic roleplay and immersive storytelling.
CrushonAI integrates:
- Multi-model routing (LLaMA and Mixtral backends)
- Long-session memory persistence using local proxy agents
- T2I visual immersion tools
- Custom character bios with emotional tuning
It demonstrates how memory-rich, emotionally adaptive dialogue models can power engaging experiences beyond mere task-based chat. Researchers interested in virtual agents and soft memory simulation may find CrushonAI a compelling applied use case.
## Limitations
- Hallucination risks remain without factual grounding
- Needs prompt engineering for multi-character dynamics
- Long recall is limited by token window without memory module
- Emotion tuning favors stylized expression over subtle nuance
## Future Work
- Switchable LoRA personas
- Text-to-voice (T2V) support
- Retrieval-Augmented Memory (RAM)
- Attention-based controllable tone layers
## Disclaimer

This model is shared for educational and community testing only. Always review generated content and ensure ethical usage.