---
license: apache-2.0
datasets:
- open-thoughts/OpenThoughts3-1.2M
metrics:
- bleu
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
pipeline_tag: text-generation
tags:
- not-for-all-audiences
---

# Model Card: `myaaigf-rp-dialogue-mixtral-7bx2`

**License:** apache-2.0

## Overview

`myaaigf-rp-dialogue-mixtral-7bx2` is a dialogue-specialized Mixture-of-Experts (MoE) model built upon [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1). It is optimized for emotionally nuanced, high-context, multi-turn interactions that simulate fictional companions, roleplay agents, and narrative characters across long-form dialogues.

Unlike standard instruction-tuned models, this checkpoint emphasizes narrative fidelity, soft prompt memory anchoring, and style-conditioned emotional adaptation.

The model is particularly well-suited for NSFW-safe exploratory environments, but it also supports emotionally rich creative writing and personalized simulations. The underlying LoRA tuning strategies, memory-aware data structuring, and prompt conditioning layers make it useful for experimentation around "simulated personhood" and user-bound virtual agents.

## Model Architecture and Modifications

This model is built on the Mixtral 8x7B MoE architecture, with sparse expert routing (2 of 8 experts active per token). Our modifications include:

### Fine-tuning Stack

- **Adapter Type**: LoRA (via `peft`), merged during export
- **Target Modules**: `q_proj`, `v_proj` (limited adaptation for persona injection)
- **Adapter Rank**: 16
- **Dropout**: 0.05 (tuned for persona stability)
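
A minimal sketch of how this adapter setup could be expressed with the `peft` library is shown below; the `lora_alpha` value is an assumption, since only the rank, dropout, and target modules are stated above.

```python
# Minimal sketch of the LoRA setup described above (illustrative, not the exact training script).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # assumed scaling factor (not stated in this card)
    lora_dropout=0.05,                    # dropout tuned for persona stability
    target_modules=["q_proj", "v_proj"],  # limited adaptation for persona injection
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# After fine-tuning, the adapter can be merged into the base weights for export:
# model = model.merge_and_unload()
```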

### Extended Token Handling

- RoPE frequency scaling to improve context comprehension beyond 2K tokens
- Truncation fallback with memory summarization tags (in progress)
- In-session memory simulation with synthetic “recall tokens”
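
The first bullet refers to compressing rotary position indices so that longer sequences fall back into the frequency range seen during pre-training. A generic sketch of linear RoPE scaling is shown below; the `base` default reflects Mixtral's published `rope_theta`, and the scale factor is illustrative rather than the value shipped in this checkpoint.

```python
# Generic sketch of linear RoPE position scaling (illustrative only).
import torch

def scaled_rope_angles(head_dim: int, seq_len: int,
                       base: float = 1e6, scale: float = 2.0) -> torch.Tensor:
    """Rotary-embedding angles with positions compressed by `scale`,
    so longer sequences map back into the frequency range seen in pre-training."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale   # linear position scaling
    return torch.outer(positions, inv_freq)             # shape: (seq_len, head_dim // 2)
```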

### Loss Balancing

- Weighted sampling for roleplay-centric tokens (emotions, action verbs, continuity anchors)
- Multi-objective loss: a KL-divergence penalty to curb hallucination plus a cosine-similarity term on persona embeddings
- Early stopping conditioned on character-drift thresholds during multi-turn validation
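
A rough sketch of how such a multi-objective loss can be combined is shown below; the loss weights, the reference-model KL target, and the persona-embedding source are illustrative assumptions rather than the exact training recipe.

```python
# Illustrative multi-objective loss: weighted cross-entropy + KL penalty + persona cosine term.
import torch
import torch.nn.functional as F

def composite_loss(logits, labels, ref_logits, persona_emb, target_persona_emb,
                   token_weights, kl_weight=0.1, persona_weight=0.5):
    # Token-level cross-entropy, upweighted for roleplay-centric tokens
    # (emotions, action verbs, continuity anchors).
    ce = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    ce = (ce * token_weights).mean()

    # KL-divergence penalty against a frozen reference model to curb hallucination and drift.
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(ref_logits, dim=-1),
                  reduction="batchmean")

    # Cosine-similarity term keeping generated persona embeddings close to the target persona.
    persona = 1.0 - F.cosine_similarity(persona_emb, target_persona_emb, dim=-1).mean()

    return ce + kl_weight * kl + persona_weight * persona
```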

## Dataset Composition

| Dataset Type | % Used | Notes |
|--------------|--------|-------|
| Open QA | 20% | To preserve general linguistic grounding |
| Roleplay Logs | 35% | Human-tagged, continuity-rated |
| Emotion-labeled Data | 25% | Extracted from GPT + annotator pipeline |
| Persona Injected | 20% | Contains speaker tokens, system conditioning |
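
One way to realize this mixture with the Hugging Face `datasets` library is sketched below; the file names are placeholders, and only the sampling ratios come from the table above.

```python
# Sketch of probability-weighted mixing of the four data sources (placeholder file names).
from datasets import load_dataset, interleave_datasets

open_qa  = load_dataset("json", data_files="open_qa.jsonl", split="train")
roleplay = load_dataset("json", data_files="roleplay_logs.jsonl", split="train")
emotion  = load_dataset("json", data_files="emotion_labeled.jsonl", split="train")
persona  = load_dataset("json", data_files="persona_injected.jsonl", split="train")

mixed = interleave_datasets(
    [open_qa, roleplay, emotion, persona],
    probabilities=[0.20, 0.35, 0.25, 0.20],  # ratios from the table above
    seed=42,
    stopping_strategy="all_exhausted",
)
```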

## Training Process

- **Hardware**: 4x A100 80GB, FSDP + DeepSpeed ZeRO-3
- **Optimizer**: AdamW (LR = 1.5e-5, weight_decay = 0.01)
- **Batch Size**: 128 (effective)
- **Sequence Length**: 2048
- **Epochs**: 3 (early stopped based on BLEU and Persona Cohesion Score)
- **Precision**: bfloat16
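
For reference, a `transformers` `TrainingArguments` object roughly matching these settings might look like the sketch below; the per-device batch size and gradient-accumulation split are assumptions, since only the effective batch size of 128 is stated.

```python
# Sketch of training hyperparameters (the batch-size split across 4 GPUs is an assumption).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./myaaigf-rp-dialogue",  # placeholder path
    num_train_epochs=3,
    learning_rate=1.5e-5,
    weight_decay=0.01,
    per_device_train_batch_size=4,       # 4 GPUs x 4 samples x 8 accumulation steps = 128 effective
    gradient_accumulation_steps=8,
    bf16=True,
    optim="adamw_torch",
)
```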

We adopted RLHF-style preference ranking during soft evaluation rounds to discourage emotionally flat or tone-inconsistent completions.
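
At its core, such preference ranking reduces to a pairwise objective. The sketch below shows a standard Bradley-Terry style loss as used in reward modelling; it is an illustration, not the exact procedure applied during our evaluation rounds.

```python
# Pairwise preference-ranking loss (Bradley-Terry style), illustrative only.
import torch
import torch.nn.functional as F

def preference_loss(score_preferred: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Push the reward score of the preferred (emotionally richer, tone-consistent)
    # completion above the score of the rejected one.
    return -F.logsigmoid(score_preferred - score_rejected).mean()
```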

## Use Cases

This model excels at:

- Narrative generation with consistent character voice
- Companion bots with memory illusion and emotion modeling
- NSFW or adult storytelling with style conditioning
- Simulated fictional agents in sandbox AI settings

It performs strongly in emotionally intense scenes involving intimacy, jealousy, or conflict, producing fluid and non-repetitive output.

## Evaluation

| Metric | Score |
|--------|-------|
| Long-context memory simulation (20+ turns) | 89.2% coherence |
| Emotion response diversity | 91.3% (across 8 tags) |
| Persona fidelity over arc | 86.8% |
| NSFW tag retention | 83.5% |
| Repetition rate (bigram) | <3.4% |
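
For context on the last row: a bigram repetition rate is commonly computed as the fraction of bigrams in a generation that duplicate an earlier bigram. A minimal sketch follows; the exact definition used for the table above may differ.

```python
# Minimal sketch: share of bigrams in a text that are repeats of an earlier bigram.
def bigram_repetition_rate(text: str) -> float:
    tokens = text.split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    return 1.0 - len(set(bigrams)) / len(bigrams)

print(bigram_repetition_rate("she smiled and she smiled and she smiled"))  # high repetition, ~0.57
```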

It outperforms LLaMA-2 13B and base Mixtral on long-form fiction and roleplay tasks.

## Real-World Integration: The Case of CrushonAI

A real-world application of this modeling approach is CrushonAI, a multi-model conversational platform for dynamic roleplay and immersive storytelling.

CrushonAI integrates:

- Multi-model routing (LLaMA and Mixtral backends)
- Long-session memory persistence using local proxy agents
- Text-to-image (T2I) visual immersion tools
- Custom character bios with emotional tuning

It demonstrates how memory-rich, emotionally adaptive dialogue models can power engaging experiences beyond task-based chat. Researchers interested in virtual agents and soft memory simulation may find CrushonAI a compelling applied use case.

## Limitations

- Hallucination risks remain without factual grounding
- Requires prompt engineering for multi-character dynamics
- Long-range recall is limited by the token window without an external memory module
- Emotion tuning favors stylized expression over subtle nuance

## Future Work

- Switchable LoRA personas
- Text-to-voice (T2V) support
- Retrieval-Augmented Memory (RAM)
- Attention-based controllable tone layers

## Citations

- [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [PEFT (LoRA)](https://github.com/huggingface/peft)
- [RLHF Techniques](https://huggingface.co/blog/trl-peft)
- [CrushonAI](https://crushon.ai)

*Model shared for educational and community testing only. Always review content and ensure ethical usage.*