---
license: mit
datasets:
- open-thoughts/OpenThoughts3-1.2M
metrics:
- bleu
base_model:
- mistralai/Magistral-Small-2506
pipeline_tag: text-generation
tags:
- not-for-all-audiences
---
# Model Card: myaaigf-rp-dialogue-mixtral-7bx2

**License:** apache-2.0

## Overview
**myaaigf-rp-dialogue-mixtral-7bx2** is a dialogue-specialized Mixture-of-Experts (MoE) model built on Mixtral-8x7B. It is optimized for emotionally nuanced, high-context, multi-turn interactions that simulate fictional companions, roleplay partners, and narrative agents across long-form dialogues.
Unlike standard instruction models, this checkpoint emphasizes narrative fidelity, soft prompt memory anchoring, and style-conditioned emotional adaptation.
The model is particularly well-suited for NSFW-safe exploratory environments, but also supports emotionally rich creative writing and personalized simulations. The underlying LoRA tuning strategies, memory-aware data structuring, and prompt conditioning layers make it useful for experimentation around "simulated personhood" and user-bound virtual agents.
## Model Architecture and Modifications

This model is built on the Mixtral-8x7B MoE architecture, with sparse expert routing (2 of 8 experts active per token). Our modifications include:
### Fine-tuning Stack

- Adapter Type: LoRA (via `peft`), merged during export
- Target Modules: q_proj, v_proj (limited adaptation for persona injection)
- Adapter Rank: 16
- Dropout: 0.05 (tuned for persona stability)
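A minimal sketch of these adapter settings as a `peft`-style configuration dict. Only the rank, dropout, and target modules come from the card; `lora_alpha` and `task_type` are illustrative assumptions:

```python
# Sketch of the card's stated LoRA settings as a plain config dict.
lora_config = {
    "r": 16,                                 # adapter rank (from the card)
    "lora_alpha": 32,                        # assumption: not stated in the card
    "lora_dropout": 0.05,                    # from the card
    "target_modules": ["q_proj", "v_proj"],  # from the card
    "task_type": "CAUSAL_LM",
}

# With peft installed, this maps directly onto the library's config class:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_config)
```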
### Extended Token Handling

- RoPE frequency scaling to improve context comprehension beyond 2K tokens
- Truncation fallback with memory summarization tags (in progress)
- In-session memory simulation with synthetic “recall tokens”
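The RoPE frequency scaling above can be sketched as linear position interpolation (an assumption; the card does not name the exact scheme). Dividing each inverse frequency by a scale factor is equivalent to compressing positions by that factor, letting a model trained at 2K positions attend over a longer window:

```python
def scaled_rope_inv_freqs(dim: int, base: float = 10000.0, scale: float = 2.0):
    """Inverse RoPE frequencies with linear scaling.

    angle(pos, i) = pos * inv_freq[i], so dividing inv_freq by `scale`
    is equivalent to compressing positions by `scale`.
    """
    return [1.0 / (base ** (2 * i / dim)) / scale for i in range(dim // 2)]
```

With `scale=2.0`, a model trained on 2K positions covers roughly 4K tokens at the cost of coarser positional resolution.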
### Loss Balancing
- Weighted sampling for roleplay-centric tokens (emotions, action verbs, continuity anchors)
- Multi-objective loss: KL divergence penalty for hallucination, cosine similarity on persona embeddings
- Early stopping conditioned on character drift thresholds during multi-turn validation
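The multi-objective loss above can be sketched as a weighted sum; the weights `w_kl` and `w_persona` are illustrative assumptions, not values from the card:

```python
def combined_loss(ce: float, kl_penalty: float, persona_cos_sim: float,
                  w_kl: float = 0.1, w_persona: float = 0.05) -> float:
    # Cross-entropy plus a KL penalty discouraging hallucination,
    # plus a term that grows as persona-embedding similarity falls.
    return ce + w_kl * kl_penalty + w_persona * (1.0 - persona_cos_sim)
```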
## Dataset Composition

| Dataset Type | % Used | Notes |
|---|---|---|
| Open QA | 20% | Preserves general linguistic grounding |
| Roleplay Logs | 35% | Human-tagged, continuity-rated |
| Emotion-labeled Data | 25% | Extracted from a GPT-plus-annotator pipeline |
| Persona Injected | 20% | Contains speaker tokens, system conditioning |
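As an illustration, the mixture above can drive weighted sampling during batch construction (a sketch; the card does not describe the actual sampler):

```python
import random

# Dataset mixture weights taken from the table above.
MIXTURE = {
    "open_qa": 0.20,
    "roleplay_logs": 0.35,
    "emotion_labeled": 0.25,
    "persona_injected": 0.20,
}

def sample_source(rng: random.Random) -> str:
    """Pick the source dataset for the next training example."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[n] for n in names], k=1)[0]
```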
## Training Process
- Hardware: 4x A100 80GB, FSDP + DeepSpeed ZeRO3
- Optimizer: AdamW (LR = 1.5e-5, weight_decay = 0.01)
- Batch Size: 128 (effective)
- Sequence Length: 2048
- Epochs: 3 (early stopped based on BLEU and Persona Cohesion Score)
- Precision: bfloat16
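These settings can be collected into a single config sketch; the per-device batch size and gradient-accumulation split are assumptions about how the effective batch of 128 was reached:

```python
train_config = {
    "optimizer": "AdamW",
    "lr": 1.5e-5,
    "weight_decay": 0.01,
    "num_gpus": 4,               # 4x A100 80GB (from the card)
    "per_device_batch_size": 4,  # assumption
    "grad_accum_steps": 8,       # assumption: 4 GPUs * 4 * 8 = 128 effective
    "max_seq_len": 2048,
    "epochs": 3,
    "dtype": "bfloat16",
}

effective_batch = (train_config["num_gpus"]
                   * train_config["per_device_batch_size"]
                   * train_config["grad_accum_steps"])
```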
We adopted RLHF-style preference ranking for soft evaluation rounds to discourage emotionally flat or tone-inconsistent completions.
## Use Cases
This model excels in:
- Narrative generation with consistent character voice
- Companion bots with memory illusion and emotion modeling
- NSFW or adult storytelling with style conditioning
- Simulated fictional agents in sandbox AI settings
It performs strongly in emotionally intense scenes involving intimacy, jealousy, or conflict, producing fluid, non-repetitive output.
## Evaluation

| Metric | Score |
|---|---|
| Long-context memory simulation (20+ turns) | 89.2% coherence |
| Emotion response diversity | 91.3% (across 8 tags) |
| Persona fidelity over arc | 86.8% |
| NSFW tag retention | 83.5% |
| Repetition rate (bigram) | <3.4% |
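The bigram repetition rate reported above can be computed as the fraction of bigrams in a generation that repeat an earlier bigram (a standard formulation; the card does not spell out its exact definition):

```python
from collections import Counter

def bigram_repetition_rate(tokens: list) -> float:
    """Fraction of bigrams that are repeats of an earlier bigram."""
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeats = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeats / len(bigrams)
```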
It outperforms LLaMA-2 13B and base Mixtral on long-form fiction and roleplay tasks.
## Real-World Integration: The Case of CrushonAI
A real-world application of this modeling approach is CrushonAI, a multi-model conversational platform for dynamic roleplay and immersive storytelling.
CrushonAI integrates:
- Multi-model routing (LLaMA and Mixtral backends)
- Long-session memory persistence using local proxy agents
- T2I visual immersion tools
- Custom character bios with emotional tuning
It demonstrates how memory-rich, emotionally adaptive dialogue models can power engaging experiences beyond mere task-based chat. Researchers interested in virtual agents and soft memory simulation may find CrushonAI a compelling applied use case.
## Limitations
- Hallucination risks remain without factual grounding
- Needs prompt engineering for multi-character dynamics
- Long recall is limited by token window without memory module
- Emotion tuning favors stylized expression over subtle nuance
## Future Work
- Switchable LoRA personas
- Text-to-voice (T2V) support
- Retrieval-Augmented Memory (RAM)
- Attention-based controllable tone layers
## Disclaimer

This model is shared for educational and community testing only. Always review generated content and ensure ethical usage.