# HROM-M1
HROM-M1 is a transformer-based Mixture-of-Experts (MoE) language model built entirely in PyTorch by me, Timur Hromek, a 15-year-old self-taught developer. It's designed for multi-turn, persona-aware dialogue with a focus on safety, modularity, and extensibility.
This implementation includes top-k expert routing, rotary position embeddings, SwiGLU activations, and a custom tokenizer, along with built-in safety filters and checkpoint management.
## Features
- Mixture-of-Experts (MoE) with 8 experts and top-2 routing per token.
- Transformer architecture with 8 layers, 8 heads, and RoPE (rotary positional embeddings); see the RoPE sketch after this list.
- SwiGLU activation for efficient MLP computation.
- Multi-dataset training support, including:
  - `daily_dialog`
  - `empathetic_dialogues`
  - `blended_skill_talk`
  - `persona-chat`
  - `papahawk/conversational-01`
- Custom tokenizer using Byte-Pair Encoding (BPE).
- `SafetyManager` for blocking unsafe generations using token-level filtering.
- `CheckpointManager` with rotating save slots and auto-recovery.
- AMP (mixed precision) and gradient accumulation support.
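As a point of reference, rotary embeddings are usually applied to the query and key vectors along these lines. This is a minimal sketch in the common complex-multiplication style, not necessarily the exact code in `HROM-M1.py`:

```python
import torch

def rope_freqs(head_dim: int, max_len: int, base: float = 10000.0) -> torch.Tensor:
    # One complex rotation factor per (position, dimension pair).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(max_len).float(), inv_freq)  # (max_len, head_dim/2)
    return torch.polar(torch.ones_like(angles), angles)            # complex64

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, heads, head_dim). Treat consecutive dims as complex
    # pairs and rotate each pair by its position-dependent angle.
    pairs = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rotated = pairs * freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2)
    return torch.view_as_real(rotated).flatten(-2).type_as(x)
```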
## Model Specs
| Hyperparameter | Value |
|---|---|
| Model Parameters | 370.46M |
| Embedding Size (dim) | 768 |
| Layers | 8 |
| Attention Heads | 8 |
| Expert FF Dim | 2048 |
| Number of Experts | 8 |
| Top-k Experts | 2 |
| Vocabulary Size | 32,000 |
| Max Sequence Length | 512 tokens |
| Dropout | 0.1 |
| Batch Size | 16 |
| Learning Rate | 2e-5 |
| Optimizer | AdamW |
| Epochs | 30 |
| Grad Accumulation Steps | 8 |
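As a sanity check, the headline parameter count can be roughly reproduced from the table, assuming an untied LM head, bias-free linears, three-matrix SwiGLU experts, and one linear router per layer (none of which the source confirms; norms and biases plausibly account for the small remainder):

```python
# Back-of-the-envelope parameter count from the specs above.
# Assumptions (not confirmed by the source): untied LM head, bias-free
# linears, three-matrix SwiGLU experts, one linear router per layer.
dim, layers, n_experts, ff, vocab = 768, 8, 8, 2048, 32000

embed = vocab * dim                  # token embedding
head = vocab * dim                   # untied output projection
attn = 4 * dim * dim                 # q, k, v, o projections per layer
expert = 3 * dim * ff                # gate, up, down matrices per expert
router = dim * n_experts             # top-k gating per layer

per_layer = attn + n_experts * expert + router
total = embed + head + layers * per_layer
print(f"{total / 1e6:.2f}M params")  # ~370.07M; norms/biases close the gap to 370.46M
```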
## Architecture Overview
- `HROMBlock`: Transformer block combining attention with a MoE feed-forward layer.
- `MoELayer`: Routes each token to its top-k experts and applies a load-balancing loss (see the sketch after this list).
- `Expert`: Lightweight FFN with SwiGLU nonlinearity.
- `SafetyManager`: Filters generations using predefined token patterns.
- `TokenizerTrainer`: Builds a BPE tokenizer from dialogue data.
- `CheckpointManager`: Rotates and auto-recovers checkpoints.
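For orientation, here is a minimal sketch of how a top-2 MoE layer with a Switch-style load-balancing loss typically looks in PyTorch. The class names mirror the components above, but the shapes, routing details, and loss weighting in `HROM-M1.py` may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """SwiGLU feed-forward expert: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, ff_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, ff_dim, bias=False)
        self.up = nn.Linear(dim, ff_dim, bias=False)
        self.down = nn.Linear(ff_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class MoELayer(nn.Module):
    """Top-k routing over a pool of experts, plus an auxiliary balance loss."""
    def __init__(self, dim=768, ff_dim=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(dim, ff_dim) for _ in range(n_experts)])
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        b, s, d = x.shape
        flat = x.reshape(-1, d)                          # (tokens, dim)
        probs = F.softmax(self.router(flat), dim=-1)     # (tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)    # top-2 experts per token
        weights = weights / weights.sum(-1, keepdim=True)

        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(flat[token_ids])

        # Switch-Transformer-style load-balancing loss: pushes the fraction of
        # tokens routed to each expert toward its mean router probability.
        frac = F.one_hot(idx, probs.shape[-1]).float().mean(dim=(0, 1))
        aux_loss = probs.shape[-1] * (frac * probs.mean(0)).sum()
        return out.reshape(b, s, d), aux_loss
```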
## Safety
The model includes a basic content filter that blocks sequences containing unsafe keywords by checking token IDs. Unsafe generations are interrupted before completion.
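The exact keyword list and hook points live in `HROM-M1.py`; the following is a hypothetical sketch of the token-ID matching idea, assuming a Hugging Face `tokenizers`-style `encode(...).ids` interface:

```python
class SafetyManager:
    """Hypothetical sketch of token-level filtering; the real class,
    its keyword list, and its API may differ."""

    def __init__(self, tokenizer, banned_phrases):
        # Pre-encode each banned phrase into its token-ID sequence
        # (assumes a Hugging Face `tokenizers` Tokenizer).
        self.banned = [tokenizer.encode(p).ids for p in banned_phrases]

    def is_unsafe(self, generated_ids):
        # Flag the generation if any banned ID sequence occurs as a
        # contiguous subsequence of the tokens emitted so far.
        for pattern in self.banned:
            n = len(pattern)
            for i in range(len(generated_ids) - n + 1):
                if generated_ids[i:i + n] == pattern:
                    return True
        return False
```

Checked once per decoding step, this lets generation stop as soon as a banned pattern appears rather than after the full sequence is produced.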
## Installation
```bash
git clone https://github.com/yourusername/HROM-M1.git
cd HROM-M1
pip install -r requirements.txt
```
## Training
```bash
python HROM-M1.py
```
The tokenizer will auto-train if it is not found. Dialogue datasets are pulled via the Hugging Face `datasets` library.
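For example, one of the corpora above can be pulled like this (illustrative only; some Hub datasets, `daily_dialog` included, may require `trust_remote_code=True` on recent `datasets` versions):

```python
from datasets import load_dataset

# Illustrative: load one of the dialogue corpora listed under Features.
dialogs = load_dataset("daily_dialog", split="train")
print(dialogs[0]["dialog"][:2])  # first two turns of the first conversation
```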