MoE Experiments (proper sparse MoEs)
Based on SmolLM2 (a Llama-architecture model), MoE-ified and then further trained on a general dataset. A minimal sketch of the MoE block follows the configuration below.
MoE layers: [8, 12, 16, 20, 24, 28]
Top-k: 2 (activates 50.0% of experts per token)
Hidden size: 960
Total parameters: 494,554,560
Trainable parameters: 494,554,560
Auxiliary loss weight: 0.01
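The sketch below illustrates the kind of sparse MoE block this configuration implies: a router over a small pool of experts (4 experts is inferred from "top-k = 2 activates 50.0%"), top-2 gating, and a Switch-Transformer-style load-balancing auxiliary loss. Expert width (`ffn_dim`), module names, and the exact auxiliary-loss formulation are assumptions for illustration, not the training code used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    # Illustrative sparse MoE block: 4 experts assumed (2/4 = 50% activated per token).
    def __init__(self, hidden_size=960, num_experts=4, top_k=2, ffn_dim=2560):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq, hidden) -> flatten to a token dimension
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                      # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)   # route each token to its top-k experts
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)   # renormalise gate weights over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                         # tokens that selected expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += top_p[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])

        # Load-balancing auxiliary loss (Switch-Transformer style, assumed):
        # fraction of tokens routed to each expert times mean router probability.
        num_experts = probs.size(-1)
        load = F.one_hot(top_idx, num_experts).float().sum(dim=1).mean(dim=0)
        importance = probs.mean(dim=0)
        aux_loss = num_experts * (load * importance).sum()
        return out.reshape_as(x), aux_loss
```

In a setup like this, the dense MLPs at the listed layer indices ([8, 12, 16, 20, 24, 28]) are replaced by MoE blocks, while the remaining layers keep their original feed-forward networks.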
Training loss:
Total Loss = 6.4659, LM Loss = 5.9851, Aux Loss = 48.0835
Validation loss:
Total Loss = 0.8298, LM Loss = 0.7697, Aux Loss = 6.0092
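The reported totals are consistent with Total Loss = LM Loss + aux_weight × Aux Loss, using the auxiliary loss weight of 0.01 listed above:

```python
# Sanity check: reported totals match LM loss + aux_weight * aux loss.
aux_weight = 0.01
train_total = 5.9851 + aux_weight * 48.0835   # = 6.4659 (matches the reported training total)
val_total   = 0.7697 + aux_weight * 6.0092    # = 0.8298 (matches the reported validation total)
```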