arxiv:2505.20225

FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

Published on May 26 · Submitted by yuzc19 on May 27

Abstract

AI-generated summary: FLAME-MoE is an open-source research suite for MoE architectures in LLMs, providing tools to investigate scaling, routing, and expert behavior with reproducible experiments.

Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior. We release FLAME-MoE, a completely open-source research suite composed of seven decoder-only models, ranging from 38M to 1.7B active parameters, whose architecture (64 experts with top-8 gating and 2 shared experts) closely reflects modern production LLMs. All training data pipelines, scripts, logs, and checkpoints are publicly available to enable reproducible experimentation. Across six evaluation tasks, FLAME-MoE improves average accuracy by up to 3.4 points over dense baselines trained with identical FLOPs. Leveraging full training trace transparency, we present initial analyses showing that (i) experts increasingly specialize on distinct token subsets, (ii) co-activation matrices remain sparse, reflecting diverse expert usage, and (iii) routing behavior stabilizes early in training. All code, training logs, and model checkpoints are available at https://github.com/cmu-flame/FLAME-MoE.
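
To make the architecture above concrete, here is a minimal sketch of a top-k routed MoE feed-forward block with shared experts, matching the shape the abstract describes (64 routed experts, top-8 gating, 2 shared experts). The PyTorch framing, hidden sizes, expert FFN design, and gate renormalization are illustrative assumptions, not the FLAME-MoE implementation (see the repository linked in the abstract for the actual code).

```python
# Hedged sketch of a top-k routed MoE layer with shared experts.
# Not the FLAME-MoE code; dimensions and FFN design are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Illustrative MoE block: top-k routed experts plus always-on shared experts."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=64, top_k=8, n_shared=2):
        super().__init__()
        self.top_k = top_k

        def ffn():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(ffn() for _ in range(n_experts))   # routed experts
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))     # shared experts

    def forward(self, x):                                   # x: [tokens, d_model]
        probs = F.softmax(self.router(x), dim=-1)           # routing distribution over experts
        weights, idx = probs.topk(self.top_k, dim=-1)       # keep the top-8 experts per token
        weights = weights / weights.sum(-1, keepdim=True)   # renormalize the kept gate weights
        out = sum(e(x) for e in self.shared)                # shared experts process every token
        for k in range(self.top_k):                         # dispatch tokens to selected experts
            for e in idx[:, k].unique().tolist():
                mask = idx[:, k] == e
                out[mask] = out[mask] + weights[mask, k, None] * self.experts[e](x[mask])
        return out


layer = MoELayer()
tokens = torch.randn(16, 512)          # 16 token embeddings
print(layer(tokens).shape)             # torch.Size([16, 512])
```

Because each token only passes through 8 of the 64 routed experts (plus the 2 shared experts), per-token compute is a small fraction of the total expert parameter count, which is the efficiency-performance trade-off the abstract refers to.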

Community

Paper submitter (yuzc19)

🔥Introducing FLAME-MoE: a fully open platform for Mixture-of-Experts (MoE) research. All code, data, checkpoints, training logs, and eval results are public—across 7 model sizes (38M–1.7B active params). Reproducible. Transparent. Scalable.

Paper: https://arxiv.org/abs/2505.20225
Code: https://github.com/cmu-flame/FLAME-MoE
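
One analysis the abstract highlights is that expert co-activation matrices remain sparse. As a hedged sketch of how such a matrix can be computed from released routing traces, assuming the traces give a [tokens, k] tensor of selected expert ids (the function name and tensor layout are assumptions, not the paper's analysis code):

```python
# Hedged sketch: count how often pairs of experts are selected for the same token.
import torch


def coactivation_matrix(topk_idx: torch.Tensor, n_experts: int) -> torch.Tensor:
    """topk_idx: [tokens, k] integer ids of the experts selected for each token."""
    tokens = topk_idx.shape[0]
    selected = torch.zeros(tokens, n_experts)
    selected.scatter_(1, topk_idx, 1.0)     # one-hot expert selection per token
    co = selected.T @ selected              # [n_experts, n_experts] pair counts
    co.fill_diagonal_(0)                    # drop self-co-activation on the diagonal
    return co


# Stand-in routing trace with the configuration from the abstract: 64 experts, top-8 gating.
fake_trace = torch.randint(0, 64, (1024, 8))
print(coactivation_matrix(fake_trace, 64).shape)   # torch.Size([64, 64])
```

A mostly sparse result means most expert pairs are rarely selected together for the same token, which is what the paper reports as evidence of diverse expert usage.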

Models citing this paper: 7

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 0