Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts (arXiv:2009.10622, Sep 22, 2020)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv:2401.15947, Jan 29, 2024)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081, Jan 8, 2024)
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving (arXiv:2401.14361, Jan 25, 2024)
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference (arXiv:2308.12066, Aug 23, 2023)
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (arXiv:2308.14352, Aug 28, 2023)
Experts Weights Averaging: A New General Training Scheme for Vision Transformers (arXiv:2308.06093, Aug 11, 2023)
Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense (arXiv:2402.18787, Feb 29, 2024)
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition (arXiv:2402.02526, Feb 4, 2024)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (arXiv:2006.16668, Jun 30, 2020)
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (arXiv:2211.15841, Nov 29, 2022)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (arXiv:1701.06538, Jan 23, 2017)
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv:2402.01739, Jan 29, 2024)
ST-MoE: Designing Stable and Transferable Sparse Expert Models (arXiv:2202.08906, Feb 17, 2022)
LocMoE: A Low-overhead MoE for Large Language Model Training (arXiv:2401.13920, Jan 25, 2024)
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (arXiv:2201.05596, Jan 14, 2022)
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism (arXiv:2304.11414, Apr 22, 2023)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066, Jan 11, 2024)
AMEND: A Mixture of Experts Framework for Long-tailed Trajectory Prediction (arXiv:2402.08698, Feb 13, 2024)
BASE Layers: Simplifying Training of Large, Sparse Models (arXiv:2103.16716, Mar 30, 2021)
DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (arXiv:2106.03760, Jun 7, 2021)
Direct Neural Machine Translation with Task-level Mixture of Experts models (arXiv:2310.12236, Oct 18, 2023)
Adaptive Gating in Mixture-of-Experts based Language Models (arXiv:2310.07188, Oct 11, 2023)
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy (arXiv:2310.01334, Oct 2, 2023)
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts (arXiv:2309.04354, Sep 8, 2023)
Towards More Effective and Economic Sparsely-Activated Model (arXiv:2110.07431, Oct 14, 2021)
Taming Sparsely Activated Transformer with Stochastic Experts (arXiv:2110.04260, Oct 8, 2021)
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference (arXiv:2110.03742, Sep 24, 2021)
FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts (arXiv:2306.08586, Jun 14, 2023)
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts (arXiv:2306.04845, Jun 8, 2023)
Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture (arXiv:2306.11982, Jun 21, 2023)
MoAI: Mixture of All Intelligence for Large Language and Vision Models (arXiv:2403.07508, Mar 12, 2024)
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arXiv:2310.16795, Oct 25, 2023)
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (arXiv:2212.05055, Dec 9, 2022)