How did you approach merging these models?

#1
opened by Evi1ran

Hello there! 😊
I hope you're doing well. I'm really impressed with your work on combining multiple models into a Mixture of Experts (MoE); it's quite inspiring!

I was wondering if you'd be kind enough to share how you approached merging these models or how you stacked them together to form the MoE structure. If possible, would you mind sharing some code examples or even just the general idea behind your method? I'd greatly appreciate any insights you could offer!

Thank you so much for taking the time to read this, and I look forward to hearing from you! 🙏

https://huggingface.co/huihui-ai/Huihui-MoE-1.3B-A0.6B-abliterated#training

Conversion: The model copies embeddings, self-attention, and normalization weights from Qwen3-0.6B, replacing MLP layers with MoE layers (4 experts). Gating weights are randomly initialized.

Architecture: Qwen3MoeForCausalLM model with 4 experts per layer (num_experts=4), activating 1 expert per token (num_experts_per_tok=1).

See the model's config.json for these settings.
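For reference, a conversion along these lines could be sketched as below. This is a minimal sketch based only on the description above, not the author's actual script: the expert ordering, the reuse of the dense intermediate size as moe_intermediate_size, the norm_topk_prob setting, and the output path are assumptions, and it needs a transformers release that ships Qwen3MoeForCausalLM.

```python
# Minimal sketch of the dense-to-MoE conversion described above (not the author's script).
# Assumptions: expert order, moe_intermediate_size == dense intermediate_size, output path.
import torch
from transformers import AutoModelForCausalLM, Qwen3MoeConfig, Qwen3MoeForCausalLM

BASE = "huihui-ai/Qwen3-0.6B-abliterated"        # donor for embeddings, attention, norms
EXPERT_DONORS = [                                 # one dense donor per expert slot (assumed order)
    "huihui-ai/Qwen3-0.6B-abliterated",
    "suayptalha/Qwen3-0.6B-Code-Expert",
    "suayptalha/Qwen3-0.6B-Math-Expert",
    "suayptalha/Qwen3-0.6B-Medical-Expert",
]

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
cfg = base.config

# MoE config mirroring the dense model: 4 experts per layer, 1 active expert per token.
moe_cfg = Qwen3MoeConfig(
    vocab_size=cfg.vocab_size,
    hidden_size=cfg.hidden_size,
    intermediate_size=cfg.intermediate_size,
    moe_intermediate_size=cfg.intermediate_size,  # each expert keeps the dense MLP size (assumed)
    num_hidden_layers=cfg.num_hidden_layers,
    num_attention_heads=cfg.num_attention_heads,
    num_key_value_heads=cfg.num_key_value_heads,
    head_dim=cfg.head_dim,
    rms_norm_eps=cfg.rms_norm_eps,
    rope_theta=cfg.rope_theta,
    max_position_embeddings=cfg.max_position_embeddings,
    tie_word_embeddings=cfg.tie_word_embeddings,
    num_experts=4,
    num_experts_per_tok=1,
    decoder_sparse_step=1,                        # every layer gets an MoE block
    norm_topk_prob=True,                          # give the single routed expert full weight (assumed)
)
moe = Qwen3MoeForCausalLM(moe_cfg).to(torch.bfloat16)

# Shared weights (embeddings, final norm, lm_head) come from the base model.
moe.model.embed_tokens.load_state_dict(base.model.embed_tokens.state_dict())
moe.model.norm.load_state_dict(base.model.norm.state_dict())
moe.lm_head.load_state_dict(base.lm_head.state_dict())

donors = [AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
          for name in EXPERT_DONORS]

for i, layer in enumerate(moe.model.layers):
    src = base.model.layers[i]
    # Self-attention and normalization weights are copied from the base model.
    layer.self_attn.load_state_dict(src.self_attn.state_dict())
    layer.input_layernorm.load_state_dict(src.input_layernorm.state_dict())
    layer.post_attention_layernorm.load_state_dict(src.post_attention_layernorm.state_dict())
    # Each expert slot receives a verbatim copy of one donor's dense MLP;
    # the router (layer.mlp.gate) keeps its random initialization.
    for e, donor in enumerate(donors):
        layer.mlp.experts[e].load_state_dict(donor.model.layers[i].mlp.state_dict())

moe.save_pretrained("Huihui-MoE-4x0.6B-sketch")
```

Because the gating weights start out random, the router initially picks experts arbitrarily; presumably that is what the training step linked above (the #training section of the model card) addresses.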

If the model family provides an MoE (Mixture of Experts) architecture, as Qwen and DeepSeek do, you can simply convert models with consistent parameters into an MoE-type model.
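As one concrete reading of "consistent parameters" (my interpretation, not a statement of the author's exact criteria), the donor checkpoints should at least agree on the dimensions that the shared and expert weights reuse. A quick check over the models discussed later in this thread:

```python
# Rough check of what "consistent parameters" can mean in practice (my interpretation):
# all donor models must agree on the shapes that shared and expert weights will reuse.
from transformers import AutoConfig

DONORS = [
    "huihui-ai/Qwen3-0.6B-abliterated",
    "suayptalha/Qwen3-0.6B-Code-Expert",
    "suayptalha/Qwen3-0.6B-Math-Expert",
    "suayptalha/Qwen3-0.6B-Medical-Expert",
]
FIELDS = ["hidden_size", "intermediate_size", "num_hidden_layers",
          "num_attention_heads", "num_key_value_heads", "vocab_size"]

configs = [AutoConfig.from_pretrained(name) for name in DONORS]
for field in FIELDS:
    values = {getattr(c, field) for c in configs}
    assert len(values) == 1, f"Donor models disagree on {field}: {values}"
print("Donor models are dimensionally compatible for MoE conversion.")
```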

Thank you so much for taking the time to reply! 😊
I’m truly curious about how you merged these specific models: suayptalha/Qwen3-0.6B-Code-Expert, suayptalha/Qwen3-0.6B-Math-Expert, suayptalha/Qwen3-0.6B-Medical-Expert, and huihui-ai/Qwen3-0.6B-abliterated into a single unified model.

What caught my attention is that the merged version you shared exists as one consolidated file, while the model at https://huggingface.co/suayptalha/Arcana-Qwen3-2.4B-A0.6B appears quite different in structure. This discrepancy feels intriguing, and I’d love to learn more about your approach if possible!

Could you kindly share any insights into the methodology or design choices behind this merging process?

You can compare the config.json files of Qwen3-30B-A3B, Qwen3-0.6B, and Huihui-MoE-1B-A0.6B. You should be able to discover some differences or patterns.
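One way to do that comparison (a sketch of my own, not the author's tooling) is to load each config with transformers and print only the fields whose values differ; the repo ids below follow the names used in this thread and may need adjusting:

```python
# Load the three configs and print only the fields whose values differ.
# Repo ids follow the names used in this thread; adjust them if they differ on the Hub.
from transformers import AutoConfig

REPOS = [
    "Qwen/Qwen3-30B-A3B",
    "Qwen/Qwen3-0.6B",
    "huihui-ai/Huihui-MoE-1B-A0.6B",
]
configs = {repo: AutoConfig.from_pretrained(repo).to_dict() for repo in REPOS}

all_keys = sorted(set().union(*(c.keys() for c in configs.values())))
for key in all_keys:
    values = [configs[repo].get(key) for repo in REPOS]
    if len({repr(v) for v in values}) > 1:        # keep only the differing fields
        print(f"{key}: " + ", ".join(f"{repo}={v}" for repo, v in zip(REPOS, values)))
```

The differences to look for are model_type ("qwen3" vs. "qwen3_moe") and the MoE-specific fields such as num_experts, num_experts_per_tok, and moe_intermediate_size.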

If you want to merge multiple models from the Qwen3 series using MoE (Mixture of Experts) fusion, we can certainly give it a try.
