---
datasets:
- davzoku/moecule-finqa
- davzoku/moecule-stock-market-outlook
base_model:
- unsloth/Llama-3.2-1B-Instruct
pipeline_tag: question-answering
---

# 🫐 Moecule 2x1B M8 FS

<!-- logo -->

## Model Details

This model is a mixture of experts (MoE) built with the [RhuiDih/moetify](https://github.com/RhuiDih/moetify) library from various task-specific experts. All relevant expert models, LoRA adapters, and datasets are available at [Moecule Ingredients](https://huggingface.co/collections/davzoku/moecule-ingredients-67dac0e6210eb1d95abc6411).

## Key Features

- **Zero Additional Training:** Combine existing domain-specific / task-specific experts into a powerful MoE model without additional training!

## System Requirements

| Steps            | System Requirements    |
| ---------------- | ---------------------- |
| MoE Creation     | > 22.5 GB system RAM   |
| Inference (fp16) | GPU with > 5.4 GB VRAM |

## MoE Creation

To reproduce this model, run the following commands:

```shell
# clone the moetify fork that fixes a dependency issue
git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
cd moetify && pip install -e .

python -m moetify.mix \
    --output_dir ./moecule-2x1b-m8-fs \
    --model_path unsloth/llama-3.2-1b-Instruct \
    --modules mlp q_proj \
    --ingredients \
        davzoku/finqa_expert_1b \
        davzoku/stock_market_expert_1b
```

## Model Parameters

```shell
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 1744830464
INFO:root:Routers parameters: 131072
INFO:root:MOE total parameters (numel): 2371028992
INFO:root:MOE total parameters : 2371028992
INFO:root:MOE active parameters: 2371028992
```

A sketch for re-checking these counts on the loaded model is included at the end of this card.

## Inference

To run inference with this model, first install the moetify fork as shown in the MoE Creation section above, then use the following code snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# This model's repository id; replace with your local output_dir if you mixed it yourself.
model_id = "davzoku/moecule-2x1b-m8-fs"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)


def format_instruction(row):
    return f"""### Question: {row}"""


greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2,
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
inputs = tokenizer(formatted_input, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## The Team

- CHOCK Wan Kee
- Farlin Deva Binusha DEVASUGIN MERLISUGITHA
- GOH Bao Sheng
- Jessica LEK Si Jia
- Sinha KHUSHI
- TENG Kok Wai (Walter)

## References

- [Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts](https://arxiv.org/abs/2408.17280v2)
- [RhuiDih/moetify](https://github.com/RhuiDih/moetify)
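
## Verifying Parameter Counts

The totals reported in the Model Parameters section can be sanity-checked by summing parameter counts on the loaded model. Below is a minimal sketch, assuming the model loads through `AutoModelForCausalLM` as in the Inference section; the repo id is the same assumption made there, and the per-submodule printout is only a generic way to inspect where moetify places the expert and router weights, not a documented moetify API.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed repo id for this model; use your local output_dir if you mixed it yourself.
model_id = "davzoku/moecule-2x1b-m8-fs"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16)

# Total parameter count; should match the "MOE total parameters (numel)" figure above.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total:,}")

# Inspect top-level submodules to see where the expert and router weights live.
for name, module in model.named_children():
    print(name, sum(p.numel() for p in module.parameters()))
```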