# 🫐 Moecule

Moecule is a family of MoE models built with moetify, specialised in various finance-related tasks.
This model is a mixture of experts (MoE) built with the RhuiDih/moetify library from several task-specific experts. All relevant expert models, LoRA adapters, and datasets are available at Moecule Ingredients.
| Steps | System Requirements |
|---|---|
| MoE Creation | > 22.5 GB system RAM |
| Inference (fp16) | GPU with > 5.4 GB VRAM |
To reproduce this model, first install the patched moetify fork:
```bash
# git clone moetify fork that fixes dependency issue
!git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
!cd moetify && pip install -e .
```
Then mix the task-specific experts into a single MoE:

```bash
python -m moetify.mix \
    --output_dir ./moecule-2x1b-m9-ks \
    --model_path unsloth/llama-3.2-1b-Instruct \
    --modules mlp q_proj \
    --ingredients \
        davzoku/kyc_expert_1b \
        davzoku/stock_market_expert_1b
```
The mix step logs the resulting parameter breakdown:

```text
INFO:root:Stem parameters: 626067456
INFO:root:Experts parameters: 1744830464
INFO:root:Routers parameters: 131072
INFO:root:MOE total parameters (numel): 2371028992
INFO:root:MOE total parameters : 2371028992
INFO:root:MOE active parameters: 2371028992
```
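As a quick sanity check (not part of the original card), the reported counts are self-consistent, and a back-of-envelope fp16 estimate lines up with the VRAM figure in the requirements table above:

```python
# Parameter counts copied from the moetify.mix log above.
stem, experts, routers = 626_067_456, 1_744_830_464, 131_072
total = stem + experts + routers
assert total == 2_371_028_992  # matches "MOE total parameters"

# Rough fp16 weight footprint: 2 bytes per parameter (excludes activations and KV cache).
print(f"fp16 weights: {total * 2 / 1e9:.2f} GB")  # ~4.74 GB, hence the > 5.4 GB VRAM guidance
```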
To run inference with this model, first make sure the patched moetify fork is installed (same as above):
```bash
# git clone moetify fork that fixes dependency issue
!git clone -b fix-transformers-4.47.1-FlashA2-dependency --single-branch https://github.com/davzoku/moetify.git
!cd moetify && pip install -e .
```
Then load the model and generate:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "<model-name>"  # replace with the model name or the --output_dir from the mix step
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def format_instruction(row):
    return f"""### Question: {row}"""

# With the default do_sample=False and num_beams=1 this is greedy decoding,
# so temperature/top_p/top_k are ignored (repetition_penalty still applies).
greedy_generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=1,
    max_new_tokens=128,
    repetition_penalty=1.2
)

input_text = "In what ways did Siemens's debt restructuring on March 06, 2024 reflect its strategic priorities?"
formatted_input = format_instruction(input_text)
inputs = tokenizer(formatted_input, return_tensors="pt").to('cuda')

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        generation_config=greedy_generation_config
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
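The decoded output includes the formatted prompt. If you only want the model's answer, a minimal variation (not part of the original snippet) is to decode just the newly generated tokens:

```python
# Skip the prompt tokens and decode only what the model generated.
prompt_len = inputs.input_ids.shape[1]
answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(answer)
```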
Base model: meta-llama/Llama-3.2-1B-Instruct