Usage
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

model_id = "DeepMount00/Murai-350M-v0.1-beta"

# trust_remote_code=True allows the model's custom code from the Hub to run.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generation settings passed here become defaults for every call to the pipeline.
t_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    return_full_text=True,  # the returned text includes the prompt
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.2,
)

SYSTEM_PROMPT = """Sei un assistente utile."""  # "You are a helpful assistant."
TEMPERATURE = 0.1
MAX_NEW_TOKENS = 250

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # "Write a Python function that adds two numbers"
    {"role": "user", "content": """Scrivi una funzione python che somma due numeri"""},
]

# Render the chat messages into a single prompt string using the model's chat template.
conv_template = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = t_pipeline(
    conv_template,
    max_new_tokens=MAX_NEW_TOKENS,
    do_sample=True,
    temperature=TEMPERATURE,
    num_return_sequences=1,
)
print(outputs[0]["generated_text"])
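Because the pipeline is created with `return_full_text=True`, the printed text starts with the rendered prompt. If only the model's reply is wanted, one option is to strip the prompt prefix afterwards. The helper below is a minimal sketch; `completion_only` is a hypothetical name, not part of `transformers`, and the strings in the example are made up for illustration.

```python
def completion_only(generated_text: str, prompt: str) -> str:
    """Return the text that follows the prompt.

    If the prompt is not a prefix of the generated text, the input is
    returned unchanged rather than guessing where the reply starts.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Illustrative strings only; real prompts come from apply_chat_template.
example_prompt = "<user>Ciao</user><assistant>"
example_output = example_prompt + " Ciao! Come posso aiutarti?"
reply = completion_only(example_output, example_prompt)
```

With the usage snippet above, this would be called as `completion_only(outputs[0]["generated_text"], conv_template)`. Passing `return_full_text=False` to the pipeline is the built-in alternative and avoids the post-processing step.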
Training Details
This model uses a deep architecture (42 layers, about 350M parameters) optimized for parameter efficiency:
- Pre-norm architecture with RMSNorm
- Grouped Query Attention for memory efficiency
- SwiGLU activation for improved performance
- RoPE position encoding for better length generalization
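The four components listed above can be sketched in a few lines of plain Python. This is an illustrative sketch of the general techniques, not the model's actual implementation (which is loaded through `trust_remote_code` and may differ). All function names, the `eps` value, the RoPE `base` of 10000, and the head counts used in the example are assumptions chosen for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def pre_norm_block(x, sublayer, norm_weight):
    """Pre-norm residual: normalize, apply the sublayer, add the input back."""
    y = sublayer(rms_norm(x, norm_weight))
    return [xi + yi for xi, yi in zip(x, y)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """GQA: each consecutive group of query heads shares one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def rope_rotate(pair, position, pair_index, head_dim, base=10000.0):
    """RoPE: rotate one feature pair by an angle that grows with position."""
    theta = position * base ** (-2.0 * pair_index / head_dim)
    a, b = pair
    return (a * math.cos(theta) - b * math.sin(theta),
            a * math.sin(theta) + b * math.cos(theta))


# Hypothetical sizes: 8 query heads sharing 2 key/value heads.
shared = [kv_head_for(h, n_query_heads=8, n_kv_heads=2) for h in range(8)]
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
rotated = rope_rotate((3.0, 4.0), position=5, pair_index=1, head_dim=8)
```

In this sketch, grouped query attention stores fewer key/value heads than query heads, which is where the memory saving comes from, and the RoPE rotation changes only the direction of each feature pair, never its length.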
Citation
@misc{deepmount_llm_2024,
  title={Deep LLM: A 350M Parameter Language Model with 42 Layers},
  author={Michele Montebovi},
  year={2025},
  url={https://huggingface.co/DeepMount00/Murai-350M-v0.1-beta}
}
License
Apache 2.0