---
license: mit
language:
- en
base_model:
- Qwen/Qwen1.5-1.8B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- mergekit
- merged-model
- qwen
- deepseek
- language-model
---
# 🤗 Qwen1.5-DeepSeek-Merge: Uniting Precision & Efficiency
## 🔍 Overview
Qwen1.5-DeepSeek-Merge is an experimental hybrid model merging the capabilities of Qwen/Qwen1.5-1.8B and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using the Linear Merge method via MergeKit. This fusion aims to capture the strengths of both models, balancing linguistic nuance with distilled performance.
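Conceptually, a linear merge is just an element-wise weighted average of corresponding parameter tensors. The sketch below illustrates the arithmetic on stand-in tensors (names and shapes are hypothetical, not taken from either checkpoint):

```python
import torch

# Illustrative stand-ins for one matching parameter tensor from each model.
theta_a = torch.randn(4, 4)  # tensor from model A (hypothetical)
theta_b = torch.randn(4, 4)  # matching tensor from model B (hypothetical)

# Linear merge with normalize=True: weights are rescaled to sum to 1,
# so equal weights reduce to a plain average.
w_a, w_b = 0.5, 0.5
theta_merged = (w_a * theta_a + w_b * theta_b) / (w_a + w_b)
```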
👤 Created by: Matteo Khan
🎓 Affiliation: Apprentice at TW3 Partners (Generative AI Research)
📜 License: MIT
🔗 Connect on LinkedIn
🤗 Model on Hugging Face
## 🧠 Model Details
- Model Type: Merged Language Model
- Parent Models:
  - Qwen/Qwen1.5-1.8B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Merge Method: Linear Merge (via MergeKit)
- Precision: bfloat16
## 🎯 Intended Use
This model is primarily intended for research and experimentation in language model merging. Potential applications include:
- ✅ General Text Generation
- ✅ Dialogue Systems
- ✅ Prompt Engineering Research
- ✅ Evaluation of Merging Strategies
## ⚠️ Limitations & Considerations
While this model may exhibit improved behavior in some cases, it also inherits limitations from its parent models:
- ❌ May generate hallucinated or unverified information
- ⚠️ Susceptible to biases or offensive outputs
- 🔀 Merge effects may introduce unexpected behaviors
- 📉 Task-specific performance is not guaranteed
## 🔬 Merging Process & Configuration
This model is a merge, not a newly fine-tuned model. Below is the exact configuration used:
```python
import torch

hf_repo_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"

config = {
    "merge_method": "linear",
    "dtype": torch.bfloat16,
    "models": [
        {
            "model": "Qwen/Qwen1.5-1.8B",
            "parameters": {"weight": 0.5},
        },
        {
            "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
            "parameters": {"weight": 0.5},
        },
    ],
    "parameters": {"normalize": True},
    "layers": [{"pattern": "model."}],
}
```
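For intuition, here is a minimal hand-rolled sketch of what this linear merge does under the hood, assuming the two checkpoints expose compatible tensor names and shapes; MergeKit itself handles the tokenizer copying, layer patterns, and alignment that this sketch glosses over:

```python
import torch
from transformers import AutoModelForCausalLM

# Load both parent models in bfloat16.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-1.8B", torch_dtype=torch.bfloat16
)
other = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", torch_dtype=torch.bfloat16
)

# Average tensors that exist in both models with matching shapes;
# mismatched tensors are left as the base model's weights.
merged_state = base.state_dict()
for name, tensor in other.state_dict().items():
    if name in merged_state and merged_state[name].shape == tensor.shape:
        # normalize=True with equal weights: plain 50/50 average.
        merged_state[name] = 0.5 * merged_state[name] + 0.5 * tensor

base.load_state_dict(merged_state)
base.save_pretrained("./Qwen1.5-DeepSeek-Merge")
```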
📊 No formal benchmarks yet; community testing is welcome!
## 🌱 Environmental Impact
By merging pre-trained models instead of training from scratch, this approach saves substantial compute and reduces carbon emissions.
## 🚀 How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"

# Load the tokenizer and model; torch_dtype="auto" picks up the bfloat16 weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Encode a prompt, generate up to 200 total tokens, and decode the result.
prompt = "What are the implications of quantum computing on AI?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
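Alternatively, the merged checkpoint should also work with the standard `transformers` text-generation pipeline (the prompt and token budget below are illustrative):

```python
from transformers import pipeline

# Build a text-generation pipeline around the merged checkpoint.
generator = pipeline(
    "text-generation",
    model="MatteoKhan/Qwen1.5-DeepSeek-Merge",
    torch_dtype="auto",
)

result = generator("Explain model merging in one sentence.", max_new_tokens=100)
print(result[0]["generated_text"])
```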
💬 Questions or feedback? Contact via Hugging Face or LinkedIn.