---
license: mit
language:
- en
base_model:
- Qwen/Qwen1.5-1.8B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- mergekit
- merged-model
- qwen
- deepseek
- language-model
---

# πŸ€– Qwen1.5-DeepSeek-Merge: Uniting Precision & Efficiency

## πŸ“Œ Overview

Qwen1.5-DeepSeek-Merge is an experimental hybrid model that combines Qwen/Qwen1.5-1.8B and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using the linear merge method in MergeKit. The goal is to capture the strengths of both parents, balancing the linguistic breadth of the Qwen base with the distilled reasoning ability of the DeepSeek model.
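For intuition, the linear method computes an element-wise weighted average of the parent models' parameters. With the equal weights and normalization used here (see the configuration below), every merged weight tensor is simply:

$$
\theta_{\text{merged}} = \frac{w_1\,\theta_{\text{Qwen}} + w_2\,\theta_{\text{DeepSeek}}}{w_1 + w_2} = \frac{\theta_{\text{Qwen}} + \theta_{\text{DeepSeek}}}{2}
$$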

πŸ”— Created by: Matteo Khan
πŸŽ“ Affiliation: Apprentice at TW3 Partners (Generative AI Research)
πŸ“ License: MIT

πŸ”— Connect on LinkedIn
πŸ”— Model on Hugging Face

## 🧠 Model Details

### 🎯 Intended Use

This model is primarily intended for research and experimentation in language model merging. Potential applications include:

  • βœ… General Text Generation
  • βœ… Dialogue Systems
  • βœ… Prompt Engineering Research
  • βœ… Evaluation of Merging Strategies

### ⚠️ Limitations & Considerations

While this model may exhibit improved behavior in some cases, it also inherits limitations from its parent models:

  • ❌ May generate hallucinated or unverified information
  • ⚠️ Susceptible to biases or offensive outputs
  • πŸ”€ Merge effects may introduce unexpected behaviors
  • πŸ“‰ Task-specific performance not guaranteed

## πŸ”¬ Merging Process & Configuration

This model is a merge, not a newly fine-tuned one. Below is the exact configuration used:

```python
import torch

hf_repo_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"

config = {
    "merge_method": "linear",
    "dtype": torch.bfloat16,
    "models": [
        {
            "model": "Qwen/Qwen1.5-1.8B",
            "parameters": {
                "weight": 0.5
            }
        },
        {
            "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
            "parameters": {
                "weight": 0.5
            }
        }
    ],
    "parameters": {
        "normalize": True  # rescale the weights so they sum to 1
    },
    "layers": [
        {"pattern": "model."}  # apply the merge across all transformer layers
    ]
}
```
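To reproduce the merge, a configuration like this can be executed with MergeKit. Below is a minimal sketch using MergeKit's Python API (names follow the MergeKit README; the output path is hypothetical, and note that MergeKit expects `dtype` as a string rather than a `torch` object):

```python
# Illustrative sketch: running the merge with MergeKit (pip install mergekit).
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

merge_config = MergeConfiguration.model_validate({
    "merge_method": "linear",
    "dtype": "bfloat16",  # MergeKit takes the dtype as a string
    "models": [
        {"model": "Qwen/Qwen1.5-1.8B", "parameters": {"weight": 0.5}},
        {"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "parameters": {"weight": 0.5}},
    ],
    "parameters": {"normalize": True},
})

run_merge(
    merge_config,
    out_path="./Qwen1.5-DeepSeek-Merge",  # hypothetical local output directory
    options=MergeOptions(copy_tokenizer=True),
)
```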
πŸ“Š No formal benchmark yetβ€”community testing is welcome!
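If you would like to contribute results, one convenient starting point is EleutherAI's lm-evaluation-harness. A minimal sketch, assuming `pip install lm-eval` and its v0.4+ Python API; the task choice is illustrative:

```python
# Illustrative evaluation sketch using lm-evaluation-harness (assumed v0.4+ API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MatteoKhan/Qwen1.5-DeepSeek-Merge,dtype=bfloat16",
    tasks=["hellaswag"],  # illustrative; swap in benchmarks relevant to your use case
)
print(results["results"])
```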

## 🌱 Environmental Impact
By merging pre-trained models instead of training from scratch, this approach saves substantial compute and reduces carbon emissions.

## πŸš€ How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

prompt = "What are the implications of quantum computing on AI?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)  # max_length counts the prompt tokens too
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

πŸ“¬ Questions or feedback? Contact via Hugging Face or LinkedIn.