---
license: mit
language:
- en
base_model:
- Qwen/Qwen1.5-1.8B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: transformers
tags:
- mergekit
- merged-model
- qwen
- deepseek
- language-model
---

# πŸ€– Qwen1.5-DeepSeek-Merge: Uniting Precision & Efficiency

## πŸ“Œ Overview

Qwen1.5-DeepSeek-Merge is an experimental hybrid model that combines Qwen/Qwen1.5-1.8B and deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using the linear merge method in MergeKit. The goal is to capture the strengths of both parents, balancing the linguistic breadth of the Qwen base with the distilled reasoning ability of the DeepSeek model.
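For intuition, the linear method computes an element-wise weighted average of the parent models' parameters. With the equal weights and normalization used here (see the configuration below), every merged weight tensor is simply:

$$
\theta_{\text{merged}} = \frac{w_1\,\theta_{\text{Qwen}} + w_2\,\theta_{\text{DeepSeek}}}{w_1 + w_2} = \frac{\theta_{\text{Qwen}} + \theta_{\text{DeepSeek}}}{2}
$$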

πŸ”— Created by: Matteo Khan
πŸŽ“ Affiliation: Apprentice at TW3 Partners (Generative AI Research)
πŸ“ License: MIT

πŸ”— Connect on LinkedIn
πŸ”— Model on Hugging Face

## 🧠 Model Details

### 🎯 Intended Use

This model is primarily intended for research and experimentation in language model merging. Potential applications include:

  • βœ… General Text Generation
  • βœ… Dialogue Systems
  • βœ… Prompt Engineering Research
  • βœ… Evaluation of Merging Strategies

### ⚠️ Limitations & Considerations

While this model may exhibit improved behavior in some cases, it also inherits limitations from its parent models:

  • ❌ May generate hallucinated or unverified information
  • ⚠️ Susceptible to biases or offensive outputs
  • πŸ”€ Merge effects may introduce unexpected behaviors
  • πŸ“‰ Task-specific performance not guaranteed

## πŸ”¬ Merging Process & Configuration

This model is a merge, not a newly fine-tuned one. Below is the exact configuration used:

```python
import torch

hf_repo_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"

config = {
    "merge_method": "linear",
    "dtype": torch.bfloat16,
    "models": [
        {
            "model": "Qwen/Qwen1.5-1.8B",
            "parameters": {
                "weight": 0.5
            }
        },
        {
            "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
            "parameters": {
                "weight": 0.5
            }
        }
    ],
    "parameters": {
        "normalize": True  # rescale the weights so they sum to 1
    },
    "layers": [
        {"pattern": "model."}  # apply the merge across all transformer layers
    ]
}
```
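To reproduce the merge, a configuration like this can be executed with MergeKit. Below is a minimal sketch using MergeKit's Python API (names follow the MergeKit README; the output path is hypothetical, and note that MergeKit expects `dtype` as a string rather than a `torch` object):

```python
# Illustrative sketch: running the merge with MergeKit (pip install mergekit).
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

merge_config = MergeConfiguration.model_validate({
    "merge_method": "linear",
    "dtype": "bfloat16",  # MergeKit takes the dtype as a string
    "models": [
        {"model": "Qwen/Qwen1.5-1.8B", "parameters": {"weight": 0.5}},
        {"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "parameters": {"weight": 0.5}},
    ],
    "parameters": {"normalize": True},
})

run_merge(
    merge_config,
    out_path="./Qwen1.5-DeepSeek-Merge",  # hypothetical local output directory
    options=MergeOptions(copy_tokenizer=True),
)
```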
πŸ“Š No formal benchmark yetβ€”community testing is welcome!
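If you would like to contribute results, one convenient starting point is EleutherAI's lm-evaluation-harness. A minimal sketch, assuming `pip install lm-eval` and its v0.4+ Python API; the task choice is illustrative:

```python
# Illustrative evaluation sketch using lm-evaluation-harness (assumed v0.4+ API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MatteoKhan/Qwen1.5-DeepSeek-Merge,dtype=bfloat16",
    tasks=["hellaswag"],  # illustrative; swap in benchmarks relevant to your use case
)
print(results["results"])
```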

## 🌱 Environmental Impact
By merging pre-trained models instead of training from scratch, this approach saves substantial compute and reduces carbon emissions.

## πŸš€ How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Qwen1.5-DeepSeek-Merge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

prompt = "What are the implications of quantum computing on AI?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)  # max_length counts the prompt tokens too
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

πŸ“¬ Questions or feedback? Contact via Hugging Face or LinkedIn.