---
license: apache-2.0
tags:
- text-generation
- llama.cpp
- gguf
- quantization
- merged-model
language:
- en
library_name: gguf
---

# merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B - GGUF Quantized Model

This repository contains GGUF quantized versions of [pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B).

## 🌳 Model Tree

This model was created by merging the following models:

```
pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B
├── Merge Method: dare_ties
├── Gensyn/Qwen2.5-1.5B-Instruct
└── deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    ├── density: 0.6
    └── weight: 0.5
```

**Merge Method**: DARE_TIES, an advanced merging technique that reduces interference between the source models (an illustrative configuration sketch is included in the appendix at the end of this card).

## 📊 Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases:

- **q4_k_m**: 4-bit quantization, medium quality, good balance of size and performance
- **q5_k_m**: 5-bit quantization, higher quality, slightly larger size
- **q8_0**: 8-bit quantization, highest quality, larger size but minimal quality loss

## 🚀 Usage

### With llama.cpp

```bash
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/resolve/main/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.q4_k_m.gguf

# Run with llama.cpp
# (in recent llama.cpp releases the example binary is named llama-cli instead of main)
./main -m merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.q4_k_m.gguf -p "Your prompt here"
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
```

### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B -f Modelfile
ollama run merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B "Your prompt here"
```

## 📋 Model Details

- **Original Model**: [pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/pravdin/merged-Gensyn-Qwen2.5-1.5B-Instruct-deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B)
- **Quantization Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **License**: Same as the original model (apache-2.0)
- **Use Cases**: Optimized for local inference, edge deployment, and resource-constrained environments

## 🎯 Recommended Usage

- **q4_k_m**: Best for most use cases; a good quality/size trade-off
- **q5_k_m**: When you need higher quality and have more storage/memory available
- **q8_0**: When you want minimal quality loss relative to the original model

## ⚡ Performance Notes

GGUF models are optimized for:

- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility

For best performance, ensure your hardware supports the quantization format you choose.

---

*This model was automatically quantized using the Lemuru LLM toolkit.*
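
## 🧪 Appendix: Illustrative Merge Configuration

For reference, the merge described in the model tree above could be approximated with a mergekit-style configuration. The sketch below is a hypothetical reconstruction, not the recipe actually used to build this model: the `base_model` choice, `dtype`, and file names are assumptions, and only `merge_method`, `density`, and `weight` are taken from the model tree.

```bash
# Hypothetical reconstruction of the merge described above (assumptions noted in comments).
# base_model and dtype are guesses; density/weight come from the model tree on this card.
cat > dare_ties_merge.yaml <<'EOF'
merge_method: dare_ties
base_model: Gensyn/Qwen2.5-1.5B-Instruct   # assumed base model
dtype: bfloat16                            # assumed precision
models:
  - model: Gensyn/Qwen2.5-1.5B-Instruct
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    parameters:
      density: 0.6
      weight: 0.5
EOF

# Run the merge with mergekit (https://github.com/arcee-ai/mergekit);
# the output directory name here is arbitrary.
mergekit-yaml dare_ties_merge.yaml ./merged-output
```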