---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
---

# Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-ties-merge

Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-ties-merge is a TIES merge of two models: [mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib](https://huggingface.co/mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib) and [VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct). The merge was performed with [mergekit](https://github.com/cg123/mergekit), a toolkit for combining pretrained language models layer by layer.

## 🧩 Merge Configuration

```yaml
slices:
  - sources:
      - model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
        layer_range: [0, 31]
      - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
        layer_range: [0, 31]
merge_method: ties
base_model: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
```

## Model Features

This merge combines the HQQ-calibrated 4-bit quantized version of Llama-3.1-8B-Instruct with the fine-tuned Llama-3.1-SauerkrautLM-8b-Instruct. The aim is a model that handles general instruction-following and generative tasks well while retaining SauerkrautLM's strengths in German and English. SauerkrautLM's Spectrum Fine-Tuning, which updates only a targeted subset of layers, contributes efficiency gains that make the merge suitable for a wide range of applications.

## Evaluation Results

The evaluation results below come from the parent models. The HQQ 4-bit version of Llama-3.1-8B-Instruct reports scores on benchmarks such as ARC, HellaSwag, and MMLU, while the SauerkrautLM model reports improved multilingual capabilities. The combined strengths of the two parents are expected to carry over to the merged model, though these numbers describe the parents rather than the merge itself.

| Benchmark           | Llama-3.1-8B-Instruct (HQQ 4-bit) | Llama-3.1-SauerkrautLM |
|---------------------|-----------------------------------|------------------------|
| ARC (25-shot)       | 60.32                             | -                      |
| HellaSwag (10-shot) | 79.21                             | -                      |
| MMLU (5-shot)       | 67.07                             | -                      |
| Multilingual Tasks  | -                                 | Improved               |

## Limitations

While the merged model benefits from the strengths of both parent models, it may also inherit their limitations. The potential for uncensored content remains a concern, as noted in the SauerkrautLM documentation, and performance may vary with the task and languages involved, particularly for less represented languages or dialects. Users should account for these factors when deploying the model in real-world applications.
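
## 💻 Usage

A minimal sketch of loading the merged model with 🤗 Transformers, assuming the weights have been pushed to the Hub; the repository id below is a placeholder and should be replaced with the actual one, and sampling parameters are illustrative defaults rather than tuned values.

```python
# pip install -qU transformers accelerate
import torch
import transformers
from transformers import AutoTokenizer

# Placeholder repository id -- substitute the actual Hub path of this merge.
model_id = "your-username/Llama-3.1-8b-instruct_4bitgs64_hqq_calib-Llama-3.1-SauerkrautLM-8b-Instruct-ties-merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a chat prompt using the model's chat template (inherited from Llama 3.1 Instruct).
messages = [{"role": "user", "content": "What is a model merge?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```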