# CodeMistral-Instruct-7B-AdvancedSlerp-v1

CodeMistral-Instruct-7B-AdvancedSlerp-v1 is a merge of the following models using LazyMergekit:

- OpenPipe/mistral-ft-optimized-1218
- mlabonne/NeuralHermes-2.5-Mistral-7B
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.3, 0.5, 0.7, 0.5, 0.3, 1] # Enhanced: smoother wave for balanced attention fusion, emphasizing Hermes in mid-layers for a reasoning boost
    - filter: mlp
      value: [1, 0.7, 0.5, 0.3, 0.5, 0.7, 0] # Enhanced: mirrored wave for MLP, starting strong on Hermes then balancing back
    - value: 0.5 # Default remains for other params
  normalize: true # Normalization for stable weights, improving model strength and reducing merge artifacts
  density:
    - value: 0.6 # Slightly higher density to retain more of the merged structure
    - filter: self_attn
      value: 0.7 # Bias toward preserving attention details for advanced capabilities
    - filter: mlp
      value: 0.5 # Balanced for MLP to maintain efficiency
  randomize: 0.05 # Small randomization for exploratory strength, can lead to innovative fusions
dtype: bfloat16
```
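To reproduce the merge locally, a configuration like the one above can be saved to a file and run through mergekit's command-line entry point. The commands below are a minimal sketch in the same notebook style as the usage section further down; the exact flags may differ across mergekit versions.

```python
# Sketch: install mergekit and run the merge (assumes the YAML above is saved as config.yaml)
!pip install -qU mergekit

# Writes the merged model to ./merge; --copy-tokenizer carries the base tokenizer over,
# --lazy-unpickle lowers peak memory while loading the source checkpoints
!mergekit-yaml config.yaml merge --copy-tokenizer --lazy-unpickle
```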
## Background
This merge is an enhanced SLERP version of the DeepKarkhanis/NeuralPipe-7B-slerp repository. Below is a step-by-step analysis of the original configuration, followed by the rationale for the enhanced YAML configuration shown above. The enhancements follow best practices for SLERP in mergekit: refining the interpolation parameters for smoother fusion, adding stability options, and blending the models so that the efficiency of the base model is combined with the reasoning strengths of the second model.
Step-by-step analysis of the original configuration:

- Models involved (sources):
  - Base model: OpenPipe/mistral-ft-optimized-1218, a fine-tuned Mistral-7B optimized for performance and efficiency.
  - Second model: mlabonne/NeuralHermes-2.5-Mistral-7B, a Hermes variant focused on advanced reasoning, instruction following, and natural-language capabilities.
  - Layer range: [0, 32] for both, which matches the Mistral-7B architecture (32 transformer layers) and allows full-model merging.
- Merge method (slerp):
  SLERP (Spherical Linear Interpolation) merges similar architectures by interpolating weights on a sphere rather than along a straight line, which preserves weight norms and tends to produce more stable fusions than linear merges (see the sketch after this list). base_model is set to the first source, so interpolation starts from it and moves toward the second model.
- Parameters (t):
  - t controls the interpolation strength (0 = fully the base model, 1 = fully the second model).
  - Filter self_attn (attention layers): [0, 0.5, 0.3, 0.7, 1] → a non-linear blend that starts conservative, dips, then ramps up toward the second model.
  - Filter mlp (feed-forward layers): [1, 0.5, 0.7, 0.3, 0] → the inverse pattern, starting heavy on the second model and then varying.
  - Default: 0.5, an equal blend for all other parameters.

  This setup allows layer-specific customization, which is useful for targeting each model's strengths (e.g., attention for reasoning, MLP for computation).
- dtype: bfloat16
  Brain floating-point 16 is used for efficiency and precision during merging and is well suited to modern hardware.
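For intuition, here is a minimal, self-contained sketch of SLERP applied to two weight tensors. It only illustrates the interpolation idea and is not mergekit's actual implementation; the function name and the norm handling are assumptions made for clarity.

```python
# Minimal sketch of spherical linear interpolation (SLERP) between two weight tensors.
# Illustration only; not mergekit's implementation.
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Flatten and normalize so the interpolation happens on the unit sphere
    v0 = w0.flatten() / (w0.norm() + eps)
    v1 = w1.flatten() / (w1.norm() + eps)
    dot = torch.clamp(torch.dot(v0, v1), -1.0, 1.0)
    omega = torch.arccos(dot)            # angle between the two weight directions
    if omega.abs() < eps:                # nearly parallel: fall back to plain lerp
        return (1 - t) * w0 + t * w1
    sin_omega = torch.sin(omega)
    # Interpolate the direction along the great circle, then restore an interpolated norm
    v = (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1
    norm = (1 - t) * w0.norm() + t * w1.norm()
    return (v * norm).reshape(w0.shape)

# t = 0 returns the base model's weights, t = 1 the second model's
merged = slerp(0.5, torch.randn(4, 4), torch.randn(4, 4))
```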
The enhanced configuration shown above improves on the original by:

- Expanding the t lists to 7 values for smoother, more adaptive interpolation across layers (e.g., wave-like patterns that gradually emphasize each model's strengths); a sketch of how such a list expands into a per-layer schedule follows this list.
- Adding normalize: true to stabilize weight norms post-merge, reducing artifacts.
- Introducing density parameters to control how much of each model's "mass" is retained, biased toward the stronger reasoning model.
- Adding a small randomize factor for exploratory merging, which can lead to unexpectedly strong variants in practice.
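As a rough illustration of the first point, the sketch below expands the 7-value self_attn list into a per-layer t schedule over the 32 layers by linearly interpolating between the anchor values. This is intended to mirror how mergekit spreads a gradient of values across a layer range, but the exact interpolation scheme used here is an assumption for illustration.

```python
# Sketch: expand the 7 self_attn anchor values into a per-layer t schedule
# (illustrative linear interpolation; not necessarily mergekit's exact scheme)
import numpy as np

anchors = [0, 0.3, 0.5, 0.7, 0.5, 0.3, 1]              # self_attn curve from the config above
anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))   # relative position of each anchor
layer_pos = np.linspace(0.0, 1.0, num=32)              # relative depth of each Mistral layer

t_per_layer = np.interp(layer_pos, anchor_pos, anchors)

for i, t in enumerate(t_per_layer):
    # t close to 1 leans toward NeuralHermes, t close to 0 toward the base model
    print(f"layer {i:2d}: t = {t:.2f}")
```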
## GGUF quantized version
A Q4-K-M quantized GGUF version is provided here: Q4-K-M
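For local inference with the quantized file, a minimal llama-cpp-python sketch follows; the model path is a placeholder and should point at wherever the Q4-K-M GGUF file is downloaded.

```python
# Sketch: run the Q4-K-M GGUF locally with llama-cpp-python
# (the file name below is a placeholder, not the actual artifact name)
from llama_cpp import Llama

llm = Llama(model_path="./CodeMistral-Instruct-7B-AdvancedSlerp-v1.Q4_K_M.gguf", n_ctx=4096)
out = llm("What is a large language model?", max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```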
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "powermove72/CodeMistral-Instruct-7B-AdvancedSlerp-v1"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```