# DUS Forty Layer Merged Model

## Overview

The DUS Forty Layer Merged Model uses a layer-interlocking (depth up-scaling, DUS) merge strategy, combining decoder layers from the Llama-2-13B and Mistral-7B architectures into a single forty-layer model. The aim is to improve computational efficiency while keeping performance competitive across common natural language processing tasks.

## Model Details

- Architecture: Based on Llama-2-13B and Mistral-7B
- Layer Arrangement: The forty configuration merges layers from both models, interlocking layers 0–20 with layers 12–32 (see the sketch after this list).
- Tokenizer: The Mistral-7B tokenizer is used for encoding and decoding.
- Model size: 8.99B parameters, stored as F16 safetensors.
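
The layer arithmetic implied by the name and the reported size can be sanity-checked with a short sketch. The illustration below rests on two assumptions not stated in the card: the ranges "0–20" and "12–32" are half-open slices (20 layers each, giving the forty layers of the name), and every merged layer has Mistral-7B dimensions (hidden size 4096, MLP size 14336, grouped-query attention with 8 of 32 heads as KV heads). The `donor_a`/`donor_b` labels and the `interlocked_layer_ids` helper are placeholders for illustration, not the authors' merge script.

```python
# Hypothetical sketch of the "forty" interlocking scheme described above.
# Ranges are assumed half-open, since 20 + 20 layers matches the model name.
FIRST_SLICE = slice(0, 20)    # "layers 0-20" from the first donor
SECOND_SLICE = slice(12, 32)  # "layers 12-32" from the second donor

def interlocked_layer_ids() -> list[tuple[str, int]]:
    """Return (donor, layer_index) pairs for the merged layer stack."""
    first = [("donor_a", i) for i in range(*FIRST_SLICE.indices(32))]
    second = [("donor_b", i) for i in range(*SECOND_SLICE.indices(32))]
    return first + second

stack = interlocked_layer_ids()
assert len(stack) == 40  # the "forty" in the model name

# Back-of-the-envelope parameter count, assuming every merged layer has
# Mistral-7B dimensions (hidden 4096, MLP 14336, 8 KV heads out of 32):
hidden, inter, vocab = 4096, 14336, 32000
attn = 2 * hidden * hidden + 2 * hidden * hidden // 4  # q/o plus smaller k/v projections
mlp = 3 * hidden * inter                                # gate, up, down projections
total = len(stack) * (attn + mlp) + 2 * vocab * hidden  # plus untied input/output embeddings
print(f"{total / 1e9:.2f}B parameters")                 # -> 8.99B
```

Under these assumptions the count lands on 8.99B, which matches the reported model size; treat this as a consistency check on the figure, not as documentation of the actual merge.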

## Training Details
