---
license: mit
---

# DUS Forty Layer Merged Model

## Overview

The DUS Forty Layer Merged Model uses a depth-up-scaling (DUS) layer-interlocking strategy that combines decoder layers from the Llama-2-13B and Mistral-7B architectures. The aim is to improve computational efficiency while maintaining competitive performance across a range of natural language processing tasks.

## Model Details

- Architecture: Based on Llama-2-13B and Mistral-7B.
- Layer Arrangement: The forty-layer configuration interlocks layers 0–20 of one donor stack with layers 12–32 of the other; read as half-open slices, that is two 20-layer spans, giving the 40 decoder layers the model is named for (a sketch follows this list).
- Tokenizer: The Mistral-7B tokenizer is used for encoding and decoding.
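
As a rough illustration, the interlock can be sketched with plain `transformers` model surgery. This is a minimal sketch, not the recipe used to build this checkpoint: the merge script is not included in this repository, and the donor checkpoints, half-open slice indices, and dtype below are assumptions. Note that stacking decoder layers from two donors only works when they share hidden size and attention layout, so for runnability the sketch loads two copies of the same base model.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical donors. Depth-wise interlocking requires both stacks to share
# hidden size and attention layout; two copies of one base satisfy that here.
donor_a = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)
donor_b = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)

# Interlock: layers [0, 20) from donor A followed by layers [12, 32) from
# donor B -- 20 + 20 = 40 decoder layers, hence the "forty" in the name.
merged = torch.nn.ModuleList(
    list(donor_a.model.layers[:20]) + list(donor_b.model.layers[12:32])
)

donor_a.model.layers = merged
donor_a.config.num_hidden_layers = len(merged)  # keep the config consistent
# Note: recent transformers versions store a layer_idx on each attention
# module for KV-cache bookkeeping; a real merge would re-index these.

print(donor_a.config.num_hidden_layers)  # 40
```

For inference, the published checkpoint would be paired with the Mistral-7B tokenizer, per the Tokenizer note above.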

## Training Details