Create README.md
README.md
@@ -0,0 +1,14 @@
# DUS Forty Layer Merged Model

## Overview
The DUS Forty Layer Merged Model uses a layer-interlocking strategy that combines layers from the Llama-2-13B and Mistral-7B architectures. The approach aims to keep compute cost down while maintaining competitive performance across natural language processing tasks.

## Model Details
- **Architecture**: Based on Llama-2-13B and Mistral-7B
- **Layer Arrangement**: The `forty` configuration merges layers from both models, interlocking layers 0–20 with layers 12–32.
- **Tokenizer**: Mistral-7B tokenizer is used for encoding and decoding.
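
The layer arrangement above can be sketched as a simple index plan. This is only an illustration under stated assumptions: the ranges 0–20 and 12–32 are treated as half-open (end index excluded), the two blocks are stacked one after the other, and the labels `first` and `second` are hypothetical because the README does not say which base model supplies which range. Under those assumptions the plan contains 20 + 20 = 40 layers, which would be consistent with the name `forty`.

```python
from typing import List, Tuple

# Assumption: both ranges are half-open, i.e. 0-20 means layers 0..19
# and 12-32 means layers 12..31.
FIRST_RANGE: Tuple[int, int] = (0, 20)
SECOND_RANGE: Tuple[int, int] = (12, 32)


def dus_layer_plan(
    first: Tuple[int, int] = FIRST_RANGE,
    second: Tuple[int, int] = SECOND_RANGE,
) -> List[Tuple[str, int]]:
    """Return (block_label, source_layer_index) for each merged layer.

    The labels "first" and "second" are placeholders: the README does not
    state which base model each range is taken from.
    """
    plan = [("first", i) for i in range(*first)]
    plan += [("second", i) for i in range(*second)]
    return plan


plan = dus_layer_plan()
print(len(plan))  # 40
```

Layers 12 to 19 appear in both ranges, so under this reading the middle of the stack is covered twice, once per block.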

## Training Details
- **Base Models**:
  - Llama-2-13B: [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
  - Mistral-7B: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
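
Since the model card names the Mistral-7B tokenizer, one way to obtain it is from the Mistral base repository listed above. The following is a minimal sketch, assuming the `transformers` library is installed and the Hub is reachable; the helper names `load_tokenizer` and `round_trip` are hypothetical, and whether the merged model's own repository ships tokenizer files is not stated in this README.

```python
# Repository IDs taken from the Base Models list in this README.
BASE_MODELS = {
    "Llama-2-13B": "meta-llama/Llama-2-13b-hf",
    "Mistral-7B": "mistralai/Mistral-7B-v0.1",
}

# Assumption: the tokenizer is loaded from the Mistral base repository.
TOKENIZER_SOURCE = BASE_MODELS["Mistral-7B"]


def load_tokenizer(repo_id: str = TOKENIZER_SOURCE):
    """Download and return the tokenizer (requires network access)."""
    # Imported lazily so the constants above can be used without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def round_trip(text: str, tokenizer) -> str:
    """Encode text to token IDs and decode it back, without special tokens."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    return tokenizer.decode(ids)
```

A typical call would be `round_trip("Hello, world!", load_tokenizer())`, which is a quick way to check that encoding and decoding work with the chosen tokenizer.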