---
# LLaMA-2 40-Layer Model
## Model Overview

LlaMa-DUSFT is a custom variant of the LLaMA-2-7B model created using the DUS (Depth Up-Scaling) methodology. The original LLaMA-2-7B model consists of 32 decoder layers; this variant expands the architecture to 40 layers by duplicating, trimming, and recombining the layer stack.

### Key Modifications:

1. Layer splitting:
   - The original 32 layers of LLaMA-2-7B were duplicated.
   - In one copy, the last 12 layers were removed.
   - In the other copy, the first 12 layers were removed.
2. Layer merging:
   - The two resulting 20-layer stacks were concatenated to form a 40-layer model.
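The splitting-and-merging steps above can be sketched with layer indices standing in for the actual transformer blocks (loading the real LLaMA-2-7B weights is omitted here; `depth_up_scale` is an illustrative helper, not code from this repository):

```python
def depth_up_scale(layers, n_drop):
    """Duplicate the layer stack, drop the last n_drop layers from one
    copy and the first n_drop layers from the other, then concatenate."""
    top = layers[: len(layers) - n_drop]  # copy 1: last n_drop removed
    bottom = layers[n_drop:]              # copy 2: first n_drop removed
    return top + bottom

base = list(range(32))                 # LLaMA-2-7B has 32 decoder layers
merged = depth_up_scale(base, n_drop=12)
assert len(merged) == 40               # 20 + 20 = 40-layer model
```

Note that the middle layers (indices 12–19) appear twice in the merged stack, which is the intended overlap in this depth up-scaling scheme.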

### Purpose:

This architectural modification was designed to test whether the DUS approach with an expanded layer count improves performance compared to the standard LLaMA-2 architecture.

## Training Details

### Dataset:

- The model was fine-tuned on a 5,000-sample subset of the OpenOrca dataset.

### Training Configuration:

- Batch size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning rate: 5e-5
- Environment: Google Colab Pro
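As a rough illustration, these hyperparameters map onto the Hugging Face `TrainingArguments` API as follows (the actual training script is not published; the `output_dir` path and the Trainer-based setup are assumptions):

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters; model and
# dataset loading are omitted, and "llama2-dusft" is a made-up path.
args = TrainingArguments(
    output_dir="llama2-dusft",
    per_device_train_batch_size=1,  # Batch size: 1
    num_train_epochs=3,             # Epochs: 3
    learning_rate=5e-5,             # Learning rate: 5e-5
    optim="adamw_torch",            # Optimizer: AdamW
)
```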

### Preprocessing:

Data preprocessing followed the guidelines for LLaMA-2 models, ensuring tokenization and alignment were consistent with the original architecture.

## Results and Evaluation

### Performance Metrics:

- Due to the experimental nature of this model, specific evaluation metrics are currently limited.
- Initial results indicate improved adaptability in specific downstream tasks from the OpenOrca dataset.

### Observations:

- The DUS layer modification shows potential for increasing model depth without significant performance degradation.
- Further evaluation on larger datasets and more varied tasks is required to confirm generalizability.