---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
base_model:
- meta-llama/Llama-2-7b-hf
---
# LLaMA-2 40-layer model

## Model Overview

LlaMa-DUSFT is a custom variant of LLaMA-2-7B built with the DUS (Depth Up-Scaling) methodology. The original LLaMA-2-7B model has 32 decoder layers; this variant expands the architecture to 40 layers by duplicating, truncating, and merging the original layer stack.

### Key Modifications:

1. Layer Duplication and Truncation:

  - The 32-layer stack of LLaMA-2-7B was duplicated.

  - In one copy, the last 12 layers were removed, keeping layers 1–20.

  - In the other copy, the first 12 layers were removed, keeping layers 13–32.

2. Layer Merging:

  - The two resulting 20-layer segments were concatenated to form a 40-layer model, as sketched below.

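A minimal sketch of this duplicate-and-truncate merge, assuming the Hugging Face `transformers` `LlamaForCausalLM` API; the function name `build_dus_model` is illustrative, and this is not necessarily the exact script used to produce the checkpoint:

```python
import copy

from transformers import AutoConfig, LlamaForCausalLM

def build_dus_model(base_name="meta-llama/Llama-2-7b-hf", n_remove=12):
    """Duplicate the base layer stack, truncate each copy, and merge (illustrative)."""
    base = LlamaForCausalLM.from_pretrained(base_name)
    n_layers = base.config.num_hidden_layers      # 32 for LLaMA-2-7B
    keep = n_layers - n_remove                    # 20 layers per segment

    # Copy A keeps the first 20 layers (last 12 removed).
    front = [copy.deepcopy(layer) for layer in base.model.layers[:keep]]
    # Copy B keeps the last 20 layers (first 12 removed).
    back = [copy.deepcopy(layer) for layer in base.model.layers[n_remove:]]

    # Build a fresh 40-layer config, then transplant embeddings, norm, head, and layers.
    config = AutoConfig.from_pretrained(base_name)
    config.num_hidden_layers = 2 * keep           # 40
    merged = LlamaForCausalLM(config)
    merged.model.embed_tokens = base.model.embed_tokens
    merged.model.norm = base.model.norm
    merged.lm_head = base.lm_head
    merged.model.layers = type(base.model.layers)(front + back)

    # Depending on the transformers version, per-layer cache indices may need updating.
    for i, layer in enumerate(merged.model.layers):
        if hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = i
    return merged
```

Note that the two segments overlap in the middle eight layers (13–20); this overlap is a direct consequence of the duplicate-and-truncate construction rather than an extra design choice.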
### Purpose:

This architectural modification was designed to test whether the DUS approach with an expanded layer count improves performance compared to the standard LLaMA-2 architecture.

## Training Details

### Dataset:

- The model was trained on a 5,000-sample subset of the OpenOrca dataset (see the loading sketch below).

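The subset can be drawn with the `datasets` library roughly as follows; the shuffle seed is an assumption, since the exact sampling procedure is not recorded:

```python
from datasets import load_dataset

# Open-Orca/OpenOrca exposes a single "train" split; draw a 5,000-sample subset.
# The seed is illustrative; the exact sampling used for this model is not recorded.
orca = load_dataset("Open-Orca/OpenOrca", split="train")
subset = orca.shuffle(seed=42).select(range(5_000))
print(subset)  # 5,000 rows with id / system_prompt / question / response columns
```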
### Training Configuration:

- Batch Size: 1

- Epochs: 3

- Optimizer: AdamW

- Learning Rate: 5e-5

- Environment: Google Colab Pro (the settings above are sketched as a Trainer configuration below)

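These hyperparameters map onto a standard Hugging Face `Trainer` setup roughly as follows; this is a sketch rather than the exact training script, and `dus_model` and `tokenized_subset` are placeholders for the merged 40-layer model and the preprocessed OpenOrca subset (see the preprocessing sketch in the next section):

```python
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 defines no pad token by default

args = TrainingArguments(
    output_dir="llama2-dus-40layer",   # illustrative output path
    per_device_train_batch_size=1,     # Batch Size: 1
    num_train_epochs=3,                # Epochs: 3
    learning_rate=5e-5,                # Learning Rate: 5e-5
    optim="adamw_torch",               # AdamW optimizer
    logging_steps=50,
)

# dus_model and tokenized_subset are placeholders for the merged 40-layer model
# and the tokenized OpenOrca subset from the surrounding sketches.
trainer = Trainer(
    model=dus_model,
    args=args,
    train_dataset=tokenized_subset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```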
### Preprocessing:

Data preprocessing followed the standard guidelines for LLaMA-2 models, ensuring that tokenization and sequence alignment remained consistent with the original architecture (see the sketch below).

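A sketch of that preprocessing, assuming the OpenOrca fields `system_prompt` / `question` / `response` and a plain prompt-plus-response concatenation; the exact prompt template and maximum sequence length used for this model are not documented, so both are assumptions here:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 defines no pad token by default

def preprocess(example, max_length=1024):
    # Concatenate system prompt, question, and response into one training sequence.
    # The template and max_length are assumptions; the exact values are not recorded.
    prompt = f"{example['system_prompt']}\n\n{example['question']}\n\n"
    text = prompt + example["response"] + tokenizer.eos_token
    tokens = tokenizer(text, truncation=True, max_length=max_length)
    tokens["labels"] = tokens["input_ids"].copy()  # causal LM: labels mirror input ids
    return tokens

# `subset` is the 5,000-sample OpenOrca split from the dataset sketch above.
tokenized_subset = subset.map(preprocess, remove_columns=subset.column_names)
```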
## Results and Evaluation

### Performance Metrics:

- Due to the experimental nature of this model, formal evaluation results are currently limited.

- Initial results suggest improved adaptability on downstream tasks drawn from the OpenOrca dataset.

### Observations:

- The DUS layer modification shows potential for increasing model depth without significant performance degradation.

- Further evaluation with larger datasets and varied tasks is required to confirm generalizability.