---

# llama-2 40 layer model

## Model Overview

LlaMa-DUSFT is a custom variant of the LLaMA-2-7B model created using DUS (Depth Up-Scaling). The original LLaMA-2-7B consists of 32 decoder layers; this variant expands the architecture to 40 layers by duplicating and recombining blocks of those layers.
### Key Modifications

1. Layer splitting:
   - The original 32 layers of LLaMA-2-7B were duplicated into two copies.
   - From one copy, the last 12 layers were removed, leaving the first 20 layers.
   - From the other copy, the first 12 layers were removed, leaving the last 20 layers.

2. Layer merging:
   - The two resulting 20-layer segments were stacked to form a single 40-layer model.

### Purpose

This architectural modification tests whether depth up-scaling to 40 layers improves performance compared to the standard 32-layer LLaMA-2 architecture.
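The split-and-merge steps above can be sketched on a plain list of layer indices, used here as a toy stand-in for the 32 decoder layers (`depth_up_scale` is a hypothetical helper for illustration, not code from this repository):

```python
def depth_up_scale(layers, n_remove=12):
    """Duplicate the layer stack, drop the last n_remove layers from one
    copy and the first n_remove from the other, then concatenate."""
    top = layers[: len(layers) - n_remove]   # first 20 layers (indices 0..19)
    bottom = layers[n_remove:]               # last 20 layers (indices 12..31)
    return top + bottom

base = list(range(32))          # stand-ins for the 32 original decoder layers
merged = depth_up_scale(base)
print(len(merged))              # → 40
```

Note that the middle layers (indices 12–19) appear twice in the merged stack, which is what grows 32 layers into 40.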
## Training Details

### Dataset

- A 5,000-sample subset of the OpenOrca dataset.

### Training Configuration

- Batch size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning rate: 5e-5
- Environment: Google Colab Pro
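The hyperparameters above could be expressed as Hugging Face `TrainingArguments` roughly as follows; the actual training script is not published with this model, so this is a sketch (the `output_dir` is a placeholder), not the exact configuration used.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters.
args = TrainingArguments(
    output_dir="llama2-dus-openorca",  # placeholder path
    per_device_train_batch_size=1,     # batch size: 1
    num_train_epochs=3,                # epochs: 3
    learning_rate=5e-5,                # learning rate: 5e-5
    optim="adamw_torch",               # optimizer: AdamW
)
```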
### Preprocessing

Data preprocessing followed the guidelines for LLaMA-2 models, ensuring tokenization and alignment were consistent with the original architecture.
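As an illustration of that formatting step, here is a minimal sketch that turns one OpenOrca record into a LLaMA-2 style instruction prompt. The column names (`system_prompt`, `question`, `response`) and the exact template are assumptions, not the repository's actual preprocessing code.

```python
def format_sample(sample):
    # LLaMA-2 chat-style template (assumed, not the exact one used here).
    return (
        f"<s>[INST] <<SYS>>\n{sample['system_prompt']}\n<</SYS>>\n\n"
        f"{sample['question']} [/INST] {sample['response']} </s>"
    )

example = {
    "system_prompt": "You are a helpful assistant.",
    "question": "Summarize the text in one sentence.",
    "response": "The text describes a 40-layer llama-2 variant.",
}
prompt = format_sample(example)
```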
## Results and Evaluation

### Performance Metrics

- Because this model is experimental, systematic evaluation metrics are currently limited.
- Initial results suggest improved adaptability on downstream tasks drawn from the OpenOrca dataset.

### Observations

- The DUS layer modification shows potential for increasing model depth without significant performance degradation.
- Further evaluation with larger datasets and a wider variety of tasks is required to confirm generalizability.