---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
base_model:
- meta-llama/Llama-2-7b-hf
---
# LLaMA-2 40-layer model
## Model Overview
LlaMa-DUSFT is a custom variant of LLaMA-2-7B built with the DUS (Depth Up-Scaling) methodology. The original LLaMA-2-7B model has 32 transformer layers; this variant expands the stack to 40 layers by duplicating, trimming, and recombining those layers, as detailed below.
### Key Modifications:
1. Layer Splitting:
   - The original 32-layer stack of LLaMA-2-7B was duplicated.
   - From one copy, the last 12 layers were removed, keeping the first 20.
   - From the other copy, the first 12 layers were removed, keeping the last 20.
2. Layer Merging:
   - The two resulting 20-layer segments were concatenated into a single 40-layer model (a minimal sketch follows this list).
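A minimal sketch of this split-and-merge step, assuming the Hugging Face `transformers` `LlamaForCausalLM` layout (`model.layers` as a `ModuleList` of 32 decoder layers); the constant names and output path are illustrative and not taken from this repository:

```python
# Minimal sketch of the split-and-merge described above; output path is illustrative.
import copy

import torch
from transformers import LlamaForCausalLM

BASE = "meta-llama/Llama-2-7b-hf"
DROP = 12  # layers removed from each copy: 32 - 12 = 20

base = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
layers = base.model.layers  # 32 decoder layers

# Copy 1 keeps the first 20 layers (last 12 removed); copy 2 keeps the last 20
# layers (first 12 removed). The second segment is deep-copied so the overlapping
# layers (12-19) end up as independent parameter copies.
front = list(layers[: len(layers) - DROP])
back = [copy.deepcopy(layer) for layer in layers[DROP:]]

# Concatenate the two 20-layer segments into a single 40-layer stack, reusing the
# original embeddings, final norm, and LM head.
base.model.layers = torch.nn.ModuleList(front + back)
base.config.num_hidden_layers = len(base.model.layers)  # 40

# Recent transformers versions track layer_idx on the attention modules for KV
# caching; re-index so the expanded stack is consistent.
for i, layer in enumerate(base.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i

base.save_pretrained("llama-2-40-layer-dus")
```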
### Purpose:
This architectural modification was designed to test whether the DUS approach with an expanded layer count improves performance compared to the standard LLaMA-2 architecture.
## Training Details
### Dataset:
- The model was fine-tuned on a 5,000-sample subset of the OpenOrca dataset (a loading sketch is shown below).
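The card does not state how the 5,000 samples were selected; the sketch below assumes a simple head slice of the training split using the `datasets` library:

```python
from datasets import load_dataset

# 5,000-example subset of OpenOrca; taking the first examples is an assumption,
# since the sampling strategy is not documented in this card.
subset = load_dataset("Open-Orca/OpenOrca", split="train").select(range(5000))
```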
### Training Configuration:
- Batch Size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Environment: Google Colab Pro (these settings are mirrored in the configuration sketch below)
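The listed hyperparameters map onto a `transformers` `TrainingArguments` object roughly as follows; the output directory and logging cadence are illustrative, not taken from the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-dusft",       # illustrative
    per_device_train_batch_size=1,   # Batch Size: 1
    num_train_epochs=3,              # Epochs: 3
    learning_rate=5e-5,              # Learning Rate: 5e-5
    optim="adamw_torch",             # Optimizer: AdamW (PyTorch implementation)
    logging_steps=50,                # illustrative
)
```

These arguments would then be passed to a `Trainer` together with the tokenized dataset produced in the preprocessing step described next.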
### Preprocessing:
Data preprocessing followed the guidelines for LLaMA-2 models: examples were tokenized with the LLaMA-2 tokenizer so that inputs remain consistent with the original architecture (a sketch is shown below).
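A hedged sketch of that preprocessing step; the prompt template is an assumption (the card does not specify one), and the column names are assumed to match the Open-Orca/OpenOrca schema (`system_prompt`, `question`, `response`):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 defines no pad token by default

# Same 5,000-example subset as in the Dataset sketch above.
subset = load_dataset("Open-Orca/OpenOrca", split="train").select(range(5000))

def tokenize(example):
    # Simple concatenation prompt; the actual template used for training is not
    # documented in this card, so this formatting is illustrative.
    text = f"{example['system_prompt']}\n\n{example['question']}\n\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = subset.map(tokenize, remove_columns=subset.column_names)
```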
## Results and Evaluation
### Performance Metrics:
- Due to the experimental nature of this model, only limited evaluation metrics are available so far.
- Initial results indicate improved adaptability on downstream tasks drawn from the OpenOrca dataset.
### Observations:
- The DUS layer modification shows potential for increasing model depth without significant performance degradation.
- Further evaluation with larger datasets and varied tasks is required to confirm generalizability.