leftfooted committed (verified)
Commit 042f459 · Parent: fd18cdb

Update README.md

Files changed (1): README.md (+33, −3)
README.md CHANGED
---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
base_model:
- meta-llama/Llama-2-7b-hf
---
# Solar-DUS

**Model Name:** solar-DUS
**Model Type:** Transformer-based causal language model
**Architecture:** Llama-2-7B with the DUS (Depth Up-Scaling) method applied to match the 48-layer structure of SOLAR-10.7B
**Training Data:** 5,000 examples from the OpenOrca dataset
**Training Parameters:**
- **Batch Size:** 1
- **Epochs:** 3
- **Optimizer:** AdamW
- **Learning Rate:** 5e-5

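The hyperparameters above map naturally onto the Hugging Face `Trainer` API. The following is only a minimal sketch of how such a run could be reproduced, not the actual training script: the use of `Trainer`, the `question`/`response` concatenation, and the `max_length` value are assumptions, while the batch size, epoch count, optimizer, and learning rate come straight from this card.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # in practice, the 48-layer DUS model would be loaded here
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# 5,000-example subset of OpenOrca, as stated above.
data = load_dataset("Open-Orca/OpenOrca", split="train[:5000]")

def tokenize(batch):
    # Assumed formatting: prompt and response concatenated into one sequence.
    texts = [q + "\n" + r for q, r in zip(batch["question"], batch["response"])]
    return tokenizer(texts, truncation=True, max_length=1024)

data = data.map(tokenize, batched=True, remove_columns=data.column_names)

args = TrainingArguments(
    output_dir="solar-dus",
    per_device_train_batch_size=1,  # batch size 1
    num_train_epochs=3,             # 3 epochs
    learning_rate=5e-5,             # lr 5e-5
    optim="adamw_torch",            # AdamW
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
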
## Model Overview
The `solar-DUS` model is a transformer-based architecture built upon the Llama-2-7B model, using the DUS (Depth Up-Scaling) method: the 32-layer base model is expanded to 48 layers so that the architecture closely matches Upstage's SOLAR-10.7B, with the aim of improving generalization and training efficiency.

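For concreteness, a minimal sketch of depth up-scaling is shown below, assuming the standard SOLAR recipe (n = 32 base layers, m = 8 removed per copy, giving 2 * (n - m) = 48); this card does not state the exact split used, so treat the numbers as illustrative.

```python
import copy

import torch
from transformers import AutoModelForCausalLM

# Load the 32-layer Llama-2-7B base model.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
layers = base.model.layers  # nn.ModuleList of 32 decoder blocks
n, m = len(layers), 8       # SOLAR paper values: n = 32, m = 8

# Depth Up-Scaling: concatenate the first n - m layers with a duplicated
# copy of the last n - m layers, yielding 2 * (n - m) = 48 layers.
upscaled = torch.nn.ModuleList(
    list(layers[: n - m])                              # layers 0..23, reused
    + [copy.deepcopy(layers[i]) for i in range(m, n)]  # layers 8..31, duplicated
)
base.model.layers = upscaled
base.config.num_hidden_layers = len(upscaled)  # 48, matching SOLAR-10.7B
```

In the SOLAR paper the up-scaled model is then trained further, which is consistent with the fine-tuning run described above.
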
## Model Performance
This model was trained on a subset of 5,000 examples from the OpenOrca dataset, with the goal of testing whether the DUS method enhances performance compared to other configurations. Performance may vary depending on the specific use case, and further evaluation is recommended.

## Intended Use
- Primarily intended for natural language processing (NLP) tasks, including but not limited to text generation, classification, and summarization.
- Suitable for fine-tuning with smaller datasets like OpenOrca, particularly when task-specific adjustments are necessary. A text-generation example follows below.

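A minimal text-generation example using the `transformers` pipeline is sketched below. The repo id `leftfooted/solar-DUS` is a placeholder inferred from this card's author and model name; substitute the actual checkpoint location.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="leftfooted/solar-DUS",  # hypothetical repo id; replace with the real one
    device_map="auto",
)

prompt = "Summarize in one sentence: The DUS method expands a 32-layer model to 48 layers."
result = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```
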
## Limitations
- The model was trained on only 5,000 examples from the OpenOrca dataset, which may limit its ability to generalize.
- Further fine-tuning with larger datasets could improve performance on more complex tasks.
- The batch size was set to 1, which may limit training efficiency and scalability when working with larger datasets.