leftfooted committed (verified)
Commit 042f459 · Parent: fd18cdb

Update README.md

Files changed (1): README.md (+33, −3)
README.md CHANGED
---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
base_model:
- meta-llama/Llama-2-7b-hf
---
# Solar-DUS

**Model Name:** solar-DUS
**Model Type:** Transformer-based causal language model
**Architecture:** Llama-2-7B with the DUS (Depth Up-Scaling) method applied to match the 48-layer structure of SOLAR-10.7B
**Training Data:** 5,000 examples from the OpenOrca dataset
**Training Parameters:**
- **Batch Size:** 1
- **Epochs:** 3
- **Optimizer:** AdamW
- **Learning Rate:** 5e-5

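The hyperparameters above map naturally onto the Hugging Face `Trainer` API. The following is only a minimal sketch of how such a run could be reproduced, not the actual training script: the use of `Trainer`, the `question`/`response` concatenation, and the `max_length` value are assumptions, while the batch size, epoch count, optimizer, and learning rate come straight from this card.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # in practice, the 48-layer DUS model would be loaded here
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# 5,000-example subset of OpenOrca, as stated above.
data = load_dataset("Open-Orca/OpenOrca", split="train[:5000]")

def tokenize(batch):
    # Assumed formatting: prompt and response concatenated into one sequence.
    texts = [q + "\n" + r for q, r in zip(batch["question"], batch["response"])]
    return tokenizer(texts, truncation=True, max_length=1024)

data = data.map(tokenize, batched=True, remove_columns=data.column_names)

args = TrainingArguments(
    output_dir="solar-dus",
    per_device_train_batch_size=1,  # batch size 1
    num_train_epochs=3,             # 3 epochs
    learning_rate=5e-5,             # lr 5e-5
    optim="adamw_torch",            # AdamW
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
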
## Model Overview
The `solar-DUS` model is a transformer-based architecture built upon the Llama-2-7B model, using the DUS (Depth Up-Scaling) method: the 32-layer base model is expanded to 48 layers so that the architecture closely matches Upstage's SOLAR-10.7B, with the aim of improving generalization and training efficiency.

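For concreteness, a minimal sketch of depth up-scaling is shown below, assuming the standard SOLAR recipe (n = 32 base layers, m = 8 removed per copy, giving 2 * (n - m) = 48); this card does not state the exact split used, so treat the numbers as illustrative.

```python
import copy

import torch
from transformers import AutoModelForCausalLM

# Load the 32-layer Llama-2-7B base model.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
layers = base.model.layers  # nn.ModuleList of 32 decoder blocks
n, m = len(layers), 8       # SOLAR paper values: n = 32, m = 8

# Depth Up-Scaling: concatenate the first n - m layers with a duplicated
# copy of the last n - m layers, yielding 2 * (n - m) = 48 layers.
upscaled = torch.nn.ModuleList(
    list(layers[: n - m])                              # layers 0..23, reused
    + [copy.deepcopy(layers[i]) for i in range(m, n)]  # layers 8..31, duplicated
)
base.model.layers = upscaled
base.config.num_hidden_layers = len(upscaled)  # 48, matching SOLAR-10.7B
```

In the SOLAR paper the up-scaled model is then trained further, which is consistent with the fine-tuning run described above.
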
## Model Performance
This model was trained on a subset of 5,000 examples from the OpenOrca dataset, with the goal of testing whether the DUS method enhances performance compared to other configurations. Performance may vary depending on the specific use case, and further evaluation is recommended.

## Intended Use
- Primarily intended for natural language processing (NLP) tasks, including but not limited to text generation, classification, and summarization.
- Suitable for fine-tuning with smaller datasets like OpenOrca, particularly when task-specific adjustments are necessary. A text-generation example follows below.

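A minimal text-generation example using the `transformers` pipeline is sketched below. The repo id `leftfooted/solar-DUS` is a placeholder inferred from this card's author and model name; substitute the actual checkpoint location.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="leftfooted/solar-DUS",  # hypothetical repo id; replace with the real one
    device_map="auto",
)

prompt = "Summarize in one sentence: The DUS method expands a 32-layer model to 48 layers."
result = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```
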
## Limitations
- The model was trained on only 5,000 examples from the OpenOrca dataset, which may limit its ability to generalize.
- Further fine-tuning with larger datasets could improve performance on more complex tasks.
- The batch size was set to 1, which may limit training efficiency and scalability when working with larger datasets.