---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---
# Hercules-1.0-Mistral-7B

## Model description

Hercules-1.0-Mistral-7B is a fine-tune of the Mistral-7B-v0.1 base model.

It is designed to be a turbo-charged version of teknium's OpenHermes, built from augmented data sources. How much it improves on OpenHermes is currently unknown; more information is coming soon.

Apart from *potentially* higher performance than OpenHermes, this model offers data and training transparency for reproducibility.

You can learn more about the Hercules dataset here: [Locutusque/hercules-v1.0](https://huggingface.co/datasets/Locutusque/hercules-v1.0).
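For example, you can take a quick look at the data with the 🤗 `datasets` library (a minimal sketch; it assumes the `datasets` package is installed and that the default split is named `train`):

```python
# Minimal sketch: peek at the Hercules training data.
# Assumes `pip install datasets` and a split named "train".
from datasets import load_dataset

hercules = load_dataset("Locutusque/hercules-v1.0", split="train")
print(hercules)     # number of rows and column names
print(hercules[0])  # first example
```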
During training, a test set of 100 examples was held out from the dataset. After training on 120,000 examples, this model reached a test loss of 0.57.
### Training details
- This model was trained on 8 Kaggle TPUs, using `torch_xla` SPMD for high MXU efficiency; a minimal setup sketch follows this list. There was no expense on my end (meaning you can reproduce this too!).
- A learning rate of 2e-06 was used with the Adam optimizer and no LR scheduler. The low learning rate was chosen to prevent exploding gradients.
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both full subsets of OpenOrca and on 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format, illustrated below.
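A ChatML prompt looks like the following (the system and user messages here are placeholders, not examples from the training data):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a short poem about the ocean.<|im_end|>
<|im_start|>assistant
```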
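As for the `torch_xla` SPMD setup mentioned in the first bullet, here is a minimal sketch of what that initialization typically looks like. This is an illustration only, not the actual training script: it assumes a TPU runtime with `torch_xla` >= 2.1, and the mesh shape and sharding choices below are placeholders.

```python
# Rough sketch of torch_xla SPMD initialization on a TPU (not the actual training code).
# Assumes torch_xla >= 2.1 on a TPU runtime; exact APIs may differ between versions.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # switch the runtime into SPMD execution mode

num_devices = xr.global_runtime_device_count()  # 8 on a Kaggle TPU
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ("data", "model"))

# Example: shard a weight matrix along the "model" axis of the mesh.
weight = torch.randn(4096, 4096, device=xm.xla_device())
xs.mark_sharding(weight, mesh, (None, "model"))
```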
## Inference examples

Coming soon.
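In the meantime, here is a rough sketch of ChatML-style inference with 🤗 Transformers. The repository id, generation settings, and prompt below are assumptions for illustration, not tested output:

```python
# Rough sketch of ChatML-style inference with Transformers (untested illustration).
# The repository id below is assumed; replace it with the actual model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hercules-1.0-Mistral-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a ChatML prompt by hand to match the training format.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that checks whether a number is prime.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```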