---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---
# Hercules-1.0-Mistral-7B

## Model description
Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model.
Designed to be a turbo-charged version of teknium's OpenHermes, built on augmented data sources. Whether it actually improves on OpenHermes is currently unknown; more information is coming soon.
Apart from *potentially* higher performance than OpenHermes, this model offers data and training transparency for reproducibility.
You can learn more about the Hercules dataset on its Hugging Face dataset page, Locutusque/hercules-v1.0.
During training, 100 examples were held out from the dataset as a test set. At the end of training (120,000 examples), this model achieved a test loss of 0.57.
### Training details
- This model was trained on 8 Kaggle TPUs, using torch_xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!).
- A learning rate of 2e-06 was used with the Adam optimizer and no LR scheduler. The low learning rate was chosen to prevent exploding gradients.
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both subsets of OpenOrca in full, plus 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format (see the sketch after this list).
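
As a rough illustration of the ChatML prompt format mentioned above (the exact system prompt and tokenization details are assumptions, not taken from the actual training script), a single formatted example might look like this:

```python
# Illustrative sketch only: how one training example could be rendered in ChatML.
# The system prompt below is an assumption, not the one used during training.
def to_chatml(system: str, user: str, assistant: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

print(to_chatml(
    "You are a helpful assistant.",
    "Write a Python function that reverses a string.",
    "def reverse(s):\n    return s[::-1]",
))
```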
## Inference examples
Coming soon.
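
Until official examples are posted, here is a minimal inference sketch using the Transformers library. The repository id below is an assumption based on the model name; adjust it to the actual repo. Since the model was trained on ChatML, the prompt is built in that format.

```python
# Minimal inference sketch. The repo id is assumed from the model name and may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hercules-1.0-Mistral-7B"  # assumption: verify the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the training dtype
    device_map="auto",
)

# Build a ChatML prompt, since that is the format the model was trained on.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain what a fine-tune is in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```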