---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---
# Hercules-1.0-Mistral-7B
![Hercules](https://th.bing.com/th/id/OIG2.jIgN3IQ2IHoWnM0A0uJs?w=270&h=270&c=6&r=0&o=5&pid=ImgGn)
## Model description
Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model.
It is designed to be a turbo-charged version of teknium's OpenHermes, built on augmented data sources. The improvement over OpenHermes is currently unknown; more information is coming soon.
Apart from *potentially* higher performance than OpenHermes, this model offers full data and training transparency for reproducibility.
You can learn more about the Hercules dataset here: [Locutusque/hercules-v1.0](https://huggingface.co/datasets/Locutusque/hercules-v1.0)
During training, a held-out test set of 100 examples was split from the dataset. At the end of training (120,000 examples), this model achieved a test loss of 0.57.
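For reproducibility, the dataset can be pulled straight from the Hub. The snippet below is a minimal sketch using the `datasets` library; the split name, seed, and hold-out procedure are assumptions for illustration, not the original training script.

```python
from datasets import load_dataset

# Minimal sketch: load Hercules and carve off a 100-example test set,
# mirroring the hold-out described above (split name and seed are assumptions).
hercules = load_dataset("Locutusque/hercules-v1.0", split="train")
splits = hercules.train_test_split(test_size=100, seed=42)
train_set, test_set = splits["train"], splits["test"]
print(len(train_set), len(test_set))
```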
### Training details
- This model was trained on 8 Kaggle TPU cores, using PyTorch/XLA SPMD for high MXU efficiency. Training cost nothing on my end, meaning you can reproduce it too!
- A learning rate of 2e-06 was used with the Adam optimizer; no LR scheduler was used. The low learning rate helps prevent exploding gradients.
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both full subsets of OpenOrca and 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format (see the example below).
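
For reference, a ChatML-formatted prompt looks like the following; the system message and user turn are purely illustrative:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a Python function that reverses a string.<|im_end|>
<|im_start|>assistant
```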
## Inference examples
Official examples are coming soon; the sketch below shows one possible way to run the model in the meantime.
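The following is a minimal sketch using the standard `transformers` generation API and the ChatML format above. The repository id, generation settings, and sample prompt are placeholders, not the author's recommended configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hercules-1.0-Mistral-7B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a ChatML prompt (the format the model was trained on).
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```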