---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---

# Hercules-1.0-Mistral-7B
![Hercules](https://th.bing.com/th/id/OIG2.jIgN3IQ2IHoWnM0A0uJs?w=270&h=270&c=6&r=0&o=5&pid=ImgGn)

## Model description

Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model.

It is designed to be a turbo-charged version of teknium's OpenHermes, built on augmented data sources. The improvement over OpenHermes is currently unknown; more information is coming soon.

Apart from *potentially* higher performance than OpenHermes, this model offers data and training transparency for reproducibility.

You can learn more about the Hercules dataset here: [Locutusque/hercules-v1.0](https://huggingface.co/datasets/Locutusque/hercules-v1.0)
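
As a minimal sketch, the dataset can be loaded with the 🤗 `datasets` library (the `train` split name is an assumption; check the dataset card for the exact configuration):

```python
from datasets import load_dataset

# Load the Hercules v1.0 dataset from the Hugging Face Hub.
# The split name "train" is an assumption; see the dataset card for details.
hercules = load_dataset("Locutusque/hercules-v1.0", split="train")
print(hercules[0])
```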

During training, a test set of 100 examples was held out from the dataset. At the end of training (120,000 examples), this model achieved a test loss of 0.57.

### Training details

- This model was trained on 8 Kaggle TPUs, using torch_xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!)
- A learning rate of 2e-06 was used with the Adam optimizer; no LR scheduler was used. The low learning rate was chosen to prevent exploding gradients. (An illustrative sketch of this setup follows the list.)
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both full subsets of OpenOrca and 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format (see the sketch under Inference examples below).
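
The following is an illustrative sketch of the optimizer setup described above, not the actual training script; the torch_xla/SPMD specifics are omitted:

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: load the base model with bfloat16 as the default dtype
# (no mixed precision).
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# Adam with a low learning rate of 2e-06 and no LR scheduler; no parameters
# are frozen, so all of them are passed to the optimizer.
optimizer = torch.optim.Adam(base.parameters(), lr=2e-6)
```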

## Inference examples
Coming soon
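
In the meantime, here is a minimal, hedged sketch of prompting the model in the ChatML format it was trained on. The repo id and generation settings below are assumptions, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, based on the model name; adjust if the actual id differs.
model_id = "Locutusque/Hercules-1.0-Mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML prompt format (the format this model was trained on).
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```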