---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---
# Hercules-1.0-Mistral-7B

## Model description

Hercules-1.0-Mistral-7B is a fine-tune of the Mistral-7B-v0.1 base model.

It is designed to be a turbo-charged version of teknium's OpenHermes, built from augmented data sources. How much it improves on OpenHermes is currently unknown; more information is coming soon.

Apart from *potentially* higher performance than OpenHermes, this model offers data and training transparency for reproducibility.

You can learn more about the Hercules dataset here: [Locutusque/hercules-v1.0](https://huggingface.co/datasets/Locutusque/hercules-v1.0).
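For example, you can take a quick look at the data with the 🤗 `datasets` library (a minimal sketch; it assumes the `datasets` package is installed and that the default split is named `train`):

```python
# Minimal sketch: peek at the Hercules training data.
# Assumes `pip install datasets` and a split named "train".
from datasets import load_dataset

hercules = load_dataset("Locutusque/hercules-v1.0", split="train")
print(hercules)     # number of rows and column names
print(hercules[0])  # first example
```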
During training, a test set of 100 examples was held out from the dataset. After training on 120,000 examples, this model reached a test loss of 0.57.
### Training details
- This model was trained on 8 Kaggle TPUs, using `torch_xla` SPMD for high MXU efficiency; a minimal setup sketch follows this list. There was no expense on my end (meaning you can reproduce this too!).
- A learning rate of 2e-06 was used with the Adam optimizer and no LR scheduler. The low learning rate was chosen to prevent exploding gradients.
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both full subsets of OpenOrca and on 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format, illustrated below.
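A ChatML prompt looks like the following (the system and user messages here are placeholders, not examples from the training data):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a short poem about the ocean.<|im_end|>
<|im_start|>assistant
```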
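As for the `torch_xla` SPMD setup mentioned in the first bullet, here is a minimal sketch of what that initialization typically looks like. This is an illustration only, not the actual training script: it assumes a TPU runtime with `torch_xla` >= 2.1, and the mesh shape and sharding choices below are placeholders.

```python
# Rough sketch of torch_xla SPMD initialization on a TPU (not the actual training code).
# Assumes torch_xla >= 2.1 on a TPU runtime; exact APIs may differ between versions.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # switch the runtime into SPMD execution mode

num_devices = xr.global_runtime_device_count()  # 8 on a Kaggle TPU
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ("data", "model"))

# Example: shard a weight matrix along the "model" axis of the mesh.
weight = torch.randn(4096, 4096, device=xm.xla_device())
xs.mark_sharding(weight, mesh, (None, "model"))
```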
## Inference examples

Coming soon.
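In the meantime, here is a rough sketch of ChatML-style inference with 🤗 Transformers. The repository id, generation settings, and prompt below are assumptions for illustration, not tested output:

```python
# Rough sketch of ChatML-style inference with Transformers (untested illustration).
# The repository id below is assumed; replace it with the actual model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hercules-1.0-Mistral-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a ChatML prompt by hand to match the training format.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that checks whether a number is prime.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```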