---
library_name: transformers
tags:
- code
- not-for-all-audiences
license: apache-2.0
datasets:
- Locutusque/hercules-v1.0
- Open-Orca/OpenOrca
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---

# Hercules-1.0-Mistral-7B

![Hercules](https://th.bing.com/th/id/OIG2.jIgN3IQ2IHoWnM0A0uJs?w=270&h=270&c=6&r=0&o=5&pid=ImgGn)

## Model description

Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model, designed to be a turbo-charged version of teknium's OpenHermes through augmented data sources. The improvement over OpenHermes is currently unknown; more information is coming soon. Apart from *potentially* higher performance than OpenHermes, this model offers data and training transparency for reproducibility.

You can learn more about the Hercules dataset here: [Locutusque/hercules-v1.0](https://huggingface.co/datasets/Locutusque/hercules-v1.0)

During training, a test set of 100 examples was held out from the dataset. At the end of training (120,000 examples), the model reached a test loss of 0.57.

### Training details

- This model was trained on 8 Kaggle TPUs, using torch_xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!).
- A learning rate of 2e-06 was used with the Adam optimizer and no LR scheduler. The low learning rate was chosen to prevent exploding gradients.
- No mixed precision was used; the default dtype was bfloat16.
- Trained on both full subsets of OpenOrca and 120,000 examples of Hercules.
- No model parameters were frozen.
- This model was trained on OpenAI's ChatML prompt format (see the prompt layout sketch below).

## Inference examples

Coming soon.
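
Since the training details above state that the model was trained on OpenAI's ChatML prompt format, prompts should follow the standard ChatML layout shown below. The system message here is only an illustrative placeholder, not necessarily one the model was trained with:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{your prompt here}<|im_end|>
<|im_start|>assistant
```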
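
Until official examples are published, the following is a minimal inference sketch. It assumes the checkpoint is hosted under the repo id `Locutusque/Hercules-1.0-Mistral-7B` (adjust as needed) and loads with the standard `transformers` causal-LM classes; the system prompt and sampling settings are arbitrary illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hercules-1.0-Mistral-7B"  # assumed repo id; adjust if different

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the training dtype noted above
    device_map="auto",
)

# Build a ChatML-formatted prompt by hand; the system message is illustrative.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that reverses a string.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Loading in bfloat16 matches the dtype used during training; on hardware without bfloat16 support, float16 or float32 can be substituted.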