Update README.md

For running the benchmark we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval).
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
|-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|----------:|-----------------:|
| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** | **5,922** | **46%** |
| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 | 7,732 | 60% |
| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |
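The dpo-pairs and % original pairs columns show how aggressively the distilabel ratings prune the original preference data before DPO. As a rough sketch of that filtering step (the dataset name and the `status`/`chosen_score` columns are assumptions based on the published distilabel-rated dataset, not necessarily this repo's exact code):

```python
from datasets import load_dataset

# Illustrative only: keep the preference pairs where the AI judge clearly
# preferred the chosen response. Column names ("status", "chosen_score") and
# the rating threshold are assumptions, not the exact recipe used here.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
filtered = dataset.filter(
    lambda row: row["status"] != "tie" and row["chosen_score"] >= 8
)
print(f"kept {len(filtered)} of {len(dataset)} pairs")
```

Filtering like this is what shrinks the 12,859 original pairs down to the 5,922 high-confidence pairs in the top row.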
> Update: we now include llm-harness results too!

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | dpo-pairs | % original pairs |
|------------------------------------------------------|------:|----------:|--------:|-----------:|-----------:|------:|----------:|-----------------:|
| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** | **5,922** | **46%** |
| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 | 7,732 | 60% |
| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 | 12,859 | 100% |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 | 0 (no DPO) | N/A |
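For reference, scores in this style can be reproduced with EleutherAI's lm-evaluation-harness. The snippet below is a sketch only; the task list, dtype, and batch size are assumptions rather than the exact settings behind the table.

```python
# Sketch: evaluate one of the models above with EleutherAI's
# lm-evaluation-harness (v0.4+). Task names and settings are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=argilla/distilabeled-Hermes-2.5-Mistral-7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size=8,
)
print(results["results"])
```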
### Training Hardware

We used 1 x A100 40GB on RunPod for less than 1 hour.
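For context on how a run fits that budget, here is a minimal sketch of a QLoRA DPO fine-tune with TRL's `DPOTrainer` (0.7.x-era API). All hyperparameters, the base model choice, and the dataset column mapping are illustrative assumptions, not the exact training script.

```python
# Minimal sketch of a single-GPU QLoRA + DPO run with TRL.
# Hyperparameters and the dataset column mapping are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, load_in_4bit=True  # QLoRA-style 4-bit base
)

# DPOTrainer expects "prompt", "chosen", and "rejected" columns; the rename
# below assumes the rated dataset stores the prompt under "input".
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.rename_column("input", "prompt")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, the frozen base acts as the reference
    beta=0.1,
    args=TrainingArguments(
        output_dir="distilabeled-hermes-dpo",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        max_steps=200,
        bf16=True,
        remove_unused_columns=False,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```

With 4-bit quantization and a LoRA adapter, a run over roughly 6,000 filtered pairs is plausible within an hour on a single A100 40GB, which matches the budget above.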