Update README.md
README.md
CHANGED
@@ -89,7 +89,7 @@ Bitte erkläre mir, wie die Zusammenführung von Modellen durch bestehende Spitz
 ## Evaluation
 
 ### GPT4ALL:
-*Compared to
+*Compared to relevant German Closed and Open Source models*
 
 
 
@@ -104,10 +104,10 @@ Bitte erkläre mir, wie die Zusammenführung von Modellen durch bestehende Spitz
 **performed with newest Language Model Evaluation Harness*
 
 ### MMLU:
-*Compared to Grok0,Grok1,GPT3.5,GPT4*
+*Compared to Big Boy LLMs (Grok0,Grok1,GPT3.5,GPT4)*
 
 ### TruthfulQA:
-*Compared to GPT3.5,GPT4*
+*Compared to OpenAI Models (GPT3.5,GPT4)*
 
 
 ### MT-Bench (German):
@@ -170,6 +170,7 @@ SauerkrautLM-3b-v1 2.581250
 open_llama_3b_v2 1.456250
 Llama-2-7b 1.181250
 ```
+**performed with the newest FastChat Version*
 ### MT-Bench (English):
 
 ```
@@ -197,7 +198,7 @@ SauerkrautLM-7b-HerO <--- 7.409375
 Mistral-7B-OpenOrca 6.915625
 neural-chat-7b-v3-1 6.812500
 ```
-
+**performed with the newest FastChat Version*
 
 ### Additional German Benchmark results:
 
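The GPT4ALL, MMLU and TruthfulQA sections above carry the footnote that they were produced with the Language Model Evaluation Harness. As a rough illustration of what such a run looks like, here is a minimal Python sketch; the `lm_eval.simple_evaluate` entry point and its arguments follow lm-evaluation-harness v0.4.x and may differ in other versions, the task list is the commonly used GPT4ALL suite, and the model path is a placeholder, not taken from this commit.

```python
# Minimal sketch: scoring a model with the Language Model Evaluation Harness.
# Assumptions: lm-evaluation-harness v0.4.x API; the task list is the commonly
# used GPT4ALL suite; MODEL is a placeholder for the real checkpoint.
import lm_eval

MODEL = "path/to/SauerkrautLM-7b-HerO"  # placeholder checkpoint path or Hub id

results = lm_eval.simple_evaluate(
    model="hf",                                  # Hugging Face transformers backend
    model_args=f"pretrained={MODEL}",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],  # GPT4ALL average
    batch_size=8,
)

# Print the per-task metrics that feed the tables above.
for task, metrics in results["results"].items():
    print(task, metrics)
```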
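The MT-Bench footnotes added here state that the scores were produced with the newest FastChat version. As a sketch of that workflow, the three steps below wrap FastChat's `llm_judge` scripts via `subprocess`; the script names and flags are assumptions based on FastChat's documentation and may differ between versions, the commands are meant to be run from the `fastchat/llm_judge` directory, GPT-4 judging requires an OpenAI API key, and the model identifiers are placeholders.

```python
# Rough sketch of the FastChat MT-Bench workflow (fastchat/llm_judge).
# Assumptions: script names and flags follow FastChat's llm_judge docs and may
# differ between versions; MODEL_PATH and MODEL_ID are placeholders.
import subprocess

MODEL_PATH = "path/to/SauerkrautLM-7b-HerO"  # placeholder checkpoint
MODEL_ID = "SauerkrautLM-7b-HerO"            # name used in the result tables

# 1) Generate the model's answers to the MT-Bench questions.
subprocess.run(["python", "gen_model_answer.py",
                "--model-path", MODEL_PATH,
                "--model-id", MODEL_ID], check=True)

# 2) Have the GPT-4 judge grade the answers (needs OPENAI_API_KEY set).
subprocess.run(["python", "gen_judgment.py",
                "--model-list", MODEL_ID], check=True)

# 3) Show the aggregated scores, comparable to the tables above.
subprocess.run(["python", "show_result.py",
                "--model-list", MODEL_ID], check=True)
```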