Update README.md
README.md
CHANGED
@@ -136,7 +136,7 @@ Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
 
 ## Benchmark
 
-
+### kollm_evaluation
 https://github.com/davidkim205/kollm_evaluation
 
 | task | acc |
@@ -156,18 +156,24 @@ https://github.com/davidkim205/kollm_evaluation
 
 
 
+
 ### Evaluation of KEval
+keval is an evaluation model trained on the prompts and dataset of a benchmark for Korean language models, one of several approaches that use ChatGPT as the evaluator. It is meant to compensate for the shortcomings of the existing lm-evaluation-harness.
+
 https://huggingface.co/davidkim205/keval-7b
 
+
 | keval | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4 | 6.
-| openai/gpt-3.5-turbo | 6.
+| openai/gpt-4 | 6.79 | 4.66 | 8.51 | 7.21 |
+| openai/gpt-3.5-turbo | 6.25 | 4.48 | 7.29 | 6.99 |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.59 | 4.24 | 6.46 | 6.06 |
+
 
 ### Evaluation of ChatGPT
+
 | chatgpt | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4 | 7.
-| openai/gpt-3.5-turbo | 6.
+| openai/gpt-4 | 7.30 | 4.57 | 8.76 | 8.57 |
+| openai/gpt-3.5-turbo | 6.53 | 4.26 | 7.5 | 7.82 |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.45 | 4.22 | 6.49 | 5.64 |
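
The README does not say how the `average` column is computed. In the updated rows it appears to be the arithmetic mean of the `kullm`, `logickor`, and `wandb` scores, rounded to two decimals. The Python sketch below checks that assumption against the values copied from the two tables; the names `KEVAL`, `CHATGPT`, `mean_score`, and `check` are hypothetical and exist only for this illustration.

```python
# Scores copied from the updated README tables.
# Each entry maps model -> ((kullm, logickor, wandb), reported average).
KEVAL = {
    "openai/gpt-4": ((4.66, 8.51, 7.21), 6.79),
    "openai/gpt-3.5-turbo": ((4.48, 7.29, 6.99), 6.25),
    "davidkim205/Ko-Llama-3-8B-Instruct": ((4.24, 6.46, 6.06), 5.59),
}
CHATGPT = {
    "openai/gpt-4": ((4.57, 8.76, 8.57), 7.30),
    "openai/gpt-3.5-turbo": ((4.26, 7.5, 7.82), 6.53),
    "davidkim205/Ko-Llama-3-8B-Instruct": ((4.22, 6.49, 5.64), 5.45),
}


def mean_score(scores):
    """Arithmetic mean of the per-benchmark scores (assumed definition)."""
    return sum(scores) / len(scores)


def check(table, tolerance=0.005):
    """Return model -> True if the reported average matches the mean within rounding."""
    return {
        model: abs(mean_score(scores) - reported) <= tolerance
        for model, (scores, reported) in table.items()
    }


if __name__ == "__main__":
    for name, table in (("keval", KEVAL), ("chatgpt", CHATGPT)):
        for model, ok in check(table).items():
            print(f"{name:8s} {model:36s} mean matches reported average: {ok}")
```

If a later revision adds rows where the check fails, the column is probably weighted or computed per sample rather than per benchmark.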