davidkim205 committed
Commit e5acd1c · verified · Parent: c4c0455

Update README.md

Files changed (1): README.md (+11 −5)
README.md CHANGED
@@ -136,7 +136,7 @@ Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
 
 ## Benchmark
 
-##kollm_evaluation
+### kollm_evaluation
 https://github.com/davidkim205/kollm_evaluation
 
 | task | acc |
@@ -156,18 +156,24 @@ https://github.com/davidkim205/kollm_evaluation
 
 
 
+
 ### Evaluation of KEval
+keval is an evaluation model trained on the prompts and datasets used in Korean language model benchmarks; among the various approaches to evaluating models with ChatGPT, it is intended to compensate for the shortcomings of the existing lm-evaluation-harness.
+
 https://huggingface.co/davidkim205/keval-7b
 
+
 | keval                              | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4                       | 6.71    | 4.57  | 8.19     | 7.38  |
-| openai/gpt-3.5-turbo               | 6.10    | 4.26  | 7.27     | 6.78  |
+| openai/gpt-4                       | 6.79    | 4.66  | 8.51     | 7.21  |
+| openai/gpt-3.5-turbo               | 6.25    | 4.48  | 7.29     | 6.99  |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.59    | 4.24  | 6.46     | 6.06  |
+
 
 ### Evaluation of ChatGPT
+
 | chatgpt                            | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4                       | 7.31    | 4.57  | 8.67     | 8.70  |
-| openai/gpt-3.5-turbo               | 6.68    | 4.26  | 8.16     | 7.61  |
+| openai/gpt-4                       | 7.30    | 4.57  | 8.76     | 8.57  |
+| openai/gpt-3.5-turbo               | 6.53    | 4.26  | 7.50     | 7.82  |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.45    | 4.22  | 6.49     | 5.64  |
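The `average` column in the updated tables appears to be the unweighted mean of the three per-benchmark scores (kullm, logickor, wandb), rounded to two decimals. A quick sanity check over the new values — this is an inference from the numbers themselves, not something the commit states:

```python
# Verify that "average" == round(mean(kullm, logickor, wandb), 2)
# for every row of the updated KEval and ChatGPT tables.
rows = [
    # (reported_average, kullm, logickor, wandb) -- KEval table
    (6.79, 4.66, 8.51, 7.21),  # openai/gpt-4
    (6.25, 4.48, 7.29, 6.99),  # openai/gpt-3.5-turbo
    (5.59, 4.24, 6.46, 6.06),  # davidkim205/Ko-Llama-3-8B-Instruct
    # ChatGPT table
    (7.30, 4.57, 8.76, 8.57),  # openai/gpt-4
    (6.53, 4.26, 7.50, 7.82),  # openai/gpt-3.5-turbo
    (5.45, 4.22, 6.49, 5.64),  # davidkim205/Ko-Llama-3-8B-Instruct
]

for reported, kullm, logickor, wandb in rows:
    mean = round((kullm + logickor + wandb) / 3, 2)
    assert abs(mean - reported) < 0.01, (reported, mean)

print("all averages match")
```

Every row checks out, which suggests the two GPT baseline rows were re-run rather than just reformatted, since their sub-scores and averages changed consistently.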