davidkim205 committed
Commit e5acd1c · verified · Parent: c4c0455

Update README.md

Files changed (1): README.md (+11 −5)
README.md CHANGED
@@ -136,7 +136,7 @@ Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
 
 ## Benchmark
 
-##kollm_evaluation
+### kollm_evaluation
 https://github.com/davidkim205/kollm_evaluation
 
 | task | acc |
@@ -156,18 +156,24 @@ https://github.com/davidkim205/kollm_evaluation
 
 
 
+
 ### Evaluation of KEval
+keval is an evaluation model trained on the prompts and datasets used in Korean language model benchmarks; among the various approaches to evaluating models with ChatGPT, it is intended to compensate for the shortcomings of the existing lm-evaluation-harness.
+
 https://huggingface.co/davidkim205/keval-7b
 
+
 | keval                              | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4                       | 6.71    | 4.57  | 8.19     | 7.38  |
-| openai/gpt-3.5-turbo               | 6.10    | 4.26  | 7.27     | 6.78  |
+| openai/gpt-4                       | 6.79    | 4.66  | 8.51     | 7.21  |
+| openai/gpt-3.5-turbo               | 6.25    | 4.48  | 7.29     | 6.99  |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.59    | 4.24  | 6.46     | 6.06  |
+
 
 ### Evaluation of ChatGPT
+
 | chatgpt                            | average | kullm | logickor | wandb |
 | ---------------------------------- | ------- | ----- | -------- | ----- |
-| openai/gpt-4                       | 7.31    | 4.57  | 8.67     | 8.70  |
-| openai/gpt-3.5-turbo               | 6.68    | 4.26  | 8.16     | 7.61  |
+| openai/gpt-4                       | 7.30    | 4.57  | 8.76     | 8.57  |
+| openai/gpt-3.5-turbo               | 6.53    | 4.26  | 7.50     | 7.82  |
 | davidkim205/Ko-Llama-3-8B-Instruct | 5.45    | 4.22  | 6.49     | 5.64  |
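The `average` column in the updated tables appears to be the unweighted mean of the three per-benchmark scores (kullm, logickor, wandb), rounded to two decimals. A quick sanity check over the new values — this is an inference from the numbers themselves, not something the commit states:

```python
# Verify that "average" == round(mean(kullm, logickor, wandb), 2)
# for every row of the updated KEval and ChatGPT tables.
rows = [
    # (reported_average, kullm, logickor, wandb) -- KEval table
    (6.79, 4.66, 8.51, 7.21),  # openai/gpt-4
    (6.25, 4.48, 7.29, 6.99),  # openai/gpt-3.5-turbo
    (5.59, 4.24, 6.46, 6.06),  # davidkim205/Ko-Llama-3-8B-Instruct
    # ChatGPT table
    (7.30, 4.57, 8.76, 8.57),  # openai/gpt-4
    (6.53, 4.26, 7.50, 7.82),  # openai/gpt-3.5-turbo
    (5.45, 4.22, 6.49, 5.64),  # davidkim205/Ko-Llama-3-8B-Instruct
]

for reported, kullm, logickor, wandb in rows:
    mean = round((kullm + logickor + wandb) / 3, 2)
    assert abs(mean - reported) < 0.01, (reported, mean)

print("all averages match")
```

Every row checks out, which suggests the two GPT baseline rows were re-run rather than just reformatted, since their sub-scores and averages changed consistently.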