Update README.md
README.md
CHANGED
@@ -1,3 +1,7 @@
1 |
# keval-2-1b
|
2 |
keval-2-1b is an advanced evaluation model specifically designed to assess Korean language models using an LLM-as-a-judge approach. It departs from the traditional method, which relied on ChatGPT for evaluations. keval leverages the Gemma2-9b architecture, enhanced through SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization). The model is trained on the newly developed Ko-bench dataset, which is inspired by MT-bench and tailored to Korean linguistic nuances.
|
3 |
|
@@ -161,5 +165,4 @@ The `score` column represents the ratio of correctly predicted labels to the tot
|
|
161 |
|---:|:-----------|:---------|:--------|---------:|:-----------|:----------|:-----------|----:|:-----------|:----------|:----------|:----------|:----------|:----------|:-----------|
|
162 |
| 0 | keval-2-9b | 0 (0.0%) | 50.0% | 22 | 1 (50.0%) | 1 (50.0%) | 2 (100.0%) | 0 | 2 (100.0%) | 0 | 0 | 1 (50.0%) | 1 (50.0%) | 1 (50.0%) | 2 (100.0%) |
|
163 |
| 1 | keval-2-3b | 0 (0.0%) | 45.5% | 22 | 2 (100.0%) | 1 (50.0%) | 0 | 0 | 2 (100.0%) | 1 (50.0%) | 0 | 1 (50.0%) | 1 (50.0%) | 0 | 2 (100.0%) |
|
164 |
-
| 2 | keval-2-1b | 0 (0.0%) | 36.4% | 22 | 0 | 1 (50.0%) | 2 (100.0%) | 0 | 1 (50.0%) | 0 | 1 (50.0%) | 0 | 0 | 1 (50.0%) | 2 (100.0%) |
|
165 |
-
|
|
|
1 |
+
---
|
2 |
+
language:
|
3 |
+
- ko
|
4 |
+
---
|
5 |
# keval-2-1b
|
6 |
keval-2-1b is an advanced evaluation model specifically designed to assess Korean language models using an LLM-as-a-judge approach. It departs from the traditional method, which relied on ChatGPT for evaluations. keval leverages the Gemma2-9b architecture, enhanced through SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization). The model is trained on the newly developed Ko-bench dataset, which is inspired by MT-bench and tailored to Korean linguistic nuances.
|
7 |
|
|
|
165 |
|---:|:-----------|:---------|:--------|---------:|:-----------|:----------|:-----------|----:|:-----------|:----------|:----------|:----------|:----------|:----------|:-----------|
|
166 |
| 0 | keval-2-9b | 0 (0.0%) | 50.0% | 22 | 1 (50.0%) | 1 (50.0%) | 2 (100.0%) | 0 | 2 (100.0%) | 0 | 0 | 1 (50.0%) | 1 (50.0%) | 1 (50.0%) | 2 (100.0%) |
|
167 |
| 1 | keval-2-3b | 0 (0.0%) | 45.5% | 22 | 2 (100.0%) | 1 (50.0%) | 0 | 0 | 2 (100.0%) | 1 (50.0%) | 0 | 1 (50.0%) | 1 (50.0%) | 0 | 2 (100.0%) |
|
168 |
+
| 2 | keval-2-1b | 0 (0.0%) | 36.4% | 22 | 0 | 1 (50.0%) | 2 (100.0%) | 0 | 1 (50.0%) | 0 | 1 (50.0%) | 0 | 0 | 1 (50.0%) | 2 (100.0%) |
|
|
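The hunk header describes the `score` column as a ratio of correctly predicted labels, although the sentence is cut off in this view. As a rough sketch, assuming the denominator is the sample count of 22 shown in each row, the percentages can be reproduced as below. The correct-label counts (11, 10, 8) are not in the README; they are back-computed from the percentages and should be read as assumptions, and the `score` helper is a hypothetical name.

```python
def score(correct: int, total: int) -> str:
    """Format correct/total as a percentage with one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * correct / total:.1f}%"


# Assumed counts of correct labels, inferred from the table's percentages
# and the sample count of 22 per model (not stated in the README itself).
assumed_correct = {"keval-2-9b": 11, "keval-2-3b": 10, "keval-2-1b": 8}

scores = {name: score(n, 22) for name, n in assumed_correct.items()}
for name, value in scores.items():
    print(name, value)
```

Under these assumptions the formatted values match the `score` column in the three table rows.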