---
license: mit
datasets:
- Junhoee/Jeju-Standard-Translation
language:
- ko
metrics:
- sacrebleu
- chrf
- bertscore
base_model:
- gogamza/kobart-base-v2
tags:
- nlp
- translation
- seq2seq
- low-resource-language
- korean-dialect
- jeju-dialect
- kobart
---
# 제주 사투루 (Jeju Satoru)

## Project Overview
Jeju Satoru is a **bidirectional Jeju-Standard Korean translation model** developed to help preserve the Jeju language, which UNESCO has designated an **'endangered language'**. The model aims to improve digital accessibility for Jeju speakers and thereby help reduce their digital exclusion.

## Model Information
- **Base model**: `gogamza/kobart-base-v2`
- **Model architecture**: Seq2Seq (encoder-decoder)
- **Training data**: About 930,000 sentence pairs from the [Junhoee/Jeju-Standard-Translation](https://huggingface.co/datasets/Junhoee/Jeju-Standard-Translation) dataset, which is published on Hugging Face.

## Performance Evaluation
The model was evaluated with quantitative metrics such as SacreBLEU, chrF, and BERTScore.

| Direction              | SacreBLEU | chrF  | BERTScore |
|------------------------|-----------|-------|-----------|
| Jeju → Standard Korean | 77.19     | 83.02 | 0.97      |
| Standard Korean → Jeju | 64.86     | 72.68 | 0.94      |
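
For intuition about the chrF column, the sketch below computes a simplified sentence-level character n-gram F-score in plain Python. It is an illustration only, not the implementation behind the reported scores. The function names `char_ngrams` and `simple_chrf` are hypothetical, and the defaults (n-gram orders 1 to 6, β = 2, whitespace ignored) are assumptions chosen to resemble common chrF settings, so its numbers will not match the table.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: recall is weighted beta^2 times as much as precision
    return 100 * (1 + b2) * precision * recall / (b2 * precision + recall)


score = simple_chrf("우리 집은 편안하다.", "우리 집이 편안하다.")
print(f"{score:.2f}")
```

An exact match scores 100 and a hypothesis sharing no characters with the reference scores 0. Because the score works on characters rather than words, a near miss such as a single differing particle still earns substantial credit, which is one reason character-level metrics are often reported for Korean.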

## Usage
The `pipeline` API from the `transformers` library makes it easy to load the model and run inference.

**1. Install libraries**
```bash
pip install transformers torch
```

**2. Run inference**
```python
from transformers import pipeline

# Load the translation pipeline
translator = pipeline(
    "translation",
    model="sbaru/jeju-satoru"
)

# Example: Jeju -> Standard Korean
jeju_sentence = '[제주] 우리 집이 펜안하다.'
result = translator(jeju_sentence, max_length=128)
print(f"Input: {jeju_sentence}")
print(f"Output: {result[0]['translation_text']}")

# Example: Standard Korean -> Jeju
standard_sentence = '[표준] 우리 집은 편안하다.'
result = translator(standard_sentence, max_length=128)
print(f"Input: {standard_sentence}")
print(f"Output: {result[0]['translation_text']}")
```
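
The usage examples prefix each input with a bracketed tag, `[제주]` for a Jeju sentence and `[표준]` for a Standard Korean one. A small helper can keep that convention consistent across calls. This is a minimal sketch: the names `SOURCE_TAGS` and `tag_input` are hypothetical, and reading the tag as a marker of the input's language is an assumption inferred from the examples rather than documented behavior.

```python
# Tags as they appear in the usage examples. The dictionary keys are
# hypothetical names chosen for this sketch.
SOURCE_TAGS = {
    "jeju": "[제주]",      # input is Jeju (assumed: output is Standard Korean)
    "standard": "[표준]",  # input is Standard Korean (assumed: output is Jeju)
}


def tag_input(sentence: str, source: str) -> str:
    """Prefix a sentence with the tag for its source language."""
    if source not in SOURCE_TAGS:
        raise ValueError(f"unknown source language: {source!r}")
    return f"{SOURCE_TAGS[source]} {sentence.strip()}"


print(tag_input("우리 집이 펜안하다.", "jeju"))
```

The returned string can be passed directly to the `translator` pipeline from the usage example, for instance `translator(tag_input(text, "jeju"), max_length=128)`.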