davidkim205/ko-gemma-2-9b-it is one of several models being researched to improve the performance of Korean language models.
(To be released soon.)
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "davidkim205/ko-gemma-2-9b-it"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=quantization_config)
chat = [
{ "role": "system", "content":"๋น์ ์ ์ง๋ฌธ์ ๋ํด์ ์์ธํ ์ค๋ช
ํ๋ AI์
๋๋ค."},
{ "role": "user", "content": "๋ฅ๋ฌ๋์ ์ด๋ป๊ฒ ๊ณต๋ถํด์ผํ๋์?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))
output
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.04s/it]
/home/david/anaconda3/envs/eval/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:426: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.
warnings.warn(
<bos>๋น์ ์ ์ง๋ฌธ์ ๋ํด์ ์์ธํ ์ค๋ช
ํ๋ AI์
๋๋ค.<start_of_turn>user
๋ฅ๋ฌ๋์ ์ด๋ป๊ฒ ๊ณต๋ถํด์ผํ๋์?<end_of_turn>
<start_of_turn>model
๋ฅ๋ฌ๋์ ๊ณต๋ถํ๋ ๊ฒ์ ํฅ๋ฏธ๋กญ๊ณ ๋ณด๋ ์๋ ์ฌ์ ์ด ๋ ์ ์์ต๋๋ค!
ํ์ง๋ง ์ด๋์๋ถํฐ ์์ํด์ผ ํ ์ง ๋ง๋งํ๊ฒ ๋๊ปด์ง ์๋ ์์ต๋๋ค.
๋ค์์ ๋ฅ๋ฌ๋์ ๊ณต๋ถํ๊ธฐ ์ํ ๋จ๊ณ๋ณ ๊ฐ์ด๋์
๋๋ค.
**1๋จ๊ณ: ๊ธฐ์ด ๋ค์ง๊ธฐ**
* **์ํ**: ๋ฅ๋ฌ๋์ ๊ธฐ๋ฐ์ด ๋๋ ์ ํ๋์, ๋ฏธ์ ๋ถ, ํ๋ฅ ๋ฐ ํต๊ณ์ ๋ํ ๊ธฐ๋ณธ ์ง์์ด ํ์ํฉ๋๋ค. Khan Academy, Coursera ๋ฑ ์จ๋ผ์ธ ํ๋ซํผ์์ ์ํ ๊ฐ์ข๋ฅผ ๋ฃ๋ ๊ฒ์ ์ถ์ฒํฉ๋๋ค.
* **ํ๋ก๊ทธ๋๋ฐ**: Python์ ๋ฅ๋ฌ๋ ๋ถ์ผ์์ ๊ฐ์ฅ ๋๋ฆฌ ์ฌ์ฉ๋๋ ํ๋ก๊ทธ๋๋ฐ ์ธ์ด์
๋๋ค. Python ๊ธฐ์ด ๋ฌธ๋ฒ, ๋ฐ์ดํฐ ๊ตฌ์กฐ, ํจ์ ๋ฑ์ ์ตํ์ธ์. Codecademy, Google's Python Class ๋ฑ์ ํ๋ซํผ์์ Python์ ๋ฐฐ์ธ ์ ์์ต๋๋ค.
* **๊ธฐ๋ณธ ๋จธ์ ๋ฌ๋**: ๋ฅ๋ฌ๋์ ์ดํดํ๊ธฐ ์ ์ ๊ธฐ๋ณธ์ ์ธ ๋จธ์ ๋ฌ๋ ๊ฐ๋
์ ์ตํ๋ ๊ฒ์ด ์ค์ํฉ๋๋ค.
* ๋ถ๋ฅ, ํ๊ท, ํด๋ฌ์คํฐ๋ง ๋ฑ์ ๋จธ์ ๋ฌ๋ ์๊ณ ๋ฆฌ์ฆ์ ์ดํดํ๊ณ , Scikit-learn ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ฅผ ํ์ฉํ์ฌ ์ค์ต์ ํด๋ณด์ธ์.
**2๋จ๊ณ: ๋ฅ๋ฌ๋ ๊ฐ๋
ํ์ต**
* **์จ๋ผ์ธ ๊ฐ์ข**: Coursera, edX, Udacity ๋ฑ์ ํ๋ซํผ์์ ์ ๊ณตํ๋ ๋ฅ๋ฌ๋ ๊ฐ์ข๋ฅผ ์๊ฐํ์ธ์. Andrew Ng์ Deep Learning Specialization์ ๋ฅ๋ฌ๋ ๋ถ์ผ์ ๊ธฐ๋ณธ ๊ฐ๋
์ ํํํ๊ฒ ๋ค์ง๋ ๋ฐ ์ข์ ์ ํ์
๋๋ค.
* **์ฑ
**: ๋ฅ๋ฌ๋์ ๋ํ ์ดํด๋ฅผ ์ฌํ์ํค๊ธฐ ์ํด ์ฑ
์ ์ฝ๋ ๊ฒ๋ ์ข์ ๋ฐฉ๋ฒ์
๋๋ค.
* "Deep Learning" (Ian Goodfellow, Yoshua Bengio, Aaron Courville)์ ๋ฅ๋ฌ๋ ๋ถ์ผ์ ์ ๋ฌธ๊ฐ๋ฅผ ์ํ ์ฌ๋ ์๋ ์ฑ
์
๋๋ค.
* "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" (Aurรฉlien Gรฉron)์ ์ค์ต ์ค์ฌ์ผ๋ก ๋ฅ๋ฌ๋์ ๋ฐฐ์ฐ๊ณ ์ถ์ ์ฌ๋์๊ฒ ์ ํฉํฉ๋๋ค.
* **๋ธ๋ก๊ทธ ๋ฐ ๊ธฐ์ฌ**: ๋ฅ๋ฌ๋ ๊ด๋ จ ์ต์ ํธ๋ ๋์ ์ฐ๊ตฌ ๋ํฅ์ ํ์
ํ๊ธฐ ์ํด ๋ธ๋ก๊ทธ ๋ฐ ๊ธฐ์ฌ๋ฅผ ์ฝ๋ ๊ฒ์ด ์ข์ต๋๋ค.
**3๋จ๊ณ: ์ค์ต ๋ฐ ํ๋ก์ ํธ ์งํ**
* **๋ฐ์ดํฐ์
**: Kaggle, UCI Machine Learning Repository ๋ฑ์ ํ๋ซํผ์์ ๋ค์ํ ๋ฐ์ดํฐ์
์ ์ฐพ์ ์ค์ตํ ์ ์์ต๋๋ค.
* **๋ผ์ด๋ธ๋ฌ๋ฆฌ**: TensorFlow, PyTorch, Keras ๋ฑ์ ๋ฅ๋ฌ๋ ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ฅผ ํ์ฉํ์ฌ ๋ชจ๋ธ์ ๊ตฌ์ถํ๊ณ ํ๋ จํ์ธ์.
* **ํ๋ก์ ํธ**: ๋ฅ๋ฌ๋ ๊ธฐ์ ์ ์ ์ฉํ์ฌ ์ค์ ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๋ ํ๋ก์ ํธ๋ฅผ ์งํํ๋ ๊ฒ์ด ์ค์ํฉ๋๋ค.
* ์ด๋ฏธ์ง ๋ถ๋ฅ, ์์ฐ์ด ์ฒ๋ฆฌ, ์์ธก ๋ชจ๋ธ ๊ฐ๋ฐ ๋ฑ ๋ค์ํ ํ๋ก์ ํธ๋ฅผ ํตํด ๋ฅ๋ฌ๋ ์ค๋ ฅ์ ํฅ์์ํฌ ์ ์์ต๋๋ค.
**์ถ๊ฐ ํ**
* **์ปค๋ฎค๋ํฐ ํ๋**: ๋ฅ๋ฌ๋ ๊ด๋ จ ์ปค๋ฎค๋ํฐ์ ์ฐธ์ฌํ์ฌ ๋ค๋ฅธ ์ฌ๋๋ค๊ณผ ๊ต๋ฅํ๊ณ ์ง๋ฌธ์ ํด๋ณด์ธ์.
* **๊พธ์คํจ**: ๋ฅ๋ฌ๋์ ๋ณต์กํ ๋ถ์ผ์ด๋ฏ๋ก ๊พธ์คํ ๊ณต๋ถํ๊ณ ์ค์ตํ๋ ๊ฒ์ด ์ค์ํฉ๋๋ค.
<end_of_turn><eos>
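The decoded output echoes the whole prompt and keeps the turn markers (`<bos>`, `<start_of_turn>`, `<end_of_turn>`, `<eos>`). As a minimal sketch, assuming only the marker layout visible in that output, the prompt could be assembled by hand and the model's reply cut out of the decoded string as follows. `build_prompt` and `extract_reply` are hypothetical helper names, not part of transformers, and the tokenizer's own chat template remains the authoritative source for the format.

```python
BOS = "<bos>"
START = "<start_of_turn>"
END = "<end_of_turn>"
EOS = "<eos>"


def build_prompt(system: str, user: str) -> str:
    """Mirror the layout seen in the decoded output: system text right
    after <bos>, one user turn, then an open model turn."""
    return f"{BOS}{system}{START}user\n{user}{END}\n{START}model\n"


def extract_reply(decoded: str) -> str:
    """Return the text of the last model turn without any turn markers."""
    marker = f"{START}model\n"
    _, found, tail = decoded.rpartition(marker)
    if not found:
        raise ValueError("no model turn found in decoded text")
    # Drop everything from the first end-of-turn or end-of-sequence marker on.
    for stop in (END, EOS):
        tail = tail.split(stop, 1)[0]
    return tail.strip()


# Stand-in for tokenizer.decode(outputs[0]); the reply text is a placeholder.
decoded = build_prompt("system text", "user question") + "model reply\n<end_of_turn><eos>"
print(extract_reply(decoded))
```

Working on the decoded string keeps the sketch free of model dependencies; slicing the generated token IDs before decoding would be an alternative.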
https://github.com/davidkim205/kollm_evaluation
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| kobest | N/A | none | 0 | acc | 0.5150 | ± | 0.0073 |
| | | none | 0 | f1 | 0.4494 | ± | N/A |
| - kobest_boolq | 1 | none | 0 | acc | 0.6154 | ± | 0.0130 |
| | | none | 0 | f1 | 0.5595 | ± | N/A |
| - kobest_copa | 1 | none | 0 | acc | 0.4710 | ± | 0.0158 |
| | | none | 0 | f1 | 0.4700 | ± | N/A |
| - kobest_hellaswag | 1 | none | 0 | acc | 0.3880 | ± | 0.0218 |
| | | none | 0 | f1 | 0.3832 | ± | N/A |
| | | none | 0 | acc_norm | 0.4780 | ± | 0.0224 |
| - kobest_sentineg | 1 | none | 0 | acc | 0.5189 | ± | 0.0251 |
| | | none | 0 | f1 | 0.4773 | ± | N/A |
| - kobest_wic | 1 | none | 0 | acc | 0.4873 | ± | 0.0141 |
| | | none | 0 | f1 | 0.3276 | ± | N/A |
| ko_truthfulqa | 2 | none | 0 | acc | 0.3390 | ± | 0.0166 |
| ko_mmlu | 1 | none | 0 | acc | 0.1469 | ± | 0.0019 |
| | | none | 0 | acc_norm | 0.1469 | ± | 0.0019 |
| ko_hellaswag | 1 | none | 0 | acc | 0.2955 | ± | 0.0046 |
| | | none | 0 | acc_norm | 0.3535 | ± | 0.0048 |
| ko_common_gen | 1 | none | 0 | acc | 0.5825 | ± | 0.0126 |
| | | none | 0 | acc_norm | 0.5825 | ± | 0.0126 |
| ko_arc_easy | 1 | none | 0 | acc | 0.2329 | ± | 0.0124 |
| | | none | 0 | acc_norm | 0.2867 | ± | 0.0132 |
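The Stderr column can be turned into an approximate interval around each score. The following is a minimal sketch, assuming a normal approximation (about 1.96 standard errors on each side for a 95% interval) and using three rows copied from the table; `confidence_interval` is a hypothetical helper, not something provided by the evaluation harness.

```python
Z_95 = 1.96  # assumed normal-approximation multiplier for a 95% interval


def confidence_interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Return (lower, upper) bounds of value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (acc, stderr) pairs copied from the table.
rows = {
    "kobest_boolq": (0.6154, 0.0130),
    "kobest_copa": (0.4710, 0.0158),
    "ko_truthfulqa": (0.3390, 0.0166),
}

for task, (acc, stderr) in rows.items():
    lower, upper = confidence_interval(acc, stderr)
    print(f"{task}: acc={acc:.4f}, approx. 95% interval [{lower:.4f}, {upper:.4f}]")
```

Read this way, a score such as kobest_copa's 0.4710 has an interval that reaches roughly 0.50, so small differences between models on that task should be interpreted with care.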
keval is an evaluation model intended to compensate for the shortcomings of the existing lm-evaluation-harness. Among the various methods that use ChatGPT to evaluate models, it was trained on the prompt and dataset of a benchmark for evaluating Korean language models.
https://huggingface.co/davidkim205/keval-7b
model | ned | exe_time | evalscore | count |
---|---|---|---|---|
claude-3-opus-20240229 | nan | nan | 8.79 | 42 |
gpt-4-turbo-2024-04-09 | nan | nan | 8.71 | 42 |
Qwen2-72B-Instruct | nan | 29850.5 | 7.85 | 42 |
WizardLM-2-8x22B | nan | 133831 | 7.57 | 42 |
ko-gemma-2-9b-it | nan | 30789.5 | 7.52 | 42 |
HyperClovaX | nan | nan | 7.44 | 42 |
gemma-2-9b-it | nan | 23531.7 | 7.4 | 42 |
glm-4-9b-chat | nan | 24825.6 | 7.31 | 42 |
Ko-Llama-3-8B-Instruct | nan | 10697.5 | 6.81 | 42 |
Qwen2-7B-Instruct | nan | 11856.3 | 6.02 | 42 |
Not-WizardLM-2-7B | nan | 12955.7 | 5.26 | 42 |
gemma-1.1-7b-it | nan | 6950.5 | 4.99 | 42 |
Mistral-7B-Instruct-v0.3 | nan | 19631.4 | 4.89 | 42 |
Phi-3-small-128k-instruct | nan | 26747.5 | 3.52 | 42 |
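One way to read the table is to look at where ko-gemma-2-9b-it ranks and how far it sits from gemma-2-9b-it. The sketch below uses the evalscore values copied from the table and assumes, from the names only, that gemma-2-9b-it is the natural point of comparison; `rank_of` is a hypothetical helper.

```python
# evalscore values copied from the table (42 items per model).
evalscore = {
    "claude-3-opus-20240229": 8.79,
    "gpt-4-turbo-2024-04-09": 8.71,
    "Qwen2-72B-Instruct": 7.85,
    "WizardLM-2-8x22B": 7.57,
    "ko-gemma-2-9b-it": 7.52,
    "HyperClovaX": 7.44,
    "gemma-2-9b-it": 7.40,
    "glm-4-9b-chat": 7.31,
    "Ko-Llama-3-8B-Instruct": 6.81,
    "Qwen2-7B-Instruct": 6.02,
    "Not-WizardLM-2-7B": 5.26,
    "gemma-1.1-7b-it": 4.99,
    "Mistral-7B-Instruct-v0.3": 4.89,
    "Phi-3-small-128k-instruct": 3.52,
}


def rank_of(model: str, scores: dict[str, float]) -> int:
    """1-based rank of `model` when models are sorted by score, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(model) + 1


tuned, reference = "ko-gemma-2-9b-it", "gemma-2-9b-it"
rank = rank_of(tuned, evalscore)
gain = round(evalscore[tuned] - evalscore[reference], 2)
print(f"{tuned}: rank {rank} of {len(evalscore)}, {gain:+.2f} evalscore vs {reference}")
```

With only 42 items per model and no error bars in the table, a gap of this size is best treated as indicative rather than conclusive.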