---
library_name: transformers
license: llama3
language:
- ko
- en
pipeline_tag: text-generation
---
# davidkim205/Ko-Llama-3-8B-Instruct
Ko-Llama-3-8B-Instruct is one of several models being researched to improve the performance of Korean language models.
Its training dataset was built with rejection sampling, and the model was then trained on that dataset with supervised fine-tuning (SFT).
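The rejection-sampling step can be pictured as: sample several candidate responses per prompt from the base model, score them, and keep only the best one for the SFT set. The sketch below is a hypothetical illustration only; the actual sft_rs_140k pipeline, sampling settings, and scorer are not published, and `score` here is a stand-in for a real reward or judge model.
```
# Hypothetical sketch of rejection sampling for SFT data construction.
# Candidate counts, sampling settings, and the scorer are illustrative
# assumptions, not the published sft_rs_140k pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

gen_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # the base model named below
tokenizer = AutoTokenizer.from_pretrained(gen_id)
model = AutoModelForCausalLM.from_pretrained(
    gen_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def sample_candidates(prompt: str, n: int = 4) -> list[str]:
    # Draw n diverse candidate responses for a single prompt.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        num_return_sequences=n,
    )
    return [
        tokenizer.decode(o[input_ids.shape[-1]:], skip_special_tokens=True)
        for o in outputs
    ]

def score(prompt: str, response: str) -> float:
    # Placeholder scorer for illustration only; a real pipeline would use
    # a reward model or an LLM judge here.
    return float(len(response.split()))

def best_response(prompt: str, n: int = 4) -> str:
    # Rejection sampling: keep only the highest-scoring candidate.
    candidates = sample_candidates(prompt, n)
    return max(candidates, key=lambda r: score(prompt, r))
```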
## Model Details
* **Model Developers** : davidkim (Changyeon Kim)
* **Repository** : -
* **Base model** : meta-llama/Meta-Llama-3-8B-Instruct
* **SFT dataset** : sft_rs_140k
## Requirements
If the `undefined symbol` error below occurs, install the matching torch and flash-attn versions as follows.
```
...
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/david/anaconda3/envs/spaces/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
```
```
pip install torch==2.2.0
pip install flash-attn==2.5.9.post1
```
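To confirm the fix, check that both packages import cleanly together (a quick sanity check, not part of the original instructions; both packages expose a standard `__version__` attribute):
```
# Sanity check: both imports should succeed without the undefined-symbol error.
import torch
import flash_attn

print(torch.__version__, flash_attn.__version__)
```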
## How to use
```
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "davidkim205/Ko-Llama-3-8B-Instruct"

# Load the tokenizer and the model in bfloat16, sharded across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple interactive loop: read a prompt, generate, print the response.
while True:
    prompt = input('>')
    messages = [
        # System prompt: "You are a chatbot that answers in detail."
        {"role": "system", "content": "당신은 구체적으로 답변하는 챗봇입니다."},
        {"role": "user", "content": prompt},
    ]

    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    # Stop on either the EOS token or Llama 3's end-of-turn token.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

    outputs = model.generate(
        input_ids,
        max_new_tokens=1024,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )

    # Decode only the newly generated tokens, not the prompt.
    response = outputs[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))
```
```
์‚ฌ๊ณผ์˜ ์˜๋ฏธ๋ฅผ ์„ค๋ช…ํ•˜์‹œ์˜ค
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
์‚ฌ๊ณผ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๋ง›๊ณผ ์˜์–‘๊ฐ€ ์žˆ๋Š” ๊ณผ์ผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ๊ณผ๋Š” ์‹ ์„ ํ•œ ์ƒํƒœ์—์„œ ์ฃผ๋กœ ๋จน๊ฑฐ๋‚˜, ์š”๊ฑฐํŠธ๋‚˜ ์Šค๋ฌด๋”” ๋“ฑ์˜ ์Œ๋ฃŒ์— ํ˜ผํ•ฉํ•˜์—ฌ ์„ญ์ทจ๋˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ์‚ฌ๊ณผ๋Š” ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜๊ฐ€ ์žˆ์œผ๋ฉฐ, ๊ฐ๊ฐ์˜ ์ข…๋ฅ˜๋Š” ๋‹ค๋ฅธ ์ƒ‰์ƒ๊ณผ ๋ง›์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
์‚ฌ๊ณผ๋Š” ๊ณผ์ผ์ด์ง€๋งŒ, ์ข…์ข… ๋‹ค๋ฅธ ์˜๋ฏธ๋กœ๋„ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, "์‚ฌ๊ณผ"๋ผ๋Š” ๋‹จ์–ด๋Š” ์–ด๋–ค ๊ฒƒ์ด ์ž˜๋ชป๋˜๊ฑฐ๋‚˜ ๋ถ€์กฑํ•œ ๊ฒƒ์„ ์‹œ์‚ฌํ•˜๋Š” ์ƒํ™ฉ์—์„œ ์‚ฌ์šฉ๋  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, "์‚ฌ๊ณผ"๋ฅผ ์ฃผ๋Š” ๊ฒƒ์€ ์ž˜๋ชป๋œ ํ–‰๋™์ด๋‚˜ ๋ถ€์กฑํ•œ ์‚ฌ๊ณ ๋กœ ์ธํ•œ ์‚ฌ๊ณผ๋ฅผ ์˜๋ฏธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋˜ํ•œ, "์‚ฌ๊ณผ"๋Š” ์–ด๋–ค ์ƒํ™ฉ์—์„œ ๋‹ค๋ฅธ ์‚ฌ๋žŒ์—๊ฒŒ์„œ ์‚ฌ๊ณผ๋ฅผ ๋ฐ›๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, "์‚ฌ๊ณผ"๋ฅผ ๊ตฌํ•˜์ง€ ์•Š์œผ๋ฉด ์–ด๋–ค ์ƒํ™ฉ์—์„œ ๋‹ค๋ฅธ ์‚ฌ๋žŒ์—๊ฒŒ์„œ ์‚ฌ๊ณผ๋ฅผ ๋ฐ›์ง€ ๋ชปํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ, "์‚ฌ๊ณผ"๋Š” ๋‹ค์–‘ํ•œ ์˜๋ฏธ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋‹จ์–ด์ด๋ฉฐ, ๋งฅ๋ฝ์— ๋”ฐ๋ผ ๋‹ค๋ฅธ ์˜๋ฏธ๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```
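The attention-mask and pad-token warnings in the transcript above are harmless, but they can be silenced by passing an explicit attention mask and pad token id. A minimal sketch, reusing `model`, `tokenizer`, `messages`, and `terminators` from the example, and assuming a transformers version where `apply_chat_template` supports `return_dict=True`:
```
# Build inputs with an explicit attention mask; return_dict=True makes
# apply_chat_template return both input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning
    max_new_tokens=1024,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
```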
## Benchmark
### kollm_evaluation
https://github.com/davidkim205/kollm_evaluation
| task | acc |
| :--------------- | ---: |
| average | 0.47 |
| kobest | 0.54 |
| kobest_boolq | 0.57 |
| kobest_copa | 0.62 |
| kobest_hellaswag | 0.42 |
| kobest_sentineg | 0.57 |
| kobest_wic | 0.49 |
| ko_truthfulqa | 0.29 |
| ko_mmlu | 0.34 |
| ko_hellaswag | 0.36 |
| ko_common_gen | 0.76 |
| ko_arc_easy | 0.33 |
### Evaluation with keval
keval is an evaluation model trained on the prompts and datasets used in Korean language model benchmarks. It is one of several approaches to ChatGPT-style judging and is intended to compensate for the shortcomings of the existing lm-evaluation-harness.
https://huggingface.co/davidkim205/keval-7b
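For reference, keval-7b loads like any other causal LM. The judging prompt format below is purely hypothetical; see the keval-7b model card linked above for the actual scoring prompts.
```
# Hypothetical sketch: loading keval-7b as a judge model. The prompt
# format is an assumption, not the documented keval scoring format.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

judge_id = "davidkim205/keval-7b"
judge_tok = AutoTokenizer.from_pretrained(judge_id)
judge = AutoModelForCausalLM.from_pretrained(
    judge_id, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "사과의 의미를 설명하시오"  # a benchmark question
answer = "..."                         # the candidate model's answer
prompt = f"Question: {question}\nAnswer: {answer}\nScore (0-10):"  # assumed format

input_ids = judge_tok(prompt, return_tensors="pt").input_ids.to(judge.device)
output = judge.generate(input_ids, max_new_tokens=32)
print(judge_tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```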
| keval | average | kullm | logickor | wandb |
| ---------------------------------- | ------- | ----- | -------- | ----- |
| openai/gpt-4 | 6.79 | 4.66 | 8.51 | 7.21 |
| openai/gpt-3.5-turbo | 6.25 | 4.48 | 7.29 | 6.99 |
| davidkim205/Ko-Llama-3-8B-Instruct | 5.59 | 4.24 | 6.46 | 6.06 |
### Evaluation with ChatGPT
| chatgpt | average | kullm | logickor | wandb |
| ---------------------------------- | ------- | ----- | -------- | ----- |
| openai/gpt-4 | 7.30 | 4.57 | 8.76 | 8.57 |
| openai/gpt-3.5-turbo               | 6.53    | 4.26  | 7.50     | 7.82  |
| davidkim205/Ko-Llama-3-8B-Instruct | 5.45 | 4.22 | 6.49 | 5.64 |