---
base_model:
- sh2orc/Llama-3.1-Korean-8B-Instruct
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- NousResearch/Meta-Llama-3.1-8B-Instruct
library_name: transformers
tags:
- mergekit
- merge
---

# Llama-3.1-SISaAI-Ko-merge-8B-Instruct

This is a merge of pre-trained language models, one of which is distilled from DeepSeek-R1.

Subscribe to my YouTube channel: [시사AI](https://www.youtube.com/@JayLee-gv8tv)

**Performance disclaimer:** This merged model has not undergone comprehensive validation testing. As such, its actual performance characteristics remain unverified. I strongly encourage users to conduct thorough evaluations in their specific application contexts before considering production deployment.

## Merge Details

A hybrid model aimed at **Korean NLP** and **code/math reasoning**, created by merging two specialized models with the DARE-TIES method on top of the Meta-Llama-3.1-8B-Instruct base.

### Merge Method

This model was merged with the DARE-TIES merge method ([DARE](https://arxiv.org/abs/2311.03099) + [TIES](https://arxiv.org/abs/2306.01708)), using [NousResearch/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3.1-8B-Instruct) as the base.
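
For intuition, the two steps can be written per parameter as follows. The notation is chosen here for illustration only: $\theta_{\text{base}}$ is a base-model parameter, $\theta_k$ is the same parameter in merged model $k$, $d_k$ is that model's `density`, and $w_k$ is its `weight`. DARE drops each entry of the task vector $\delta_k$ with probability $1 - d_k$ and rescales the survivors so that the expected task vector is unchanged:

$$
\delta_k = \theta_k - \theta_{\text{base}}, \qquad
\tilde{\delta}_k = \frac{m_k\,\delta_k}{d_k}, \quad m_k \sim \mathrm{Bernoulli}(d_k), \qquad
\mathbb{E}\big[\tilde{\delta}_k\big] = \frac{d_k\,\delta_k}{d_k} = \delta_k
$$

TIES then elects a sign $\gamma = \operatorname{sgn}\big(\sum_k w_k \tilde{\delta}_k\big)$ and merges only the weighted contributions $w_k \tilde{\delta}_k$ whose sign matches $\gamma$. For the densities in the configuration, the DARE rescaling factors are $1/0.55 \approx 1.82$ and $1/0.75 \approx 1.33$.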

### Models Merged

The following models were included in the merge:

* [sh2orc/Llama-3.1-Korean-8B-Instruct](https://huggingface.co/sh2orc/Llama-3.1-Korean-8B-Instruct)
* [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: NousResearch/Meta-Llama-3.1-8B-Instruct
merge_method: dare_ties

models:
  - model: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
    parameters:
      density: 0.55  # keep 55% of delta params (45% dropped); survivors rescaled by 1/0.55 ≈ 1.82x
      weight: 0.35   # relative weight of this task vector (35%)

  - model: "sh2orc/Llama-3.1-Korean-8B-Instruct"
    parameters:
      density: 0.75  # keep 75% of delta params (25% dropped); survivors rescaled by 1/0.75 ≈ 1.33x
      weight: 0.65   # relative weight of this task vector (65%)

tokenizer_source: "sh2orc/Llama-3.1-Korean-8B-Instruct"
dtype: bfloat16   # data type for the merge computation and saved weights
int8_mask: true   # store intermediate merge masks as int8 to save memory while merging
```
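
To make the `density` and `weight` parameters more concrete, here is a minimal sketch that uses plain Python lists as stand-ins for weight tensors. It is a simplified, hypothetical illustration of the DARE-TIES idea (random drop-and-rescale, sign election, then a weighted average of the agreeing entries), not mergekit's actual code. The names `dare` and `dare_ties_merge` are invented for this example, and the toy numbers do not come from the real models.

```python
import random


def dare(delta, density, rng):
    """DARE: keep each entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Merge fine-tuned parameter lists into `base` (hypothetical helper, toy scale)."""
    rng = random.Random(seed)
    # 1. Task vectors: each fine-tuned model minus the base model.
    deltas = [[f - b for f, b in zip(ft, base)] for ft in finetuned]
    # 2. DARE: randomly drop entries and rescale the survivors.
    sparse = [dare(d, p, rng) for d, p in zip(deltas, densities)]
    merged = []
    for i, b in enumerate(base):
        contribs = [w * s[i] for s, w in zip(sparse, weights)]
        # 3. TIES sign election: the sign of the weighted sum wins.
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        # 4. Disjoint merge: weighted average of only the entries that agree
        #    with the elected sign (dropped, zero-valued entries never agree).
        agree = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        total_w = sum(w for _, w in agree)
        merged.append(b + (sum(c for c, _ in agree) / total_w if total_w else 0.0))
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    reasoning = [1.0, -1.0, 2.0]  # toy stand-in for DeepSeek-R1-Distill-Llama-8B
    korean = [3.0, 1.0, -2.0]     # toy stand-in for Llama-3.1-Korean-8B-Instruct
    # density=1.0 disables dropping so the TIES step is easy to follow.
    print(dare_ties_merge(base, [reasoning, korean], densities=[1.0, 1.0], weights=[0.35, 0.65]))
    # Rescaling factors implied by the densities in the configuration.
    print(round(1 / 0.55, 2), round(1 / 0.75, 2))
```

Because `density=1.0` disables dropping in the demo, the first printed result isolates the TIES step: in the second and third positions the two task vectors disagree in sign, and only the entry from the model whose weighted contribution dominates the sum is kept.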

### Test (Mac M1, MPS)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import warnings

warnings.filterwarnings("ignore")

device = torch.device("mps")

model = AutoModelForCausalLM.from_pretrained(
    "./Llama-3.1-SISaAI-Ko-merge-8B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True
).to(device).eval()

tokenizer = AutoTokenizer.from_pretrained("./Llama-3.1-SISaAI-Ko-merge-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

tests = [
    # "What are the advantages of an AI that combines Korean and mathematics?"
    {"prompt": "한국어와 수학을 결합한 AI의 장점은?", "max_tokens": 500},
    # "Create a simple calculator class in Python and explain it."
    {"prompt": "파이썬으로 간단한 계산기 클래스를 만들고 설명해줘", "max_tokens": 800}
]

for test in tests:
    inputs = tokenizer(
        test["prompt"],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512
    ).to(device)

    outputs = model.generate(
        **inputs,
        max_length=1024,      # ignored: max_new_tokens takes precedence
        max_new_tokens=test["max_tokens"],
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        early_stopping=True,  # only used by beam search; no effect when sampling
        num_return_sequences=1
    )

    print(f"\n[Input] {test['prompt']}")
    print(f"[Output]\n{tokenizer.decode(outputs[0], skip_special_tokens=True)}")
    print("-" * 50)
```

Output:

````text
[Input] What are the advantages of an AI that combines Korean and mathematics?

[Output]
What are the advantages of an AI that combines Korean and mathematics? [1]
An AI that combines Korean and mathematics is an AI that can understand Korean and perform mathematical calculations. This AI can be used in a variety of fields. For example, an AI that solves math problems, an AI in which natural language processing (AI) solves math problems, or an AI that automatically translates math textbooks written in Korean. Because this AI has both mathematical computation ability and Korean comprehension ability, it can provide higher performance and usefulness.

An AI that combines Korean and mathematics is an AI that combines the ability to understand Korean with performing mathematical calculations. Therefore, when performing mathematical calculations, this AI can understand and carry out sentences or commands written in Korean. For example, if you say "2+3=5", the AI can calculate 2+3=5. Also, if you say "Find the area of a triangle", the AI can calculate the area of the triangle.

This AI, which combines the ability to understand Korean with performing mathematical calculations, can be used in a variety of fields. For example, an AI that solves math problems, an AI in which natural language processing (AI) solves math problems, or an AI that automatically translates math textbooks written in Korean. Because this AI has both mathematical computation ability and Korean comprehension ability, it can provide higher performance and usefulness.

What are the advantages of an AI that combines Korean and mathematics?

1. It has both mathematical computation ability and Korean comprehension ability.
2. It can be used in a variety of fields.
3. As an AI that combines the ability to understand Korean with performing mathematical calculations, it can provide higher performance and usefulness.
4. As an AI that automatically translates math textbooks, it can be used to translate math textbooks.
5. As an AI in which natural language processing (AI) solves math problems, math problems

--------------------------------------------------

Both `max_new_tokens` (=800) and `max_length`(=1024) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)

[Input] Create a simple calculator class in Python and explain it.

[Output]
Create a simple calculator class in Python and explain it.

To build a calculator class, you need to define the class's instance variables and methods. The instance variables can be screen, first_num, operator, and second_num. The methods include clear, append_num, change_operator, calculate, and all_clear.

```python
class Calculator:
    def __init__(self):
        self.screen = ""
        self.first_num = None
        self.operator = None
        self.second_num = None

    def clear(self):
        self.screen = ""
        self.first_num = None
        self.operator = None
        self.second_num = None

    def append_num(self, num):
        self.screen += str(num)

    def change_operator(self, op):
        self.operator = op

    def calculate(self):
        if self.operator == '+':
            return self.first_num + self.second_num
        elif self.operator == '-':
            return self.first_num - self.second_num
        elif self.operator == '*':
            return self.first_num * self.second_num
        elif self.operator == '/':
            if self.second_num!= 0:
                return self.first_num / self.second_num
            else:
                return "Error: Division by zero"
        else:
            return "Error: Invalid operator"

    def all_clear(self):
        self.screen = ""
        self.first_num = None
        self.operator = None
        self.second_num = None
```

This class provides calculator-like functionality. The clear() method resets the screen, and the append_num() method appends a number to the screen. The change_operator() method changes the current operator. The calculate() method reads the numbers on the screen and performs the operation. The all_clear() method resets all variables.

To use the calculator class, call the Calculator() function and then call the methods that operate the calculator. For example, Calculator().append_num(5) adds 5 to the screen, and Calculator().change_operator('+') changes the operator to '+'. Calculator().calculate() computes the result.

```python
calc = Calculator()
calc.append_num(5)
calc.change_operator('+')
calc.append_num(3)
print(calc.calculate()) # 8
calc.all_clear()
print(calc.screen) # ""
```

This class provides functionality similar to a simple calculator, but adding more complex calculator features may require extending the class. For example, you might want to support more operators, display more digits on the screen, or store calculation results. Such extensions can be made by modifying the class and adding more methods.

**Explanation**

To build a calculator class, you need to define the class's instance variables and methods. The instance variables can be the screen, the first number, the operator, and the second number. The methods include clear, append_num, change_operator, calculate, and all_clear.

- `clear()`: Resets the screen and sets the first number, operator, and second number to None.
- `append_num(num)`: Appends a number to the screen.
- `change_operator(op)`: Changes the current operator.
- `calculate()`: Reads the numbers on the screen and performs the

--------------------------------------------------
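
[Input] Tell me about good restaurants in Gangnam, South Korea.

[Output]
Tell me about good restaurants in Gangnam, South Korea. Gangnam has many kinds of restaurants, but bunsik (Korean snack food), Korean, Chinese, Japanese, European, and Asian places are the most common. What is the most famous restaurant in Gangnam?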

### 1. Bunsik (Korean snack food)
- **강남역 맛집 (Gangnam Station Matjip)**: A bunsik shop on the opposite side of […] from Gangnam Station Exit 1.
- **신세계 한정식 (Shinsegae Hanjeongsik)**: A famous bunsik shop in Gangnam. Its best-known item is a menu dish called 'Shinsegae'.

### 2. Korean
- **[…]**: A famous Korean restaurant in Gangnam. Offers a variety of Korean dishes.
- **한우리 (Hanuri)**: A Korean restaurant in Gangnam. Offers traditional Korean dishes.

### 3. Chinese
- **중화당 (Junghwadang)**: A Chinese restaurant in Gangnam. Offers a variety of Chinese dishes.
- **중화관 (Junghwagwan)**: A Chinese restaurant in Gangnam. Specializes in Chinese cuisine.

### 4. Japanese
- **일식당 (Ilsikdang)**: A Japanese restaurant in Gangnam. Offers a variety of Japanese dishes.
- **[…]**: A Japanese restaurant in Gangnam. Offers traditional Japanese dishes.

### 5. European
- **[…]**: A European restaurant in Gangnam. Offers a variety of European dishes.
- **도쿄 (Tokyo)**: A European restaurant in Gangnam. Specializes in Japanese-style European cuisine.

### 6. Asian
- **아시안 하우스 (Asian House)**: An Asian restaurant in Gangnam. Offers a variety of Asian dishes.
- **하와이안 하우스 (Hawaiian House)**: An Asian restaurant in Gangnam. Specializes in Hawaiian cuisine.

### 7. Other
- **[…]**: A famous […] in Gangnam. Offers a variety of […] dishes.
- **[…]**: A […] in Gangnam. Offers a variety of […] dishes.

### What kind of food are Gangnam restaurants most famous for?
Gangnam restaurants serve many kinds of food, but bunsik, Korean, Chinese, Japanese, European, and Asian are the most common. The most famous Gangnam restaurants are '[…]' and 'Shinsegae Hanjeongsik'. […] is a famous Korean restaurant in Gangnam that offers a variety of Korean dishes. Shinsegae Hanjeongsik is a famous bunsik shop in Gangnam, best known for a menu item called 'Shinsegae'.

### Where are Gangnam restaurants located?
Gangnam restaurants are various eateries located in Gangnam-gu and Songpa-gu. The most famous ones include the bunsik shop on the opposite side of […] from Gangnam Station Exit 1, […], Shinsegae Hanjeongsik, […], […], and […].

### What is the price range of Gangnam restaurants?
Prices at Gangnam restaurants vary. The cheapest range is 5,000 to 10,000 won, and the most expensive is 20,000 to 50,000 won. Chinese, Japanese, European, and Asian restaurants are generally more expensive, while Korean and bunsik restaurants are generally cheaper.
````