Qwen2.5-3B-MATH-GRPO-KOR

๋ชจ๋ธ ๊ฐœ์š”

ํ•œ๊ตญ์–ด ์ˆ˜ํ•™ ์ถ”๋ก ์— ํŠนํ™”๋œ Qwen2.5-3B ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. GRPO(Group Relative Policy Optimization)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•œ๊ตญ์–ด ์ˆ˜ํ•™ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œ์ผฐ์Šต๋‹ˆ๋‹ค.

์ฃผ์š” ํŠน์ง•

  • ๋ฒ ์ด์Šค ๋ชจ๋ธ: Qwen/Qwen2.5-3B-Instruct
  • ํ•™์Šต ๋ฐฉ๋ฒ•: GRPO (Group Relative Policy Optimization)
  • ๋ฐ์ดํ„ฐ์…‹: ChuGyouk/AI-MO-NuminaMath-CoT-Ko (5,000 ์ƒ˜ํ”Œ)
  • ์–ธ์–ด: ํ•œ๊ตญ์–ด
  • ํŠนํ™” ๋ถ„์•ผ: ์ˆ˜ํ•™ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐ ์ถ”๋ก 

์‚ฌ์šฉ๋ฒ•

from unsloth import FastLanguageModel

# ๋ชจ๋ธ ๋กœ๋“œ
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="byh711/Qwen2.5-3B-MATH-GRPO-KOR",
    max_seq_length=1024,
    load_in_4bit=True,
)

# ์ถ”๋ก  ๋ชจ๋“œ
FastLanguageModel.for_inference(model)

# ํ”„๋กฌํ”„ํŠธ ์„ค์ •
system_prompt = '''๋‹ค์Œ ํ˜•์‹์œผ๋กœ ์ •ํ™•ํžˆ ๋‹ต๋ณ€ํ•ด์ฃผ์„ธ์š”:
<reasoning>
๋‹จ๊ณ„๋ณ„ ํ’€์ด ๊ณผ์ •์„ ์ž์„ธํžˆ ์„ค๋ช…
</reasoning>
<answer>
์ตœ์ข… ๋‹ต์•ˆ
</answer>'''

# ์ˆ˜ํ•™ ๋ฌธ์ œ ์ž…๋ ฅ
question = "์–ด๋–ค ์ˆ˜์˜ 8๋ฐฐ๊ฐ€ 120๋ณด๋‹ค ์ž‘์„ ๋•Œ, ๊ทธ ์ˆ˜์˜ ์ตœ๋Œ€ ์ •์ˆ˜๋ฅผ ๊ตฌํ•˜์„ธ์š”."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# ํ† ํฐํ™” ๋ฐ ์ƒ์„ฑ
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs, max_new_tokens=200, temperature=0.1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

์ถœ๋ ฅ ํ˜•์‹

๋ชจ๋ธ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ XML ํ˜•์‹์œผ๋กœ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:

<reasoning>
1. ์ฃผ์–ด์ง„ ์กฐ๊ฑด: ์–ด๋–ค ์ˆ˜๋ฅผ x๋ผ๊ณ  ํ•˜๋ฉด, 8x < 120
2. ๋ถ€๋“ฑ์‹ ํ’€์ด: x < 120/8 = 15
3. x๋Š” 15๋ณด๋‹ค ์ž‘์•„์•ผ ํ•˜๋ฏ€๋กœ, ์ตœ๋Œ€ ์ •์ˆ˜๋Š” 14
</reasoning>
<answer>
14
</answer>
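Because the final answer is wrapped in <answer> tags, it can be pulled out of the decoded response with a small regex helper (a convenience sketch, not part of the model card's original code):

```python
import re

def extract_answer(response):
    """Return the text inside the first <answer>...</answer> block, or None."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None

sample = "<reasoning>...</reasoning>\n<answer>\n14\n</answer>"
print(extract_answer(sample))  # prints: 14
```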

ํ•™์Šต ์„ธ๋ถ€์‚ฌํ•ญ

  • ํ•™์Šต ๋ฐ์ดํ„ฐ: 5,000๊ฐœ ํ•œ๊ตญ์–ด ์ˆ˜ํ•™ ๋ฌธ์ œ
  • ์—ํฌํฌ: 2
  • ๋ฐฐ์น˜ ํฌ๊ธฐ: 4 (ํšจ๊ณผ์ )
  • ํ•™์Šต๋ฅ : 5e-6
  • ์˜ตํ‹ฐ๋งˆ์ด์ €: AdamW 8bit
  • LoRA Rank: 64
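GRPO needs a per-completion reward signal. The card does not spell out the reward functions used, but a typical setup for this kind of run (an assumed sketch, with assumed weights of 0.5 and 1.0) combines a format check on the XML tags with an exact-match check on the extracted answer:

```python
import re

ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)
FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def reward(completion, gold):
    """Format reward (0.5) plus correctness reward (1.0) -- assumed weights."""
    score = 0.0
    if FORMAT_RE.search(completion):
        score += 0.5  # response follows the <reasoning>/<answer> template
    m = ANSWER_RE.search(completion)
    if m and m.group(1).strip() == gold:
        score += 1.0  # extracted answer matches the reference exactly
    return score
```

Splitting format and correctness rewards lets the model first learn to emit the template reliably, then improve answer accuracy within it.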

๋ผ์ด์„ ์Šค

Apache 2.0
