---
license: cc-by-nc-4.0
language:
- ko
base_model:
- google/gemma-3-4b-it
pipeline_tag: text-generation
library_name: transformers
tags:
- exam
- question-generation
- gemma-3
- korean
- xml
- sft
- dpo
- grpo
---

# Gemma3 ExamGen (Korean, XML)

**TL;DR**: A Gemma-3-based model fine-tuned to generate **Korean** university-level exam questions in **strict XML** (5 problems: 2 multiple-choice, 2 short-answer, 1 essay).

> **Outputs are in Korean.**

---

## Overview

Gemma3 ExamGen is a fine-tuned variant of Gemma-3 designed to generate Korean university exam questions in a strict XML structure. It emits exactly five problems per request while enforcing both the required XML format and concept diversity across problems.

---

## Intended Use

- **Primary**: Generate Korean exam problems in XML.
- **Output language**: Korean only.
- **Not for**: factual certification, grading, or unreviewed deployment.

---

## Training Pipeline

- **Base**: `google/gemma-3-4b-it`
- **Stages**: SFT → DPO → GRPO
- **Method**: LoRA fine-tuning
- **Data**: PDF-crawled educational materials (private)
- **Filtering**: training samples were kept only if their XML was well-formed and their problems covered distinct concepts.
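The filtering code itself is not published; as a rough illustration, an XML-validity and concept-uniqueness filter could look like the sketch below (the `is_valid_sample` name and the content-based uniqueness proxy are assumptions, not the actual pipeline):

```python
import xml.etree.ElementTree as ET

EXPECTED_TYPES = ["객관식", "객관식", "단답형", "단답형", "주관식"]

def is_valid_sample(xml_text: str) -> bool:
    """Illustrative filter: keep a training sample only if its XML parses,
    it contains exactly five <problem> blocks in the expected type order,
    and no two problems share the same content."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    problems = root.findall("problem")
    if len(problems) != 5:
        return False
    types = [p.findtext("type") for p in problems]
    if types != EXPECTED_TYPES:
        return False
    contents = [p.findtext("content") for p in problems]
    return len(set(contents)) == len(contents)  # crude uniqueness proxy
```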

---

## Prompting Spec (Korean Prompt Template)

> The model must always produce **Korean output**.
> It strictly follows the XML schema and rules defined below.
> When using this model, replace the `{KEYS}` and `{PHRS}` placeholders with keywords and sentences extracted from your source context.

---

### Prompt Template (in Korean)

```text
다음의 규칙을 준수하여 대학교 시험 문제 5개를 XML 형식으로 생성하세요.

**응답 형식 (반드시 준수):**
<problems>
<problem>
<number>1</number>
<type>객관식</type>
<content>문제 내용</content>
<description>풀이과정</description>
<answer>답</answer>
</problem>
<problem>
<number>2</number>
<type>객관식</type>
<content>문제 내용</content>
<description>풀이과정</description>
<answer>답</answer>
</problem>

<problem>
<number>3</number>
<type>단답형</type>
<content>문제 내용</content>
<description>풀이과정</description>
<answer>답</answer>
</problem>
<problem>
<number>4</number>
<type>단답형</type>
<content>문제 내용</content>
<description>풀이과정</description>
<answer>답</answer>
</problem>

<problem>
<number>5</number>
<type>주관식</type>
<content>문제 내용</content>
<answer>답</answer>
</problem>
</problems>

**절대 규칙 (위반 시 응답 무효):**
1. XML 태그 구조만 출력합니다. 다른 텍스트, 설명, 주석을 포함하지 않습니다.
2. 모든 내용은 CDATA 섹션 없이 일반 텍스트로 작성합니다.
3. 특수문자는 XML 엔티티로 작성합니다. (<, >, &, ", ')

**문제 생성 규칙:**
- 총 5문제를 생성하며, 문제 유형은 다음 비율을 반드시 지킵니다: 객관식 2문제, 단답형 2문제, 주관식 1문제.
- 각 문제의 <type>은 위 응답 형식에서 이미 지정된 값을 그대로 사용합니다.
- 객관식 문제는 보기 기호를 ①, ②, ③, ④, ⑤ 형식으로 작성합니다.
- 모든 문제는 서로 다른 주제 개념을 사용해야 하며, 동일 개념이나 동일 인물, 동일 사건을 다른 문제에서 재사용하지 않습니다.
- 풀이과정과 답은 구체적으로 작성합니다.
- 문제 내용에 따옴표, 수식, 특수문자 등을 자유롭게 사용할 수 있습니다.
- 문제는 난이도와 표현 방식을 다양하게 구성합니다.

**중요한 키워드:**
{KEYS}

**중요한 문장들:**
{PHRS}
```
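For filling the placeholders, plain `str.replace` is safer than `str.format`, which would trip over any literal braces elsewhere in the template. A minimal sketch (the `fill_template` helper and the bulleted formatting of items are illustrative assumptions, and the template is shortened here):

```python
# Shortened stand-in for the full Korean template above.
TEMPLATE = """대학교 시험 문제 5개를 XML 형식으로 생성하세요.

**중요한 키워드:**
{KEYS}

**중요한 문장들:**
{PHRS}
"""

def fill_template(template: str, keywords: list[str], phrases: list[str]) -> str:
    """Substitute newline-separated items for the {KEYS}/{PHRS} placeholders."""
    prompt = template.replace("{KEYS}", "\n".join(f"- {k}" for k in keywords))
    return prompt.replace("{PHRS}", "\n".join(f"- {p}" for p in phrases))
```

The filled string is then used as the `prompt` variable in the usage example below.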

---

## Example Usage

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "yongjin-KIM/gemma3-examgen"

# Gemma-3 checkpoints use an image-text-to-text architecture; only the text
# path is needed here, so we take the tokenizer from the processor.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
tok = processor.tokenizer

prompt = """<Insert the Korean prompt template here and replace {KEYS} and {PHRS}>"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2000,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
```

---

## Output Format Guarantees

- Always produces **well-formed XML**.
- Exactly **5 `<problem>` blocks**.
- Escapes all special characters (`<`, `>`, `&`, `"`, `'`).
- Fixed type order: **객관식** (multiple-choice), **객관식**, **단답형** (short-answer), **단답형**, **주관식** (essay).
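Since these guarantees are learned behaviors rather than hard constraints, downstream code should still verify each generation. A minimal parser/checker sketch (the `parse_problems` helper is illustrative, not part of the model):

```python
import xml.etree.ElementTree as ET

def parse_problems(xml_text: str) -> list[dict]:
    """Parse a generation into a list of problem dicts, raising if the XML
    is malformed or the five-problem guarantee is violated. ElementTree
    unescapes XML entities (&lt; etc.) during parsing."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    problems = [
        {child.tag: (child.text or "").strip() for child in p}
        for p in root.findall("problem")
    ]
    if len(problems) != 5:
        raise ValueError(f"expected 5 problems, got {len(problems)}")
    return problems
```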

---

## Limitations

- May occasionally omit `<description>` fields or produce overlong answers.
- Factual correctness is not guaranteed.
- Designed for **Korean text only**; English prompts are not supported.
- Contextual consistency may vary with the quality of the `{KEYS}`/`{PHRS}` inputs.

---

## Ethical Considerations

- Intended for educational and research use only.
- Should not be used for unsupervised or high-stakes exam generation.
- All generated content should be **reviewed by a human instructor** before use.

---

## Model Details

- **Base Model**: `google/gemma-3-4b-it`
- **Architecture**: Decoder-only transformer
- **Fine-tuning Method**: LoRA (r=8, α=32)
- **Training Framework**: PEFT + TRL
- **Training Hardware**: 2 × A100 (80 GB)
- **Training Duration**: ~48 hours
- **Stages**: SFT → DPO → GRPO
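The training scripts are not released; under PEFT, the LoRA numbers above would correspond roughly to a config like the fragment below (dropout, bias, and the attention-projection `target_modules` are typical choices, not confirmed from the source):

```python
from peft import LoraConfig

# r and lora_alpha match the values reported above; the rest is illustrative.
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```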

---

## License

- **Model**: CC-BY-NC-4.0
- **Base Model**: Gemma-3 (Google)
- **Dataset**: Private (PDF-crawled educational material)
- **Intended Use**: Research / non-commercial

---

## Maintainer

**Author:** Yongjin Kim

**Hugging Face:** [@yongjin-KIM](https://huggingface.co/yongjin-KIM)

---