ko-gemma-2-9b-it-restoration
Model Description
- Model Name: ko-gemma-2-9b-it-restoration
- Base Model: rtzr/ko-gemma-2-9b-it
- Training Method: LoRA (r=16, alpha=32)
- Purpose: Restoring obfuscated Korean reviews to natural Korean language
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model on GPU; bfloat16 keeps the 9B model within typical GPU memory
model = AutoModelForCausalLM.from_pretrained(
    "pjj11005/ko-gemma-2-9b-it-restoration",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("pjj11005/ko-gemma-2-9b-it-restoration")
# Example usage
input_text = "obfuscated review text"
prompt = f"""<start_of_turn>user
Your task is to transform the given obfuscated Korean review into a clear, correct, and natural-sounding Korean review that reflects its original meaning.
Input: {input_text}<end_of_turn>
<start_of_turn>model
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens and drop special tokens such as <end_of_turn>
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
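Assuming the tokenizer in this repository ships the standard Gemma chat template, the same prompt can also be built with apply_chat_template instead of writing the turn markers by hand (a minimal sketch, equivalent to the example above):

messages = [{
    "role": "user",
    "content": (
        "Your task is to transform the given obfuscated Korean review into a clear, "
        "correct, and natural-sounding Korean review that reflects its original meaning.\n"
        f"Input: {input_text}"
    ),
}]
# Builds the <start_of_turn>user ... <start_of_turn>model prompt automatically
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))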
Training Data
- Pairs of obfuscated Korean reviews and their original versions (a data-preparation sketch follows below)
- Source: https://dacon.io/competitions/official/236446/data
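A minimal sketch of how such pairs can be turned into supervised prompt/response examples. The file name and the input/output column names are assumptions about the competition CSV, not details stated in this card:

import pandas as pd

# Assumed layout: one obfuscated review per row together with its original version
df = pd.read_csv("train.csv")  # hypothetical path to the competition training file

def to_example(row):
    prompt = (
        "<start_of_turn>user\n"
        "Your task is to transform the given obfuscated Korean review into a clear, "
        "correct, and natural-sounding Korean review that reflects its original meaning.\n"
        f"Input: {row['input']}<end_of_turn>\n"  # 'input': obfuscated review (assumed column name)
        "<start_of_turn>model\n"
    )
    return {"prompt": prompt, "completion": row["output"] + "<end_of_turn>"}  # 'output': original review (assumed)

examples = [to_example(row) for _, row in df.iterrows()]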
Training Procedure
- Training Epochs: 3
- Batch Size: 2 (gradient accumulation: 16)
- Learning Rate: 2e-4
- Maximum Sequence Length: 512
- Optimizer: paged_adamw_32bit (see the configuration sketch below)
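The hyperparameters above, together with the LoRA settings from the Model Description (r=16, alpha=32), map roughly onto the following PEFT/TRL configuration. This is a sketch only: the target modules and the exact trainer setup are assumptions not stated in this card.

from peft import LoraConfig
from trl import SFTConfig

# LoRA settings from the Model Description; target modules are an assumption
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Training hyperparameters listed above; paged_adamw_32bit requires bitsandbytes
training_args = SFTConfig(
    output_dir="ko-gemma-2-9b-it-restoration",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    max_seq_length=512,  # named max_length in newer TRL releases
    optim="paged_adamw_32bit",
)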
Performance Evaluation
- Evaluation Metrics: as defined in the competition rules (https://dacon.io/competitions/official/236446/overview/rules)
- Results: 29th place out of 291 teams (top 10%) on the competition leaderboard
Limitations and Considerations
- This model is specialized for Korean review restoration and may not perform well on other types of text
- Highly obfuscated input text may be difficult to restore accurately