ko-gemma-2-9b-it-restoration
Model Description
- Model Name: ko-gemma-2-9b-it-restoration
- Base Model: rtzr/ko-gemma-2-9b-it
- Training Method: LoRA (r=16, alpha=32)
- Purpose: Restoring obfuscated Korean reviews to natural Korean language
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model on GPU; bfloat16 keeps the 9B model within typical GPU memory
model = AutoModelForCausalLM.from_pretrained(
    "pjj11005/ko-gemma-2-9b-it-restoration",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("pjj11005/ko-gemma-2-9b-it-restoration")
# Example usage
input_text = "obfuscated review text"
prompt = f"""<start_of_turn>user
Your task is to transform the given obfuscated Korean review into a clear, correct, and natural-sounding Korean review that reflects its original meaning.
Input: {input_text}<end_of_turn>
<start_of_turn>model
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens and drop special tokens such as <end_of_turn>
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
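Assuming the tokenizer in this repository ships the standard Gemma chat template, the same prompt can also be built with apply_chat_template instead of writing the turn markers by hand (a minimal sketch, equivalent to the example above):

messages = [{
    "role": "user",
    "content": (
        "Your task is to transform the given obfuscated Korean review into a clear, "
        "correct, and natural-sounding Korean review that reflects its original meaning.\n"
        f"Input: {input_text}"
    ),
}]
# Builds the <start_of_turn>user ... <start_of_turn>model prompt automatically
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))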
Training Data
- Pairs of obfuscated Korean reviews and their original versions (a data-preparation sketch follows below)
- Source: https://dacon.io/competitions/official/236446/data
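A minimal sketch of how such pairs can be turned into supervised prompt/response examples. The file name and the input/output column names are assumptions about the competition CSV, not details stated in this card:

import pandas as pd

# Assumed layout: one obfuscated review per row together with its original version
df = pd.read_csv("train.csv")  # hypothetical path to the competition training file

def to_example(row):
    prompt = (
        "<start_of_turn>user\n"
        "Your task is to transform the given obfuscated Korean review into a clear, "
        "correct, and natural-sounding Korean review that reflects its original meaning.\n"
        f"Input: {row['input']}<end_of_turn>\n"  # 'input': obfuscated review (assumed column name)
        "<start_of_turn>model\n"
    )
    return {"prompt": prompt, "completion": row["output"] + "<end_of_turn>"}  # 'output': original review (assumed)

examples = [to_example(row) for _, row in df.iterrows()]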
Training Procedure
- Training Epochs: 3
- Batch Size: 2 (gradient accumulation: 16)
- Learning Rate: 2e-4
- Maximum Sequence Length: 512
- Optimizer: paged_adamw_32bit (see the configuration sketch below)
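The hyperparameters above, together with the LoRA settings from the Model Description (r=16, alpha=32), map roughly onto the following PEFT/TRL configuration. This is a sketch only: the target modules and the exact trainer setup are assumptions not stated in this card.

from peft import LoraConfig
from trl import SFTConfig

# LoRA settings from the Model Description; target modules are an assumption
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Training hyperparameters listed above; paged_adamw_32bit requires bitsandbytes
training_args = SFTConfig(
    output_dir="ko-gemma-2-9b-it-restoration",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    max_seq_length=512,  # named max_length in newer TRL releases
    optim="paged_adamw_32bit",
)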
Performance Evaluation
- Evaluation Metrics: as defined in the competition rules (https://dacon.io/competitions/official/236446/overview/rules)
- Results: 29th place out of 291 teams (top 10%) on the competition leaderboard
Limitations and Considerations
- This model is specialized for Korean review restoration and may not perform well on other types of text
- Highly obfuscated input text may be difficult to restore accurately