📄 Model Details
1. 🧾 Overview
This model is trained to detect the presence of harmful expressions in Korean sentences. It performs binary classification, **judging (classifying)** whether a sentence contains harmful expressions or is an ordinary sentence. The corresponding AI task is text-classification.
The dataset used is TTA-DQA/hate_sentence.
- Class composition:
  - "0": no_hate
  - "1": hate
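The class-to-label mapping can be checked programmatically. This is a minimal sketch assuming the mapping above was saved in the model's config; if it was not, the generic `LABEL_0`/`LABEL_1` names will appear instead:

```python
from transformers import AutoConfig

# Inspect the id-to-label mapping stored with the fine-tuned model.
config = AutoConfig.from_pretrained("TTA-DQA/HateDetection_KoElectra_FineTuning")
print(config.id2label)  # expected: {0: "no_hate", 1: "hate"} if the mapping was saved
```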
2. 🧠 Training Details
- Base Model: KoELECTRA (a pre-trained Korean language model based on ELECTRA)
- Source: monologg/koelectra-base-v3-discriminator
- Model Type: sequence classification (ELECTRA discriminator with a text-classification head)
- Pre-training (Korean): approx. 20 GB
- Fine-tuning (Hate Dataset): approx. 22.3 MB (TTA-DQA/hate_sentence)
- Learning Rate: 5e-6
- Weight Decay: 0.01
- Epochs: 20
- Batch Size: 16
- Data Loader Workers: 2
- Tokenizer: BertWordPieceTokenizer
- Model Size: approx. 512 MB
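The training script itself is not included in this card; the sketch below shows how a fine-tuning run with the hyperparameters above might look using the Hugging Face `Trainer`. The dataset column name `"text"` and the `"train"` split are assumptions, not confirmed by this card.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "monologg/koelectra-base-v3-discriminator"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Tokenize the dataset; "text" is an assumed column name.
dataset = load_dataset("TTA-DQA/hate_sentence")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

# Hyperparameters taken directly from the list above.
args = TrainingArguments(
    output_dir="koelectra-hate-detection",
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=20,
    per_device_train_batch_size=16,
    dataloader_num_workers=2,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```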
3. 🧩 Requirements
```
pytorch ~= 1.8.0
transformers ~= 4.0.0
```
4. 🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "TTA-DQA/HateDetection_KoElectra_FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Build a text-classification pipeline for hate-expression detection.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

sentences = ["오늘 점심 뭐 먹을까?", "이 나쁜 놈아."]  # benign / hateful examples
results = classifier(sentences)
```
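Each pipeline result is a dict with a `label` and a confidence `score`; the exact label strings depend on the `id2label` mapping saved with the model (see the class composition above). A minimal usage sketch with illustrative output:

```python
# Pair each input sentence with its predicted label and confidence.
for sentence, result in zip(sentences, results):
    print(f"{sentence} -> {result['label']} ({result['score']:.4f})")
# Example output shape (scores illustrative, not real model output):
# 오늘 점심 뭐 먹을까? -> no_hate (0.99xx)
# 이 나쁜 놈아. -> hate (0.99xx)
```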
5. 📚 Citation
This model was built under the Hyperscale AI Training Data Quality Validation Project (2024 Hyperscale AI Training Data Quality Validation).
6. ⚠️ Bias, Risks, and Limitations
Although this model was not deliberately trained with biased data for either class, linguistic and cultural factors mean that labels can be subject to disagreement. What counts as a harmful expression is partly subjective, varying with language, culture, domain of application, and individual viewpoints, so the model's outputs may be biased or contested.
☞ Please note that this model's results are not an absolute standard for harmful expressions.
7. 📊 Results
- Task: binary classification (text-classification)
- F1-score: 0.9881
- Accuracy: 0.9881
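For reference, both metrics can be reproduced from predictions with scikit-learn; the snippet below is purely illustrative, and the values shown are not the authors' evaluation data:

```python
from sklearn.metrics import accuracy_score, f1_score

# y_true: gold labels, y_pred: model predictions (0 = no_hate, 1 = hate).
y_true = [0, 1, 1, 0, 1]  # illustrative labels, not the actual test set
y_pred = [0, 1, 1, 0, 0]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
```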