πŸ“Œ λͺ¨λΈ 상세 정보

1. 🧾 Overview

이 λͺ¨λΈμ€ ν•œκ΅­μ–΄ λ¬Έμž₯ λ‚΄ μœ ν•΄ ν‘œν˜„μ˜ 유무λ₯Ό κ²€μΆœν•˜κΈ° μœ„ν•΄ ν•™μŠ΅λœ λͺ¨λΈμž…λ‹ˆλ‹€.
binary classification을 μˆ˜ν–‰ν•˜λ©°, μœ ν•΄ ν‘œν˜„μ΄ ν¬ν•¨λ˜μ—ˆκ±°λ‚˜ 일반적인 λ¬Έμž₯인지λ₯Ό **νŒλ‹¨(λΆ„λ₯˜)**ν•©λ‹ˆλ‹€.
AI-Taskλ‘œλŠ” text-classification에 ν•΄λ‹Ήν•©λ‹ˆλ‹€.
μ‚¬μš©ν•˜λŠ” 데이터셋은 TTA-DQA/hate_sentenceμž…λ‹ˆλ‹€.

  • 클래슀 ꡬ성:
    • "0": no_hate
    • "1": hate

2. 🧠 Training Details

  • Base Model: KcElectra (a pre-trained Korean language model based on Electra)
  • Source: monologg/koelectra-base-v3-discriminator
  • Model Type: Casual Language Model
  • Pre-training (Korean): μ•½ 20GB
  • Fine-tuning (Hate Dataset): μ•½ 22.3MB (TTA-DQA/hate_sentence)
  • Learning Rate: 5e-6
  • Weight Decay: 0.01
  • Epochs: 20
  • Batch Size: 16
  • Data Loader Workers: 2
  • Tokenizer: BertWordPieceTokenizer
  • Model Size: μ•½ 512MB

3. 🧩 Requirements

  • pytorch ~= 1.8.0
  • transformers ~= 4.0.0

4. πŸš€ Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned checkpoint and build a text-classification pipeline.
model_name = "TTA-DQA/HateDetection_KoElectra_FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# "What should I have for lunch today?" (benign) / "You rotten jerk." (hateful)
sentences = ["였늘 점심 뭐 λ¨Ήμ„κΉŒ?", "이 λ‚˜μœ λ†ˆμ•„."]
results = classifier(sentences)
print(results)
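
Each entry in results is a dict of the form {'label': ..., 'score': ...}; the exact label string depends on the checkpoint's id2label config, with "1" / hate indicating a harmful sentence.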

5. πŸ“š Citation

이 λͺ¨λΈμ€ μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© 데이터 ν’ˆμ§ˆκ²€μ¦ 사업(2024년도 μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© ν’ˆμ§ˆκ²€μ¦)에 μ˜ν•΄μ„œ κ΅¬μΆ•λ˜μ—ˆμŠ΅λ‹ˆλ‹€.


6. ⚠️ Bias, Risks, and Limitations

λ³Έ λͺ¨λΈμ€ 각 클래슀의 데이터λ₯Ό 편ν–₯되게 ν•™μŠ΅ν•˜μ§€λŠ” μ•Šμ•˜μœΌλ‚˜,
언어적·문화적 νŠΉμ„±μ— μ˜ν•΄ λ ˆμ΄λΈ”μ— λŒ€ν•œ 이견이 μžˆμ„ 수 μžˆμŠ΅λ‹ˆλ‹€.
μœ ν•΄ ν‘œν˜„μ€ μ–Έμ–΄, λ¬Έν™”, 적용 λΆ„μ•Ό, 개인적 견해에 따라 주관적인 뢀뢄이 μ‘΄μž¬ν•˜μ—¬,
결과에 λŒ€ν•œ 편ν–₯ λ˜λŠ” λ…Όλž€μ΄ λ°œμƒν•  수 μžˆμŠ΅λ‹ˆλ‹€.

❗ λ³Έ λͺ¨λΈμ˜ κ²°κ³ΌλŠ” μ ˆλŒ€μ μΈ μœ ν•΄ ν‘œν˜„ 기쀀이 μ•„λ‹˜μ„ μœ μ˜ν•΄ μ£Όμ„Έμš”.


7. πŸ“ˆ Results

  • Task: binary classification (text-classification)
  • F1-score: 0.9881
  • Accuracy: 0.9881