---
library_name: transformers
tags: [sentence-classification, korean, finance, multi-class, ko-sroberta, transformers]
---
# Model Card for Sentence Type Classification (Financial Texts)
This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built upon `jhgan/ko-sroberta-multitask`, a RoBERTa-based Korean sentence-embedding model.
## Model Details
### Model Description
- **Developed by:** Kwon Cho
- **Shared by:** kwoncho
- **Model type:** RoBERTa-based transformer (fine-tuned for sequence classification)
- **Language(s):** Korean (한국어)
- **License:** Apache 2.0 (from base model)
- **Finetuned from model:** [`jhgan/ko-sroberta-multitask`](https://huggingface.co/jhgan/ko-sroberta-multitask)
This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.
### Model Sources
- **Repository:** [More Information Needed]
- **Demo:** [More Information Needed]
## Uses
### Direct Use
The model can be used to classify financial sentences (in Korean) into one of the following categories:
- **Predictive** (예측형)
- **Inferential** (추론형)
- **Factual** (사실형)
- **Conversational** (대화형)
### Training Data
- **Dataset Name:** 문장 유형(추론, 예측 등) 판단 데이터 (Sentence Type Judgment Data: inference, prediction, etc.)
- **Source:** [AIHub link](https://www.aihub.or.kr/aihubdata/data/view.do?pageIndex=1&currMenu=115&topMenu=100&srchOptnCnd=OPTNCND001&searchKeyword=์์ธกํ&srchDetailCnd=DETAILCND001&srchOrder=ORDER001&srchPagePer=20&srchDataRealmCode=REALM002&aihubDataSe=data&dataSetSn=71486)

This dataset labels Korean financial sentences with one of the following four types:
- `예측형 (Predictive)`
- `추론형 (Inferential)`
- `사실형 (Factual)`
- `대화형 (Conversational)`
### Out-of-Scope Use
- Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
- May not perform well on informal or highly colloquial text.
## Bias, Risks, and Limitations
- The model may carry biases present in the training dataset.
- Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.
### Recommendations
Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.
## How to Get Started with the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

# "This stock may decline in the short term."
text = "해당 종목은 단기적으로 하락할 가능성이 있습니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
predicted_class_id = outputs.logits.argmax(dim=-1).item()
```
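The model returns raw logits, one per class, which need post-processing to become a label and a confidence score. A minimal sketch of that step is below; note the id-to-label ordering here is an assumption for illustration and should be checked against the `id2label` mapping in the model's `config.json`.

```python
import math

# Hypothetical class ordering -- verify against the model's config.json (id2label).
ID2LABEL = {0: "Predictive", 1: "Inferential", 2: "Factual", 3: "Conversational"}

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]
```

With model outputs from the snippet above, this would be called as `predict_label(outputs.logits[0].tolist())`; reporting the probability alongside the label makes it easier to apply a confidence threshold before acting on a prediction.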