---
library_name: transformers
tags: [sentence-classification, korean, finance, multi-class, ko-sroberta, transformers]
---

# Model Card for Sentence Type Classification (Financial Texts)

This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built upon [`jhgan/ko-sroberta-multitask`](https://huggingface.co/jhgan/ko-sroberta-multitask), a Korean RoBERTa-based sentence-embedding model trained on multiple Korean NLP tasks.

## Model Details

### Model Description

- **Developed by:** Kwon Cho
- **Shared by:** kwoncho
- **Model type:** RoBERTa-based transformer (fine-tuned for sequence classification)
- **Language(s):** Korean (ํ•œ๊ตญ์–ด)
- **License:** Apache 2.0 (from base model)
- **Finetuned from model:** [`jhgan/ko-sroberta-multitask`](https://huggingface.co/jhgan/ko-sroberta-multitask)

This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.

### Model Sources

- **Repository:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

### Direct Use

The model can be used to classify financial sentences (in Korean) into one of the following categories:
- **Predictive** (์˜ˆ์ธกํ˜•)
- **Inferential** (์ถ”๋ก ํ˜•)
- **Factual** (์‚ฌ์‹คํ˜•)
- **Conversational** (๋Œ€ํ™”ํ˜•)
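
Downstream code typically converts the model's four class logits into probabilities and keeps the highest-scoring category. A minimal sketch in plain Python (the logit values and label order below are illustrative assumptions, not outputs of this model; verify the order against `model.config.id2label`):

```python
import math

# Assumed label order for illustration only.
LABELS = ["Predictive", "Inferential", "Factual", "Conversational"]

def classify(logits):
    """Softmax the four class logits and return (label, probability)."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, prob = classify([2.1, 0.3, -0.5, -1.2])  # hypothetical logits
```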

### Training Data

- **Dataset Name:** ๋ฌธ์žฅ ์œ ํ˜•(์ถ”๋ก , ์˜ˆ์ธก ๋“ฑ) ํŒ๋‹จ ๋ฐ์ดํ„ฐ (Sentence Type Judgment Data: inference, prediction, etc.)
- **Source:** [AIHub link](https://www.aihub.or.kr/aihubdata/data/view.do?pageIndex=1&currMenu=115&topMenu=100&srchOptnCnd=OPTNCND001&searchKeyword=์˜ˆ์ธกํ˜•&srchDetailCnd=DETAILCND001&srchOrder=ORDER001&srchPagePer=20&srchDataRealmCode=REALM002&aihubDataSe=data&dataSetSn=71486)

This dataset labels Korean financial sentences with one of the following four types:
- `์˜ˆ์ธกํ˜• (Predictive)`
- `์ถ”๋ก ํ˜• (Inferential)`
- `์‚ฌ์‹คํ˜• (Factual)`
- `๋Œ€ํ™”ํ˜• (Conversational)`
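
When loading or exporting the classifier, it can help to pin the label mapping explicitly in the config. A sketch assuming the indices follow the order listed above (the real order is fixed by the training script; check `model.config.id2label` before relying on it):

```python
# Hypothetical index order; the actual mapping is determined at training time.
id2label = {
    0: "์˜ˆ์ธกํ˜• (Predictive)",
    1: "์ถ”๋ก ํ˜• (Inferential)",
    2: "์‚ฌ์‹คํ˜• (Factual)",
    3: "๋Œ€ํ™”ํ˜• (Conversational)",
}
label2id = {name: idx for idx, name in id2label.items()}
```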

### Out-of-Scope Use

- Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
- May not perform well on informal or highly colloquial text.

## Bias, Risks, and Limitations

- The model may carry biases present in the training dataset.
- Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.

### Recommendations

Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.

## How to Get Started with the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

text = "ํ•ด๋‹น ์ข…๋ชฉ์€ ๋‹จ๊ธฐ์ ์œผ๋กœ ํ•˜๋ฝํ•  ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค."  # "This stock may decline in the short term."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

pred_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```