---
license: cc-by-4.0
tags:
- sentiment-classification
- telugu
- bert
- l3cube
- baseline
- tesent
language: te
datasets:
- TeSent_Benchmark-Dataset
model_name: Te-BERT_WR
---

# Te-BERT_WR: Telugu-BERT Sentiment Classification Model (Without Rationale)

## Model Overview

**Te-BERT_WR** is a Telugu sentiment classification model based on **Telugu-BERT (L3Cube-Telugu-BERT)**, a BERT model pre-trained on Telugu text (OSCAR, Wikipedia, news) by the L3Cube Pune research group using the Masked Language Modeling (MLM) objective.
The "WR" in the model name stands for "Without Rationale": the model is trained solely on sentiment labels and **does not use the human-annotated rationales** from the TeSent_Benchmark-Dataset.

---

## Model Details

- **Architecture:** L3Cube-Telugu-BERT (BERT-base, pre-trained on Telugu)
- **Pretraining Data:** Telugu OSCAR, Wikipedia, and news articles
- **Fine-tuning Data:** [TeSent_Benchmark-Dataset](https://github.com/DSL-13-SRMAP/TeSent_Benchmark-Dataset), using only sentence-level sentiment labels (positive, negative, neutral); rationale annotations are disregarded
- **Task:** Sentence-level sentiment classification (3-way)
- **Rationale Usage:** Not used during training or inference

---
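The 3-way task above amounts to a softmax over three classifier-head logits followed by an argmax. A minimal sketch in plain Python; the label order here is an assumption (a fine-tuned checkpoint records the authoritative mapping in its `id2label` config):

```python
import math

# Assumed label order for the 3-way head; check the checkpoint's id2label.
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Map the head's three logits to a sentiment label and confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = predict([-1.2, 0.3, 2.1])
```

Here `predict` stands in for the post-processing applied to the model's output; the logits themselves would come from the fine-tuned Telugu-BERT encoder plus classification head.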
## Intended Use

- **Primary Use:** Benchmarking Telugu sentiment classification on the TeSent_Benchmark-Dataset, especially as a **baseline** for comparing models trained with and without rationales
- **Research Setting:** Suited to pure Telugu text analysis where sufficient labeled data exists for fine-tuning
- **Academic Utility:** Especially suitable for low-resource and explainable NLP research in Telugu

---

## Why Telugu-BERT?

Telugu-BERT is tailored to Telugu and excels at capturing the vocabulary, syntax, and semantics of the language. It recognizes nuanced expressions, idioms, and sentiment cues that are often poorly represented in multilingual models such as mBERT and XLM-R.
This makes Te-BERT_WR a strong choice for sentiment analysis and other Telugu NLP applications requiring language-specific representations.

---

## Performance and Limitations

**Strengths:**
- Superior understanding of Telugu language specifics compared to multilingual models
- Captures nuanced and idiomatic expressions in sentiment analysis
- Robust baseline for Telugu sentiment classification tasks

**Limitations:**
- Limited to Telugu; not suitable for multilingual or cross-lingual tasks
- Requires sufficient labeled Telugu data for best performance
- Because rationales are not used, the model cannot provide explicit explanations for its predictions

---

## Training Data

- **Dataset:** [TeSent_Benchmark-Dataset](https://github.com/DSL-13-SRMAP/TeSent_Benchmark-Dataset)
- **Data Used:** Only the **Content** (Telugu sentence) and **Label** (sentiment label) columns; **rationale** annotations are ignored for Te-BERT_WR training

---
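The label-only setup described above reduces to a simple preprocessing step: keep the **Content** and **Label** columns and drop everything else before tokenization. A sketch under stated assumptions; the column names follow the dataset description above, while the rationale column name, placeholder rows, and label-to-id mapping are hypothetical:

```python
# Hypothetical rows mirroring the TeSent columns described above.
rows = [
    {"Content": "<telugu sentence 1>", "Label": "positive", "Rationale": "<spans>"},
    {"Content": "<telugu sentence 2>", "Label": "negative", "Rationale": "<spans>"},
]

LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}  # assumed mapping

def to_training_example(row):
    """Keep only the sentence and its sentiment id; rationales are discarded."""
    return {"text": row["Content"], "label": LABEL2ID[row["Label"]]}

examples = [to_training_example(r) for r in rows]
```

A rationale-using counterpart would keep the third column; Te-BERT_WR's defining choice is that this function throws it away.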
## Language Coverage

- **Language:** Telugu (`te`)
- **Model Scope:** This implementation and evaluation focus strictly on Telugu sentiment classification

---

## Citation and More Details

For detailed experimental setup, evaluation metrics, and comparisons with rationale-based models, **please refer to our paper**.

---

## License

Released under [CC BY 4.0](LICENSE).