Update README.md
README.md
CHANGED
@@ -15,4 +15,86 @@ tags:
|
|
15 |
- tourism
|
16 |
- sentiment
|
17 |
- multilingual
|
18 |
-
---
|
15 |
- tourism
|
16 |
- sentiment
|
17 |
- multilingual
|
18 |
+
---
|
19 |
+
|
20 |
+
|
21 |
+
|
22 |
+
|
23 |
+
---
|
24 |
+
|
25 |
+
# distilbertTourism-multilingual-sentiment
|
26 |
+
|
27 |
+
A DistilBERT model fine-tuned for sentiment analysis of tourism-related text in multiple languages. It is a key component of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** The model analyzes reviews, feedback, and other text to improve tourist feedback collection in Panglao.
|
28 |
+
|
29 |
+
## Overview
|
30 |
+
|
31 |
+
This model builds on the [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture and has been fine-tuned on tourism-specific sentiment data. With support for eight languages, it provides a practical solution for multilingual sentiment classification in the tourism sector.
|
32 |
+
|
33 |
+
> **Thesis Context:**
|
34 |
+
> As part of the thesis project, this model integrates with a comprehensive system that leverages advanced natural language processing techniques. In addition to this DistilBERT-based sentiment analyzer, the system utilizes BERTopic for topic modeling. The project aims to surpass the 70% accuracy benchmark set by the IPCR while addressing language barriers and inefficiencies inherent in traditional survey methods.
|
35 |
+
|
36 |
+
## Model Details
|
37 |
+
|
38 |
+
- **Task:** Text Classification (Sentiment Analysis)
|
39 |
+
- **Base Model:** [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased)
|
40 |
+
- **Architecture:** DistilBERT
|
41 |
+
- **Parameters:** 135M
|
42 |
+
- **Tensor Format:** F32 (Safetensors)
|
43 |
+
- **Supported Languages:** 8 (Multilingual)
|
44 |
+
- **Training Data:** 160k synthetic tourism reviews
|
45 |
+
- **Performance:** Reported confidence above 95% when classifying the sentiment of tourism-related texts. This is a confidence figure, not an accuracy measured on a held-out test set.
|
46 |
+
- **Fine-tuning:** Adapted to the tourism domain (242 fine-tuning steps reported)
|
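The thesis context above refers to a 70% accuracy benchmark, while the Performance entry reports a confidence figure. As a minimal sketch of how accuracy could be checked against that benchmark, the snippet below compares predicted labels with expected labels. The label names and the toy data are hypothetical and are not results from this model.

```python
def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# The 70% accuracy benchmark mentioned in the thesis context.
BENCHMARK = 0.70

# Hypothetical toy labels for illustration only; NOT results from this model.
expected = ["positive", "negative", "neutral", "positive", "negative"]
predicted = ["positive", "negative", "positive", "positive", "negative"]

score = accuracy(predicted, expected)
meets_benchmark = score > BENCHMARK
print(f"accuracy={score:.2f}, above benchmark: {meets_benchmark}")
```

In practice, `predicted` would come from running the model on a labeled evaluation set that was not used for fine-tuning.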
47 |
+
|
48 |
+
## Usage
|
49 |
+
|
50 |
+
To integrate this model into your application, you can use the Hugging Face Transformers library. Below is an example in Python:
|
51 |
+
|
52 |
+
```python
|
53 |
+
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
54 |
+
|
55 |
+
# Define the model repository
|
56 |
+
model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"
|
57 |
+
|
58 |
+
# Load tokenizer and model
|
59 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
60 |
+
model = AutoModelForSequenceClassification.from_pretrained(model_name)
|
61 |
+
|
62 |
+
# Example input text (replace with your own tourism-related text)
|
63 |
+
text = "I had an amazing experience during my trip!"
|
64 |
+
inputs = tokenizer(text, return_tensors="pt")
|
65 |
+
|
66 |
+
# Perform inference
|
67 |
+
outputs = model(**inputs)
|
68 |
+
logits = outputs.logits
|
69 |
+
|
70 |
+
# To get a sentiment label, apply softmax to the logits and look up the highest-scoring index in model.config.id2label.
|
71 |
+
```
|
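The usage example stops at the raw logits. A minimal sketch of the remaining step is shown below in plain Python, so it runs without downloading the model: softmax turns the logits into probabilities, and the highest-scoring index is mapped to a label. The example logits and the three label names are assumptions for illustration. With the real model, the values would come from `logits[0].detach().tolist()` and the mapping from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits, id2label):
    """Return (label, confidence) for a single row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical values for illustration; with the real model use
# logits[0].detach().tolist() and model.config.id2label instead.
example_logits = [-1.2, 0.3, 2.5]
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}

label, confidence = logits_to_sentiment(example_logits, example_id2label)
print(label, round(confidence, 3))
```

The confidence returned here is the softmax probability of the predicted class. It indicates how strongly the model favors that label, not whether the label is correct.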
72 |
+
|
73 |
+
### Installation
|
74 |
+
|
75 |
+
Ensure you have the required packages installed:
|
76 |
+
|
77 |
+
```bash
|
78 |
+
pip install transformers safetensors torch
|
79 |
+
```
|
80 |
+
|
81 |
+
## Limitations
|
82 |
+
|
83 |
+
- **Domain Specific:** This model is fine-tuned specifically for tourism sentiment analysis and may not perform optimally on texts from other domains.
|
84 |
+
- **Inference API:** Currently, the model does not support direct deployment to the Hugging Face Inference API since it lacks a library tag.
|
85 |
+
|
86 |
+
## Future Work
|
87 |
+
|
88 |
+
- **Dataset Expansion:** Incorporating additional data from more tourism sources could further improve performance.
|
89 |
+
- **Model Optimization:** Experimentation with different fine-tuning strategies or hyperparameters might yield even better sentiment classification accuracy.
|
90 |
+
- **API Integration:** Future updates may include support for direct inference API deployment.
|
91 |
+
|
92 |
+
## Acknowledgements
|
93 |
+
|
94 |
+
- This model is based on the robust [DistilBERT](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture.
|
95 |
+
- Special thanks to the Hugging Face community for providing the infrastructure that makes deploying and sharing models seamless.
|
96 |
+
- This work is part of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** The project also utilizes BERTopic for topic modeling, aiming to revolutionize the collection and analysis of tourist feedback by overcoming language barriers and improving upon traditional survey methods.
|
97 |
+
|
98 |
+
---
|
99 |
+
|
100 |
+
|