SCANSKY committed on
Commit 3b5ac4b · verified · 1 Parent(s): 10789df

Update README.md

Files changed (1)
  1. README.md +83 -1
README.md CHANGED
@@ -15,4 +15,86 @@ tags:
15
  - tourism
16
  - sentiment
17
  - multilingual
18
- ---
+ ---
19
+
20
+
22
+
23
+ ---
24
+
25
+ # distilbertTourism-multilingual-sentiment
26
+
27
+ A fine-tuned DistilBERT model for sentiment analysis of tourism-related texts in multiple languages. This model is a key component of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** It analyzes reviews, feedback, and other text to improve tourist feedback collection in Panglao.
28
+
29
+ ## Overview
30
+
31
+ This model builds on the [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture and has been fine-tuned on tourism-specific sentiment data. With support for eight languages, it provides a practical solution for multilingual sentiment classification in the tourism sector.
32
+
33
+ > **Thesis Context:**
34
+ > This model is one component of a larger thesis system. Alongside this DistilBERT-based sentiment analyzer, the system uses BERTopic for topic modeling. The project aims to surpass the 70% accuracy benchmark set by the IPCR while addressing the language barriers and inefficiencies of traditional survey methods.
35
+
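The 70% figure above is an accuracy threshold, so checking a model against it amounts to comparing predicted labels with reference labels on a held-out set. The following is a minimal sketch of that comparison, assuming string labels; the sample labels and the `BENCHMARK` constant name are illustrative and are not data or results from the thesis.

```python
BENCHMARK = 0.70  # accuracy threshold cited in the thesis context


def accuracy(predicted, reference):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Made-up labels for illustration; not results from the model.
reference = ["positive", "negative", "neutral", "positive", "negative"]
predicted = ["positive", "negative", "positive", "positive", "negative"]

score = accuracy(predicted, reference)
print(f"accuracy={score:.2f}, exceeds benchmark: {score > BENCHMARK}")
```

In practice, `predicted` would come from running the model over a labeled evaluation set that was not used for fine-tuning.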
36
+ ## Model Details
37
+
38
+ - **Task:** Text Classification (Sentiment Analysis)
39
+ - **Base Model:** [distilbert-base-multilingual-cased](https://huggingface.co/distilbert/distilbert-base-multilingual-cased)
40
+ - **Architecture:** DistilBERT
41
+ - **Parameters:** 135M
42
+ - **Tensor Format:** F32 (Safetensors)
43
+ - **Supported Languages:** 8 (Multilingual)
44
+ - **Training Data:** 160k synthetic tourism reviews
45
+ - **Performance:** Achieves over 95% confidence in sentiment classification for tourism-related texts.
46
+ - **Fine-tuning:** Adapted to the tourism domain (242 fine-tuning steps indicated)
47
+
48
+ ## Usage
49
+
50
+ To integrate this model into your application, you can use the Hugging Face Transformers library. Below is an example in Python:
51
+
52
+ ```python
53
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
54
+
55
+ # Define the model repository
56
+ model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"
57
+
58
+ # Load tokenizer and model
59
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
60
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
61
+
62
+ # Example input text (replace with your own tourism-related text)
63
+ text = "I had an amazing experience during my trip!"
64
+ inputs = tokenizer(text, return_tensors="pt")
65
+
66
+ # Perform inference
67
+ outputs = model(**inputs)
68
+ logits = outputs.logits
69
+
70
+ # You can further process the logits to get predicted sentiment labels.
71
+ ```
72
+
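The usage example above stops at the raw logits. A minimal sketch of the remaining step, turning logits into a label and a confidence score, might look like the following. `ID2LABEL` is a hypothetical three-class mapping used only for illustration; the real mapping should be read from `model.config.id2label`. The sketch uses only the standard library so it runs without downloading the model; with PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same normalization.

```python
import math

# Hypothetical label mapping for illustration only; read the real one
# from model.config.id2label after loading the model.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits, id2label=ID2LABEL):
    """Return (label, confidence) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for illustration; in practice pass
# outputs.logits[0].detach().tolist() from the usage example.
label, confidence = logits_to_label([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

The confidence returned here is the softmax probability of the top class, which is a different quantity from accuracy measured on a labeled test set.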
73
+ ### Installation
74
+
75
+ Ensure you have the required packages installed:
76
+
77
+ ```bash
78
+ pip install transformers safetensors torch
79
+ ```
80
+
81
+ ## Limitations
82
+
83
+ - **Domain Specific:** This model is fine-tuned specifically for tourism sentiment analysis and may not perform optimally on texts from other domains.
84
+ - **Inference API:** The model cannot currently be deployed directly to the Hugging Face Inference API because the repository lacks a library tag.
85
+
86
+ ## Future Work
87
+
88
+ - **Dataset Expansion:** Incorporating additional data from more tourism sources could further improve performance.
89
+ - **Model Optimization:** Experimentation with different fine-tuning strategies or hyperparameters might yield even better sentiment classification accuracy.
90
+ - **API Integration:** Future updates may include support for direct inference API deployment.
91
+
92
+ ## Acknowledgements
93
+
94
+ - This model is based on the robust [DistilBERT](https://huggingface.co/distilbert/distilbert-base-multilingual-cased) architecture.
95
+ - Special thanks to the Hugging Face community for providing the infrastructure that makes deploying and sharing models seamless.
96
+ - This work is part of the thesis project **"Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning."** The project also utilizes BERTopic for topic modeling, aiming to revolutionize the collection and analysis of tourist feedback by overcoming language barriers and improving upon traditional survey methods.
97
+
98
+ ---
99
+