Create README.md
README.md
ADDED
@@ -0,0 +1,44 @@
---
tags:
- sentence-similarity
---

# Azerbaijani Sentence Similarity Based on BERT - Model Description

This model, developed by Alas Development Center, is tailored to sentence similarity in the Azerbaijani language. It uses the `bert-base-multilingual-cased` architecture, fine-tuned on a custom Azerbaijani sentence similarity dataset. Its primary function is to predict a similarity score for a pair of sentences, which is useful in NLP applications such as information retrieval, question answering, and content analysis.

# Motivation

The core motivation behind this model is to address the challenge of semantic similarity in the Azerbaijani language. Semantic similarity measures how close two sentences are in meaning. The concept matters in natural language processing, linguistics, and artificial intelligence, where it supports a deeper understanding and processing of human language.

# Model Training and Evaluation Data

The dataset used to fine-tune the `bert-base-multilingual-cased` model specifically targets sentence similarity in Azerbaijani. Due to privacy concerns, we do not plan to release the dataset. Some details about the training and evaluation data:

- Total Training Samples: 77,499
- Total Validation Samples: 5,500
- Total Test Samples: 7,500

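
As a quick sanity check on the split sizes listed above, the sketch below computes each split's share of the total. Only the three counts come from this model card; the variable names are illustrative.

```python
# Split sizes as reported in this model card.
split_sizes = {
    "train": 77_499,
    "validation": 5_500,
    "test": 7_500,
}

total = sum(split_sizes.values())

# Share of each split, as a percentage of all sentence pairs.
split_shares = {name: 100 * count / total for name, count in split_sizes.items()}

print(f"total pairs: {total:,}")
for name, share in split_shares.items():
    print(f"{name:<10} {split_sizes[name]:>7,}  {share:5.1f}%")
```

This works out to roughly 85.6% training, 6.1% validation, and 8.3% test data out of 90,499 pairs.
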

The dataset categorizes sentence pairs into three classes based on their similarity:

- Contradiction: The sentences share no similarity.
- Entailment: The sentences have a similar or nearly identical meaning.
- Neutral: The sentences are neither clearly similar nor clearly dissimilar in meaning.

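
To illustrate how a prediction over these three labels can be read, here is a minimal sketch that turns raw class scores (logits) into probabilities with a softmax and picks the most likely label. The label order in `ID2LABEL`, the helper names, and the example scores are assumptions made for illustration; this card does not state the model's actual label mapping or output format.

```python
import math

# Hypothetical label order; check the model's own configuration for the real mapping.
ID2LABEL = {0: "contradiction", 1: "entailment", 2: "neutral"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most likely label and the probability of every label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], dict(zip(ID2LABEL.values(), probs))


# Made-up scores for one sentence pair, not real model output.
label, probs = classify([-1.2, 2.3, 0.4])
print(label, {name: round(p, 3) for name, p in probs.items()})
```

If a single similarity score is needed, one option is to report the entailment probability, though whether that matches how the authors score similarity is not specified here.
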

# Use and Access

This model is released as open source and is intended for wide use in applications where understanding sentence similarity in Azerbaijani is crucial. It can be especially useful for developers and researchers working on Azerbaijani language processing tasks.

# Acknowledgements

We thank the team members who took part in the development, training, and evaluation of this model. Their dedication and hard work have been instrumental in advancing Azerbaijani language processing technologies.

# Disclaimer

This model is provided "as is," without any warranty or guarantee of its accuracy, completeness, or suitability for any particular purpose. Users should exercise their judgment and discretion in employing this model for their applications.

## Model Plot