Demo
Analyzing sentiment in text using different models.
None defined yet.
Group members: Ivana Kollert, Lorena Mitrović, Marija Nadoveza, Ana Sabo, Mia Sambolec.
Information about the project so far:
We have collected Croatian movie reviews of recent thriller, sci-fi and horror movies.
We have collected about 3000 sentences and organized them by movie, review and sentence ID.
We have run a pilot annotation campaign with 150 sentences, in which all of us took part as annotators. We used 0 for positive, 1 for neutral and 2 for negative sentiment. After running the code, we obtained an agreement of 0.3182, which corresponds to fair agreement.
We then carried out two final annotation campaigns: in the first round, three annotators annotated the data, and in the final round a single annotation was produced for all 3259 sentences.
For both final annotations we used three sentiment labels: 0 for positive, 1 for negative and 2 for neutral sentiment.
We have written new code that lets us calculate the inter-rater agreement more easily and reliably using Fleiss' kappa.
The inter-rater agreement we obtained in the end is 0.7869, which corresponds to substantial agreement.
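As an illustration, a minimal sketch of how such a Fleiss' kappa computation can be done in Python with pandas and statsmodels is shown below; the file name, column names and exact layout are placeholders, not necessarily the ones used in our repository.

```python
import pandas as pd
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical file: one row per sentence, one column per annotator,
# labels coded as 0/1/2 as described above.
ratings = pd.read_csv("annotations.csv")[["annotator_1", "annotator_2", "annotator_3"]]

# aggregate_raters turns the (sentences x annotators) matrix into the
# (sentences x categories) count table that fleiss_kappa expects.
table, _categories = aggregate_raters(ratings.to_numpy())

kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa: {kappa:.4f}")  # e.g. 0.7869 -> substantial agreement
```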
We have created a new CSV file that contains only two columns of our corpus, labeled "sentence" and "label". The name of this file is Movies - Final Annotation - test.
We have used a script to split this CSV file into two parts, a Train set and a Test set. The Train set contains roughly 80% of the sentences and the Test set roughly 20% (653 of the 3259 sentences). Both the Train and the Test set have been saved as two separate CSV files.
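The split itself can be reproduced with a few lines of pandas and scikit-learn; the sketch below is illustrative, and the file names, the exact 80/20 ratio and the stratification are assumptions rather than a copy of our script.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name; the corpus has the columns "sentence" and "label".
df = pd.read_csv("Movies - Final Annotation - test.csv")

# Roughly 80% train / 20% test; stratifying keeps the label distribution similar.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
```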
We computed basic statistics on the Test set: the average number of words per sentence, the shortest and the longest sentence, and the number of sentences per label (a sketch of how this can be computed follows the table below).
The Test set statistics are as follows:
- 653 total sentences
| Category | Metric | Value |
|---|---|---|
| Label distribution | Label 0 | 165 |
| | Label 1 | 58 |
| | Label 2 | 430 |
| Sentence length (words) | Average | 21.33 |
| | Shortest | 1 |
| | Longest | 95 |
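A sketch of how these statistics can be computed from the Test set CSV is given below; the file name is a placeholder, and counting sentence length in whitespace-separated tokens is an assumption about what our script does.

```python
import pandas as pd

test_df = pd.read_csv("test.csv")  # placeholder name for the Test set file

# Number of sentences per label.
print(test_df["label"].value_counts().sort_index())

# Sentence length in words (simple whitespace tokenization).
lengths = test_df["sentence"].str.split().str.len()
print(f"Total sentences: {len(test_df)}")
print(f"Average length:  {lengths.mean():.2f}")
print(f"Shortest:        {lengths.min()}")
print(f"Longest:         {lengths.max()}")
```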
The results are saved in the file "Dataset.md"; see the link below.
https://github.com/HighFive-OPJ/OPJ-Corpus/blob/9415303240584f8f93e58f7c4a190b8b235388f2/Exploratory%20data%20analysis/Dataset.md
We applied machine learning to our data, using the SVM and KNN algorithms.
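As an illustration of this setup, a TF-IDF + SVM/KNN pipeline in scikit-learn could look like the sketch below; the features, parameters and file names are assumptions and may differ from the exact configuration used in our experiments.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder file names for one group's Train/Test split.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

for name, clf in [("SVM", LinearSVC()), ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_df["sentence"], train_df["label"])
    pred = model.predict(test_df["sentence"])

    # Weighted averages account for the label imbalance seen in the Test set.
    p, r, f1, _ = precision_recall_fscore_support(
        test_df["label"], pred, average="weighted", zero_division=0
    )
    acc = accuracy_score(test_df["label"], pred)
    print(f"{name}: Precision {p:.4f}, Recall {r:.4f}, F1 {f1:.4f}, Accuracy {acc:.4f}")
```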
These are the results:
| # | Method | Algorithm | Train | Test 1: Group 1 | Test 2: Group 2 | Test 3: Group 3 |
|---|---|---|---|---|---|---|
| 1.a.i | Machine learning (2 methods) | SVM | Train 1 or 2 or 3 / [respective own] | Precision: 0.6301, Recall: 0.6684, F1: 0.5532, Accuracy: 0.6684 | Precision: 0.5402, Recall: 0.6099, F1: 0.4935, Accuracy: 0.6099 | Precision: 0.4200, Recall: 0.6481, F1: 0.5097, Accuracy: 0.6481 |
| 1.a.ii | | SVM | TRAIN | Precision: 0.6119, Recall: 0.6782, F1: 0.6182, Accuracy: 0.6782 | Precision: 0.5699, Recall: 0.6222, F1: 0.5624, Accuracy: 0.6222 | |
| 1.b.i | | K-Nearest Neighbors (KNN) | Train 1 or 2 or 3 / [respective own] | Precision: 0.5941, Recall: 0.6684, F1: 0.5544, Accuracy: 0.6684 | Precision: 0.4766, Recall: 0.5964, F1: 0.4870, Accuracy: 0.5964 | Precision: 0.5098, Recall: 0.6567, F1: 0.5366, Accuracy: 0.6567 |
| 1.b.ii | | KNN | TRAIN | Precision: 0.5066, Recall: 0.6398, F1: 0.5233, Accuracy: 0.6398 | Precision: 0.5641, Recall: 0.6117, F1: 0.5479, Accuracy: 0.6117 | |
For SVM, Test 1 shows the best results across all metrics, and Train-1 shows the best results in the TRAIN category.
For KNN, Test 1 shows the best results, although Test 3 also performs well. In the TRAIN category, Train-1 is best in recall and accuracy, while Train-2 is best in precision and F1-score.