# Wav2Vec2 Speech Emotion Recognition for English

## Model Overview

This model is fine-tuned for recognizing emotions in English speech using the Wav2Vec2 architecture. It can detect the following emotions:

- Sadness
- Anger
- Disgust
- Fear
- Happiness
- Neutral

The model was trained on the Speech Emotion Recognition dataset from Kaggle, which consists of English emotional speech samples. Its audio files are labeled with emotional states, making the dataset well suited to emotion recognition tasks.

## Model Details

- Architecture: Wav2Vec2
- Language: English
- Dataset: Speech Emotion Recognition Dataset (Kaggle)
- Emotions Detected: Sadness, Anger, Disgust, Fear, Happiness, Neutral

## How to Use

### Installation

To use this model, install the `transformers` and `torchaudio` packages:

```bash
pip install transformers torchaudio
```

## Example Usage

Here is an example of how to use the model to classify emotions in an English audio file:

```python
from transformers import pipeline

# Load the fine-tuned model and feature extractor
pipe = pipeline("audio-classification", model="Khoa/w2v-speech-emotion-recognition")

# Path to your audio file
audio_file = "path_to_your_audio_file.wav"

# Perform emotion classification
predictions = pipe(audio_file)

# Map the generic LABEL_N outputs to real emotion labels
label_map = {
    "LABEL_0": "sadness",
    "LABEL_1": "angry",
    "LABEL_2": "disgust",
    "LABEL_3": "fear",
    "LABEL_4": "happy",
    "LABEL_5": "neutral",
}

# Convert predictions to readable labels
mapped_predictions = [
    {"score": pred["score"], "label": label_map[pred["label"]]}
    for pred in predictions
]

# Display results
print(mapped_predictions)
```
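
The mapped list keeps the full score distribution. To reduce it to a single predicted emotion, take the highest-scoring entry. The sketch below is self-contained; the raw predictions are hypothetical stand-ins for real pipeline output:

```python
# Hypothetical raw pipeline output (scores are illustrative).
predictions = [
    {"score": 0.95, "label": "LABEL_1"},
    {"score": 0.03, "label": "LABEL_4"},
    {"score": 0.02, "label": "LABEL_5"},
]

label_map = {
    "LABEL_0": "sadness", "LABEL_1": "angry", "LABEL_2": "disgust",
    "LABEL_3": "fear", "LABEL_4": "happy", "LABEL_5": "neutral",
}

# Map raw labels, then keep only the highest-scoring emotion.
mapped = [{"score": p["score"], "label": label_map[p["label"]]} for p in predictions]
top = max(mapped, key=lambda p: p["score"])
print(top)  # {'score': 0.95, 'label': 'angry'}
```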

## Example Output

The model outputs a list of predictions with scores for each emotion. For example:

```json
[
  {"score": 0.95, "label": "angry"},
  {"score": 0.02, "label": "happy"},
  {"score": 0.01, "label": "disgust"},
  {"score": 0.01, "label": "neutral"},
  {"score": 0.01, "label": "fear"}
]
```
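
Assuming the pipeline's default behavior (a softmax over the logits, results sorted from most to least likely), the scores form a probability distribution. A quick sanity check on output in this shape, using the example values above:

```python
import json

# Example output, copied from the JSON block above.
example = """[
  {"score": 0.95, "label": "angry"},
  {"score": 0.02, "label": "happy"},
  {"score": 0.01, "label": "disgust"},
  {"score": 0.01, "label": "neutral"},
  {"score": 0.01, "label": "fear"}
]"""

predictions = json.loads(example)

# Softmax scores should sum to roughly 1, and entries should
# already be sorted by descending score.
total = sum(p["score"] for p in predictions)
assert abs(total - 1.0) < 0.05
assert predictions == sorted(predictions, key=lambda p: -p["score"])
print(predictions[0]["label"])  # angry
```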

## Training Details

The model was fine-tuned on the Speech Emotion Recognition Dataset using the Wav2Vec2 architecture. Training ran for multiple epochs with a learning rate of 1e-5.
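
The checkpoint exposes generic `LABEL_N` class names, which is why the usage example maps them by hand. When fine-tuning a similar model yourself, supplying `id2label`/`label2id` in the model config (e.g. via `from_pretrained(..., num_labels=..., id2label=..., label2id=...)`) lets the pipeline report emotion names directly. A minimal sketch of building those maps, with the label order assumed from the `label_map` above:

```python
# Label order assumed from the label_map in the usage example above.
labels = ["sadness", "angry", "disgust", "fear", "happy", "neutral"]

# Pass these to from_pretrained(..., num_labels=len(labels),
# id2label=id2label, label2id=label2id) when fine-tuning, so the
# pipeline reports emotion names instead of LABEL_0 ... LABEL_5.
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in enumerate(labels)}

print(id2label[1], label2id["neutral"])  # angry 5
```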

## Limitations and Biases

This model is trained specifically on English speech data and may not perform well on other languages or dialects. As with any machine learning model, biases present in the training data may affect its predictions.