---
library_name: transformers
tags:
- speech-to-text
- uzbek stt
- uzbek tts
license: apache-2.0
language:
- uz
pipeline_tag: automatic-speech-recognition
---

# Model Card for sarahai/uzbek-stt-3

This model is a fine-tuned version of oyqiz/uzbek_stt, trained mainly on a dataset of law- and military-related content.

## Model Details

### Model Description

- **Developed by:** Sara Musaeva
- **Funded by:** SSD
- **Model type:** Wav2Vec2 with a CTC head (`Wav2Vec2ForCTC`), loaded with the Transformers library
- **Language(s) (NLP):** Uzbek
- **Fine-tuned from model:** Oyqiz/uzbek-stt

### Model Sources 
<!-- Provide the basic links for the model. -->

## Uses

Intended for speech-to-text transcription of Uzbek audio, particularly in the legal and military domains covered by the fine-tuning data.


## How to Get Started with the Model
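The example below loads the model and processor, converts an audio file to 16 kHz mono, and transcribes it with greedy CTC decoding.

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import torchaudio

model_name = "sarahai/uzbek-stt-3"
model = Wav2Vec2ForCTC.from_pretrained(model_name)
processor = Wav2Vec2Processor.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Audio is resampled to this rate before being passed to the processor
TARGET_SAMPLING_RATE = 16000

def load_and_preprocess_audio(file_path):
    # torchaudio.load returns a (channels, samples) tensor and the file's sample rate
    speech_array, sampling_rate = torchaudio.load(file_path)
    # Mix multi-channel (e.g. stereo) audio down to a single mono channel
    if speech_array.shape[0] > 1:
        speech_array = speech_array.mean(dim=0, keepdim=True)
    # Resample to 16 kHz if the file uses a different rate
    if sampling_rate != TARGET_SAMPLING_RATE:
        resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate, new_freq=TARGET_SAMPLING_RATE)
        speech_array = resampler(speech_array)
    # Return a 1-D NumPy array of samples
    return speech_array.squeeze(0).numpy()

def replace_unk(transcription):
    # Replace [UNK] tokens in the decoded text with the Uzbek apostrophe (ʼ)
    return transcription.replace("[UNK]", "ʼ")

# Path to your audio file (this example path comes from a Colab session)
audio_file = "/content/audio_2024-08-13_15-20-53.ogg"
speech_array = load_and_preprocess_audio(audio_file)

input_values = processor(speech_array, sampling_rate=TARGET_SAMPLING_RATE, return_tensors="pt").input_values.to(device)

with torch.no_grad():
    logits = model(input_values).logits

# Greedy CTC decoding: take the most likely token per frame, then let the
# tokenizer collapse repeats and drop blank (pad) tokens
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

transcription_text = replace_unk(transcription[0])

print("Transcription:", transcription_text)
```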
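The `torch.argmax` / `processor.batch_decode` step in the example above performs greedy CTC decoding. The pure-Python sketch below approximates what that step does by default: pick the best token per frame, collapse consecutive repeats, drop the blank token, and turn the `|` word delimiter into spaces. It assumes a hypothetical toy vocabulary (`TOY_VOCAB`) with the pad token as the CTC blank, as in standard Wav2Vec2 tokenizers; this model's real vocabulary and token IDs are different, and the helper names here are illustrative only.

```python
from itertools import groupby
from typing import Dict, List, Sequence

# Hypothetical toy vocabulary for illustration only; the real model's
# vocabulary is loaded by Wav2Vec2Processor and is different.
TOY_VOCAB: Dict[int, str] = {
    0: "<pad>",  # assumed CTC blank token (pad token, as in Wav2Vec2)
    1: "|",      # word delimiter
    2: "[UNK]",  # unknown token, mapped to "ʼ" by this card's post-processing
    3: "a", 4: "i", 5: "k", 6: "m", 7: "n", 8: "o",
}
BLANK_ID = 0


def argmax_per_frame(logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring token ID for each frame (like torch.argmax(logits, dim=-1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def greedy_ctc_decode(frame_ids: Sequence[int], vocab: Dict[int, str]) -> str:
    """Collapse repeated IDs, drop blanks, and turn word delimiters into spaces."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    tokens = [vocab[token_id] for token_id in collapsed if token_id != BLANK_ID]
    return "".join(tokens).replace("|", " ").strip()


def replace_unk(transcription: str) -> str:
    """Same post-processing as the model card: replace [UNK] with the Uzbek apostrophe."""
    return transcription.replace("[UNK]", "ʼ")


if __name__ == "__main__":
    # Toy per-frame predictions for "ikki maʼno"; the blank between the two
    # "k" frames is what lets CTC emit a doubled letter.
    frames = [4, 4, 5, 0, 5, 4, 0, 1, 1, 6, 3, 3, 2, 0, 7, 8, 8]
    raw = greedy_ctc_decode(frames, TOY_VOCAB)
    print(raw)               # ikki ma[UNK]no
    print(replace_unk(raw))  # ikki maʼno
```

Without a blank between them, repeated frames such as `[5, 5, 5]` collapse to a single `k`; this is why the model must predict a blank to produce double letters.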