---
library_name: transformers
tags:
- speech-to-text
- uzbek stt
- uzbek tts
license: apache-2.0
language:
- uz
pipeline_tag: automatic-speech-recognition
---
# Model Card for sarahai/uzbek-stt-3
This model is a fine-tuned version of oyqiz/uzbek_stt, trained mainly on a dataset of law- and military-related content.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Sara Musaeva
- **Funded by:** SSD
- **Model type:** Wav2Vec2 with a CTC head (loaded through Transformers as `Wav2Vec2ForCTC`)
- **Language(s) (NLP):** Uzbek
- **Finetuned from model:** oyqiz/uzbek_stt
### Model Sources
<!-- Provide the basic links for the model. -->
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Intended for speech-to-text conversion of Uzbek audio.
## How to Get Started with the Model
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import torchaudio

model_name = "sarahai/uzbek-stt-3"
model = Wav2Vec2ForCTC.from_pretrained(model_name)
processor = Wav2Vec2Processor.from_pretrained(model_name)

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


def load_and_preprocess_audio(file_path):
    """Load an audio file and return a mono 16 kHz NumPy array."""
    speech_array, sampling_rate = torchaudio.load(file_path)
    # torchaudio returns a (channels, frames) tensor; average the channels to get mono.
    speech_array = speech_array.mean(dim=0)
    if sampling_rate != 16000:
        resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate, new_freq=16000)
        speech_array = resampler(speech_array)
    return speech_array.numpy()


def replace_unk(transcription):
    """Replace the tokenizer's [UNK] token with the ʼ character."""
    return transcription.replace("[UNK]", "ʼ")


# Path to the audio file to transcribe; replace with your own file.
audio_file = "/content/audio_2024-08-13_15-20-53.ogg"
speech_array = load_and_preprocess_audio(audio_file)
input_values = processor(speech_array, sampling_rate=16000, return_tensors="pt").input_values.to(device)

# Greedy CTC decoding: take the most likely token at every frame.
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

transcription = processor.batch_decode(predicted_ids)
transcription_text = replace_unk(transcription[0])
print("Transcription:", transcription_text)
```
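The `torch.argmax` and `processor.batch_decode` steps above amount to greedy CTC decoding: pick the most likely token for every audio frame, merge consecutive repeats, then drop the blank token. The following is a minimal pure-Python sketch of that collapsing rule, not the library's actual implementation. The toy vocabulary, the `blank_id=0` choice, and the function name `ctc_greedy_decode` are assumptions made for illustration; the real tokenizer for this model defines its own vocabulary and blank token.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blank tokens."""
    tokens = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    return "".join(tokens)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "s", 2: "a", 3: "l", 4: "o", 5: "m"}

# One predicted id per audio frame, as torch.argmax would produce.
frames = [1, 1, 0, 2, 2, 2, 0, 3, 0, 4, 4, 5, 5]
print(ctc_greedy_decode(frames, vocab))  # salom
```

A blank between two identical ids keeps both letters, so `[2, 0, 2]` decodes to `aa` rather than `a`. This is how CTC represents doubled characters.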