EkaCare Parrotlet-a-en-5b
This is a purpose-built automatic speech recognition (ASR) model trained for English speech in Indian healthcare settings and optimised for transcribing medical speech. It combines the Whisper large-v3 encoder with the MedGemma3 4B decoder through a lean projector layer for efficient speech-to-text conversion.
A detailed description of this model can be obtained from this blog post.
Installation Requirements
This model requires Python 3.10 and the following packages, installed with pip (the version specifiers are quoted so the shell does not treat ">" as output redirection):
pip install "torchaudio>=2.7.0" "torch>=2.7.0" "transformers>=4.52.0" librosa soundfile python-dotenv huggingface_hub
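To confirm what is actually installed before loading the model, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and is not part of this model's code.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
PACKAGES = [
    "torch", "torchaudio", "transformers",
    "librosa", "soundfile", "python-dotenv", "huggingface_hub",
]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

The script only reports versions; comparing them against the minimums listed above is left to the reader.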
Below is an example of how to load and use this model for automatic speech recognition using the Hugging Face Transformers library.
Loading the model from Hugging Face Hub
from transformers import AutoModel
import librosa
repo_name = "ekacare/parrotlet-a-en-5b"
model = AutoModel.from_pretrained(repo_name, trust_remote_code=True)
Load an audio file
audio_path = "path/to/your/audio.mp3"
audio, sample_rate = librosa.load(audio_path, sr=16000) # Resample to 16kHz if needed
Perform speech recognition
transcription = model.transcribe(audio, sample_rate)
print("Transcription:", transcription)
Notes
- Ensure the audio input is in a compatible format (WAV or MP3) with a 16 kHz sampling rate for optimal performance.
- This model handles short-form audio only: each chunk passed to it should be shorter than 30 seconds.
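For recordings longer than 30 seconds, one possible approach is to split the waveform into fixed-length chunks and transcribe each chunk separately. The sketch below is an illustration, not the model's own chunking logic: the helper split_into_chunks and the 29-second default are assumptions, and fixed cuts can split a word across two chunks.

```python
def split_into_chunks(audio, sample_rate=16000, max_seconds=29.0):
    """Split a 1-D waveform into consecutive chunks of at most max_seconds.

    Works on any sliceable sequence (list, NumPy array) of samples.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk_len = int(sample_rate * max_seconds)  # samples per chunk
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]


# Hypothetical usage with the objects from the example above, assuming
# model.transcribe returns a string for each chunk:
#   chunks = split_into_chunks(audio, sample_rate)
#   transcription = " ".join(model.transcribe(c, sample_rate) for c in chunks)
```

Cutting at silences instead of fixed offsets would likely give cleaner chunk boundaries, at the cost of extra code.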
Authentication (if required)
If access to the repository requires authentication, log in to your Hugging Face account, generate an access token in Hugging Face Settings, and set it in your environment:
export HF_TOKEN="your-access-token"
Alternatively, use the Hugging Face CLI to log in:
huggingface-cli login
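The token can also be passed explicitly when loading the model. The sketch below reads HF_TOKEN from the environment; the helper resolve_hf_token is hypothetical, while the token argument of from_pretrained is part of recent Transformers releases.

```python
import os


def resolve_hf_token(env_var="HF_TOKEN"):
    """Return the access token from the environment, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


# Hypothetical usage with the loading example above:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained(
#       "ekacare/parrotlet-a-en-5b",
#       trust_remote_code=True,
#       token=resolve_hf_token(),  # None falls back to any cached login
#   )
```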
License
This model is released under the MIT License, enabling broad use while maintaining attribution requirements.