metadata
language:
  - pap
license: apache-2.0
tags:
  - whisper
  - automatic-speech-recognition
  - papiamento
  - speech-to-text
  - medical
  - healthcare
  - clinical
base_model: sonnygeorge/whisper-tiny-pap
datasets:
  - medical-papiamento-corpus
widget:
  - example_title: Medical Papiamento Sample
    src: https://example.com/medical_sample.wav

Whisper Tiny Papiamento - Medical Domain Adaptation

This model is a fine-tuned version of sonnygeorge/whisper-tiny-pap, adapted to the medical domain for Papiamento speech recognition in healthcare and clinical settings.

Model Description

  • Base model: sonnygeorge/whisper-tiny-pap (Papiamento Whisper model by Sonny George)
  • Domain: Medical/Healthcare Papiamento
  • Language: Papiamento (pap)
  • Specialization: Clinical terminology, medical consultations, healthcare vocabulary
  • Training: Fine-tuned on medical Papiamento audio data

Model Performance

This model builds upon Sonny George's excellent Papiamento Whisper foundation and adds:

  • ✅ Enhanced medical terminology recognition
  • ✅ Clinical context understanding
  • ✅ Healthcare vocabulary optimization
  • ✅ Single speaker adaptation for consistent medical speech patterns
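No quantitative results are listed for these points. To check them on your own recordings, word error rate (WER) is the usual metric for speech recognition. The following is a minimal pure-Python sketch, not the evaluation used for this model; the function name `word_error_rate` and the placeholder strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming Levenshtein distance over words,
    # keeping only the previous row to save memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings (hypothetical, not real transcripts):
# one substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Comparing the WER of this model with that of the base model on the same held-out medical recordings gives a direct measure of what the domain adaptation contributes.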

Intended Uses

  • Medical consultation transcription in Papiamento
  • Clinical note generation from Papiamento audio
  • Healthcare documentation automation
  • Medical terminology recognition in Papiamento

Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("your-username/whisper-tiny-pap-medical")
model = WhisperForConditionalGeneration.from_pretrained("your-username/whisper-tiny-pap-medical")

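# Load the audio and resample it to 16 kHz, the sampling rate Whisper expects
audio, sr = librosa.load("medical_consultation.m4a", sr=16000)

# Convert the waveform to log-mel input features.
# Note: the Whisper feature extractor pads or truncates input to 30 seconds,
# so audio beyond the first 30 seconds is not transcribed by this call.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate token IDs and decode them to text
predicted_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]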

print(transcription)  # Medical Papiamento transcription
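Whisper works on windows of at most 30 seconds, and consultation recordings are often longer. One simple way to handle this is to split the waveform into fixed 30-second windows and transcribe each in turn. The sketch below assumes a `processor` and `model` loaded as in the usage example; the helper names `chunk_bounds` and `transcribe_long` are hypothetical, and this naive split can cut a word at a window boundary.

```python
def chunk_bounds(num_samples, sampling_rate=16000, chunk_seconds=30.0):
    """Return (start, end) sample indices that cover the audio in fixed windows."""
    if num_samples < 0 or sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sizes must be non-negative and rates positive")
    chunk_len = int(sampling_rate * chunk_seconds)
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


def transcribe_long(audio, processor, model, sampling_rate=16000):
    """Transcribe each window separately and join the text with spaces.

    `audio` is a 1-D waveform (e.g. from librosa.load); `processor` and
    `model` are the Whisper objects from the usage example.
    """
    pieces = []
    for start, end in chunk_bounds(len(audio), sampling_rate):
        inputs = processor(
            audio[start:end], sampling_rate=sampling_rate, return_tensors="pt"
        )
        predicted_ids = model.generate(inputs["input_features"])
        text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
        pieces.append(text.strip())
    return " ".join(pieces)


# A 70-second recording at 16 kHz yields two full windows and one 10-second tail.
print(chunk_bounds(16000 * 70))
```

With the objects from the usage example, `transcribe_long(audio, processor, model)` returns a single string. The `transformers` automatic-speech-recognition pipeline also accepts a `chunk_length_s` argument and may be a better fit when overlapping windows are needed.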