Whisper-Small Persian STT — LoRA Fine-Tuned

A LoRA fine-tuned version of openai/whisper-small for Persian speech-to-text (automatic speech recognition, ASR).
The model is optimized for conversational Persian speech and clean, dataset-quality audio.


Model Details

Model Description

This model is a LoRA fine-tuned Whisper Small focused on Persian (fa) speech recognition.
It improves transcription accuracy on standard Persian audio segments (16 kHz, mono, normalized WAV).

  • Developed by: Mehdi Pouladrag
  • Model type: Speech-to-Text (ASR) — Whisper Small (Seq2Seq Transformer)
  • Language(s): Persian (fa)
  • License: MIT
  • Finetuned from: openai/whisper-small
  • Dataset: persian-voice-v1 (single dataset)
  • Training technique: LoRA (Low-Rank Adaptation)
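
For readers new to LoRA: the technique freezes each adapted weight matrix W and trains only a low-rank update BA, so the effective weight becomes W + (alpha / r) * BA. The sketch below just counts parameters to show why this is cheap. The rank r = 8 is a hypothetical value, since this card does not state the rank that was used; 768 is the hidden size of Whisper Small.

```python
def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out        # parameters when W itself is updated
    lora = r * (d_in + d_out)  # A has shape (r, d_in), B has shape (d_out, r)
    return full, lora


# r=8 is an illustrative assumption, not a value taken from this model card.
full, lora = lora_param_counts(768, 768, r=8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

Under these assumed values, the LoRA update trains roughly 2% of the parameters of the matrix it adapts.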

Model Sources


Uses

Direct Use

  • Convert Persian speech to text
  • Subtitle generation for Persian audio
  • Conversational ASR
  • Podcast / video transcription
  • General Persian content recognition
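
A minimal inference sketch for the uses listed above, assuming the repository `DevMehdip/whisper-small-fa-lora` holds a PEFT-format LoRA adapter and that recent versions of `transformers`, `peft`, and `torch` are installed. The function names `load_model` and `transcribe` are hypothetical helpers, not part of this model's release, and the processor is loaded from the base model because the card does not say whether the adapter repository ships one.

```python
BASE_MODEL_ID = "openai/whisper-small"
ADAPTER_ID = "DevMehdip/whisper-small-fa-lora"  # assumed to be a PEFT LoRA adapter


def load_model():
    """Load the base model, apply the LoRA adapter, and merge it for inference."""
    # Heavy dependencies are imported lazily so importing this file stays cheap.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    # merge_and_unload() folds the LoRA weights into the base weights and
    # returns a plain Whisper model.
    model = PeftModel.from_pretrained(base, ADAPTER_ID).merge_and_unload()
    model.eval()
    return processor, model


def transcribe(waveform, processor, model, sampling_rate=16000):
    """Transcribe a mono float waveform (1-D array) sampled at 16 kHz."""
    import torch

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(inputs.input_features, language="fa", task="transcribe")
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Typical usage would be `processor, model = load_model()` followed by `text = transcribe(waveform, processor, model)`, where `waveform` is audio already loaded as a 16 kHz mono array.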

Downstream Use

  • Integrate into ASR pipelines
  • Use in real-time Persian voice applications
  • Further fine-tuning on custom Persian domains (medical, legal, etc.)
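
For further fine-tuning on a custom domain, one possible starting point is to reload the adapter with its weights unfrozen and continue training from there. This is a sketch under the assumption that the adapter is in PEFT format; `load_trainable_adapter` is a hypothetical helper, and the training loop, data, and hyperparameters are left to the user.

```python
BASE_MODEL_ID = "openai/whisper-small"
ADAPTER_ID = "DevMehdip/whisper-small-fa-lora"  # assumed to be a PEFT LoRA adapter


def load_trainable_adapter():
    """Return the base model with this adapter attached and its LoRA weights unfrozen."""
    # Imported lazily so importing this file does not require the heavy dependencies.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration

    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    # is_trainable=True keeps the adapter weights trainable; by default
    # PeftModel.from_pretrained loads adapters frozen for inference.
    model = PeftModel.from_pretrained(base, ADAPTER_ID, is_trainable=True)
    model.print_trainable_parameters()
    return model
```

The returned model can then be passed to a custom training loop or a sequence-to-sequence trainer together with domain-specific Persian audio and transcripts.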

Out-of-Scope Use

  • Non-Persian audio
  • Low-quality or noisy recordings with multiple overlapping speakers
  • Misuse for surveillance or unethical monitoring

Bias, Risks, and Limitations

  • Whisper may still struggle with dialect-heavy, noisy, or low-quality audio.
  • The dataset used is relatively limited (~6099 audio–subtitle pairs), so:
    • Certain accents may be underrepresented.
    • The model may hallucinate or mis-transcribe in rare cases.
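
One way to quantify the mis-transcriptions mentioned above is word error rate (WER) against a trusted reference transcript. The function below is a minimal, dependency-free sketch. It splits on whitespace only and does no Persian text normalization (for example of character variants or zero-width non-joiners), which a real evaluation would probably need.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(wer("سلام حال شما چطور است", "سلام حال شما خوب است"))
```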

Recommendations

Users should:

  • Provide clean 16 kHz mono WAV audio
  • Use domain-specific fine-tuning if necessary
  • Validate outputs before critical use
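
The first recommendation can be sketched as a small preprocessing step. This example assumes NumPy is available and that audio has already been decoded into an array of shape (n_samples,) or (n_samples, n_channels). The helper name `prepare_audio` is hypothetical, peak normalization is an assumption about what "normalized" means here, and the linear-interpolation resampler is deliberately crude; a dedicated resampling library would be preferable in practice.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate expected by Whisper models


def prepare_audio(samples: np.ndarray, sr: int) -> np.ndarray:
    """Convert audio to mono, 16 kHz, peak-normalized float32."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:
        # Assumes shape (n_samples, n_channels); average channels to get mono.
        x = x.mean(axis=1)
    if sr != TARGET_SR:
        # Crude linear-interpolation resampling with no anti-aliasing filter.
        n_out = int(round(len(x) * TARGET_SR / sr))
        t_old = np.arange(len(x)) / sr
        t_new = np.arange(n_out) / TARGET_SR
        x = np.interp(t_new, t_old, x)
    peak = np.max(np.abs(x)) if len(x) else 0.0
    if peak > 0:
        x = x / peak  # scale so the loudest sample has magnitude 1
    return x.astype(np.float32)
```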

Model repository: DevMehdip/whisper-small-fa-lora (LoRA adapter for openai/whisper-small)