Swedish Telephonic ASR
swedish-telephonic-asr is a production-grade speech-to-text model developed by WMR Nordic for transcribing Swedish-language telephonic conversations. The model is designed to handle low-bandwidth, multi-speaker speech as typically found in call centers, support lines, and business hotlines.
Model Overview
- Architecture: Encoder-decoder (242M parameters)
- Language: Swedish (sv)
- Audio Domain: Telephonic conversations, customer service calls, low-bitrate recordings
- Input Sampling Rate: 8 kHz (resampled to 16 kHz during training)
- Speaker Support: Handles multi-speaker data with diarization cues (if preprocessed)
- Token Handling: Conversational markers, filler sounds, and speech disfluencies were normalized and cleaned before training
Training Details
This model was trained across multiple epochs using proprietary Swedish-language telephonic data. The dataset includes:
- Two-channel call recordings
- Manually aligned transcripts
- Cleaned segments with consistent casing, punctuation, and Swedish grammar
- Transcripts flattened to exclude markup like <overlap>, <lang:...>, ((...)), [cough]
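The transcript flattening mentioned in the last item could look roughly like the sketch below. This is not WMR Nordic's actual pipeline: the function name `flatten_transcript` and the regular expressions are hypothetical, and dropping the content of `((...))` spans (rather than keeping the uncertain words) is an assumption.

```python
import re

# Hypothetical patterns for the markup named above; the real annotation
# scheme may differ.
_MARKUP = re.compile(
    r"<[^<>]*>"          # angle-bracket tags such as <overlap> or <lang:...>
    r"|\(\([^()]*\)\)"   # double-parenthesis spans such as ((...))
    r"|\[[^\[\]]*\]"     # bracketed events such as [cough]
)


def flatten_transcript(text: str) -> str:
    """Remove annotation markup and collapse the whitespace left behind."""
    without_markup = _MARKUP.sub(" ", text)
    return " ".join(without_markup.split())


print(flatten_transcript("hej <overlap> jag [cough] heter Anna"))
```

Punctuation that was attached to a removed tag is left in place here; a real cleaning pass would likely need an extra rule for that.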
The audio was preprocessed to 16 kHz mono, normalized, and segmented using speaker timestamps.
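As a rough illustration of the mono downmix, normalization, and timestamp-based segmentation described above, here is a minimal sketch on plain Python sequences. The helper names, the use of peak normalization, and the `(start_seconds, end_seconds)` segment format are all assumptions; the source does not say which normalization method or timestamp format was used, and resampling is left out. In practice these steps would normally run on NumPy arrays.

```python
from typing import List, Sequence, Tuple


def to_mono(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Average two equally long channels into one mono signal."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def peak_normalize(samples: Sequence[float], target_peak: float = 0.95) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak.

    Peak normalization is an assumption; the source only says "normalized".
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def segment_by_timestamps(
    samples: Sequence[float],
    sample_rate: int,
    segments: Sequence[Tuple[float, float]],
) -> List[Sequence[float]]:
    """Cut one slice per (start_seconds, end_seconds) pair."""
    return [
        samples[int(round(start * sample_rate)):int(round(end * sample_rate))]
        for start, end in segments
    ]
```

For example, `segment_by_timestamps(audio, 16000, [(0.0, 2.5), (2.5, 6.0)])` would return two slices, one per speaker turn, given hypothetical turn boundaries at 2.5 and 6.0 seconds.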
Evaluation
On a held-out Swedish telephonic test set of 207 manually transcribed speech segments:
| Model | WER |
|---|---|
| Fine-tuned (this) | 0.170 |
| Base model (Whisper) | 0.888 |
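Word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words, so a WER of 0.170 corresponds to about 17 errors per 100 reference words. The sketch below shows one way to compute it for a single utterance. The function name `word_error_rate` is hypothetical, and the source does not state which text normalization (casing, punctuation) was applied before scoring or whether the reported figure is pooled over all segments, so those details are left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One dropped word out of five reference words
print(word_error_rate("jag vill boka en tid", "jag vill boka tid"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.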
Key strengths:
- Accurate for long utterances and colloquial phrasing
- Handles domain-specific terminology (e.g., retail, service, logistics)
- Transcribes speaker shifts and hesitations more clearly than the baseline
Intended Use
This model is ideal for:
- Customer Support Transcription
- Call Center Quality Assurance
- Compliance Monitoring
- Sentiment & Conversation Analytics
- Conversational AI Pipelines (Swedish)
Limitations
- Language: Primarily Swedish, minimal English support
- Input: Optimized for telephony; less accurate for studio or noisy recordings
- Format: Non-8kHz audio must be resampled before inference
Access and Licensing
Access is gated for commercial use. To request access:
- Email: [email protected]
- AWS Marketplace: https://aws.amazon.com/marketplace/pp/prod-umarl5doovllg
Manual access review is typically completed within 24–48 hours.
Commercial Access
This model is available on AWS Marketplace with scalable infrastructure.
Usage Example
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa

model_name = "WMRNORDIC/swedish-telephonic-asr"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Load as mono and resample to 16 kHz, the rate the Whisper feature extractor expects
audio, sr = librosa.load("your_call_audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate token IDs and decode them to text
output_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
License
This model is proprietary and licensed by WMR Nordic.
Commercial deployment requires approval and licensing.
Model tree for WMRNORDIC/swedish-telephonic-asr
Base model
openai/whisper-small