Swedish Telephonic ASR

swedish-telephonic-asr is a production-grade speech-to-text model developed by WMR Nordic for transcribing Swedish-language telephonic conversations. The model is designed to handle low-bandwidth, multi-speaker speech of the kind typically found in call centers, support lines, and business hotlines.

Model Overview

Architecture: Encoder-decoder (242M parameters)
Language: Swedish (sv)
Audio Domain: Telephonic conversations, customer service calls, low-bitrate recordings
Input Sampling Rate: 8 kHz (resampled to 16 kHz during training)
Speaker Support: Handles multi-speaker data with diarization cues (if preprocessed)
Token Handling: Conversational markers, filler sounds, and speech disfluencies were normalized and cleaned before training

Training Details

This model was trained across multiple epochs using proprietary Swedish-language telephonic data. The dataset includes:

Two-channel call recordings
Manually aligned transcripts
Cleaned segments with consistent casing, punctuation, and Swedish grammar
Transcripts flattened to exclude markup like <overlap>, <lang:...>, ((...)), [cough]
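The transcript flattening described in the last item can be sketched as follows. This is a minimal illustration, not the pipeline WMR Nordic used: the regular expressions, the function name flatten_transcript, and the keep_uncertain option are all assumptions, and the source does not say whether the words inside ((...)) were kept or dropped.

```python
import re

# Hypothetical patterns for the markup types named above; the actual
# annotation scheme used for the training data may differ.
_TAG = re.compile(r"</?[A-Za-z][^<>]*>")       # <overlap>, </overlap>, <lang:English>
_BRACKET_EVENT = re.compile(r"\[[^\[\]]*\]")    # [cough], [laugh]
_UNCERTAIN = re.compile(r"\(\(([^()]*)\)\)")    # ((uncertain words))


def flatten_transcript(text: str, keep_uncertain: bool = True) -> str:
    """Strip annotation markup and collapse the remaining whitespace.

    keep_uncertain is an assumed option: True keeps the words inside
    ((...)) and drops only the parentheses, False drops the whole span.
    """
    text = _TAG.sub(" ", text)
    text = _BRACKET_EVENT.sub(" ", text)
    text = _UNCERTAIN.sub(r" \1 " if keep_uncertain else " ", text)
    return re.sub(r"\s+", " ", text).strip()


raw = "<overlap>hej</overlap> jag [cough] heter ((Anna)) <lang:English>okay</lang:English>"
print(flatten_transcript(raw))
```

Removing the tags while keeping the words they enclose leaves the reference text as plain running Swedish, which is what the model is expected to emit at inference time.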

The audio was preprocessed to 16 kHz mono, normalized, and segmented using speaker timestamps.
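A rough sketch of that audio preprocessing is shown below. It rests on several assumptions that the source does not confirm: "mono" is read as averaging the two channels (processing each channel separately is equally plausible), "normalized" is read as peak normalization, and the function names to_mono_16k and cut_segments are hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate stated in the training details


def to_mono_16k(audio: np.ndarray, sr: int, peak: float = 0.95) -> np.ndarray:
    """Downmix, resample to 16 kHz, and peak-normalize one recording.

    audio has shape (samples,) or (samples, channels), the layout that
    soundfile.read returns. The peak target of 0.95 is an assumed value.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Assumption: average the channels into a single mono signal.
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        g = gcd(TARGET_SR, sr)
        # Polyphase resampling by the reduced ratio TARGET_SR / sr.
        audio = resample_poly(audio, TARGET_SR // g, sr // g)
    max_abs = float(np.max(np.abs(audio))) if audio.size else 0.0
    if max_abs > 0:
        audio = audio * (peak / max_abs)
    return audio.astype(np.float32)


def cut_segments(audio: np.ndarray, segments, sr: int = TARGET_SR):
    """Slice audio by (start_sec, end_sec) speaker timestamps."""
    return [audio[int(round(s * sr)):int(round(e * sr))] for s, e in segments]
```

For an 8 kHz stereo call, to_mono_16k(audio, 8000) returns a one-dimensional float32 array with twice as many samples, and cut_segments then yields one array per timestamp pair.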

Evaluation

On a held-out Swedish telephonic test set of 207 manually transcribed speech segments:

Model	WER
Fine-tuned (this)	0.170
Base model (Whisper)	0.888
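For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between hypothesis and reference divided by the number of reference words. The sketch below is a generic implementation, not the evaluation script behind the table. How the reported scores were aggregated over the 207 segments and which text normalization was applied are not stated in the source. The helper relative_reduction is a hypothetical name, used only to express the two reported scores as a relative improvement.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# One dropped word out of five reference words.
print(word_error_rate("jag vill boka en tid", "jag vill boka tid"))
# Relative WER reduction implied by the two scores in the table.
print(round(relative_reduction(0.888, 0.170), 3))
```

Applied to the table, going from 0.888 to 0.170 corresponds to a relative WER reduction of roughly 81%.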

Key strengths:

Accurate for long utterances and colloquial phrasing
Handles domain-specific terminology (e.g., retail, service, logistics)
Transcribes speaker shifts and hesitations more clearly than the baseline

Intended Use

This model is ideal for:

Customer Support Transcription
Call Center Quality Assurance
Compliance Monitoring
Sentiment & Conversation Analytics
Conversational AI Pipelines (Swedish)

Limitations

Language: Primarily Swedish, minimal English support
Input: Optimized for telephony; less accurate for studio or noisy recordings
Format: Non-8kHz audio must be resampled before inference

Access and Licensing

Access is gated for commercial use. To request access:

Email: [email protected]
AWS Marketplace: https://aws.amazon.com/marketplace/pp/prod-umarl5doovllg

Manual access review is typically completed within 24–48 hours.

Commercial Access

This model is available on AWS Marketplace with scalable infrastructure.

Usage Example

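import librosa
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_name = "WMRNORDIC/swedish-telephonic-asr"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Load the call as 16 kHz mono to match the training preprocessing.
# Assuming the processor is Whisper's, it rejects any other sampling rate,
# so raw 8 kHz or two-channel audio cannot be passed in directly.
audio, sr = librosa.load("your_call_audio.wav", sr=16000, mono=True)

# Convert the waveform to log-mel input features and decode the transcript.
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
output_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)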

License

This model is proprietary and licensed by WMR Nordic.
Commercial deployment requires approval and licensing.
