Swedish Telephonic ASR

swedish-telephonic-asr is a production-grade speech-to-text model developed by WMR Nordic for transcribing Swedish-language telephonic conversations. The model is designed to handle low-bandwidth, multi-speaker speech as typically found in call centers, support lines, and business hotlines.

Model Overview

  • Architecture: Encoder-decoder (242M parameters)
  • Language: Swedish (sv)
  • Audio Domain: Telephonic conversations, customer service calls, low-bitrate recordings
  • Input Sampling Rate: 8 kHz (resampled to 16 kHz during training)
  • Speaker Support: Handles multi-speaker data with diarization cues (if preprocessed)
  • Token Handling: Conversational markers, filler sounds, and speech disfluencies were normalized and cleaned before training

Training Details

This model was trained across multiple epochs using proprietary Swedish-language telephonic data. The dataset includes:

  • Two-channel call recordings
  • Manually aligned transcripts
  • Cleaned segments with consistent casing, punctuation, and Swedish grammar
  • Transcripts flattened to exclude markup like <overlap>, <lang:...>, ((...)), [cough]
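The flattening in the last item can be illustrated with a minimal sketch. The exact rules applied to the training data are not published, so the patterns below are assumptions derived only from the four markup examples listed above, and `flatten_transcript` is a hypothetical helper name. In particular, this sketch drops the content inside `((...))` and `[...]` along with the delimiters, while keeping the words wrapped by angle-bracket tags.

```python
import re

# Assumed markup patterns, based only on the examples listed above.
# The real training pipeline may treat these differently.
_MARKUP_PATTERNS = [
    re.compile(r"</?[A-Za-z][^<>]*>"),  # tags such as <overlap>, </overlap>, <lang:en>
    re.compile(r"\(\([^()]*\)\)"),      # double-parenthesis spans: ((...))
    re.compile(r"\[[^\[\]]*\]"),        # bracketed non-speech events such as [cough]
]


def flatten_transcript(text: str) -> str:
    """Remove annotation markup and collapse the leftover whitespace."""
    for pattern in _MARKUP_PATTERNS:
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


raw = "<overlap>ja</overlap> det [cough] stämmer ((mm)) <lang:en>okay</lang:en>"
print(flatten_transcript(raw))  # ja det stämmer okay
```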

The audio was preprocessed to 16 kHz mono, normalized, and segmented using speaker timestamps.
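The preprocessing steps named above (mono mixdown, resampling to 16 kHz, normalization, timestamp-based segmentation) can be sketched as follows, assuming NumPy. The actual pipeline is not published, so several choices here are assumptions: channels are averaged rather than split per speaker, resampling uses linear interpolation (a crude stand-in for a proper polyphase resampler), normalization is peak-based, and speaker timestamps are taken to be `(start_s, end_s)` pairs in seconds. All function names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # training rate stated in this model card


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; expects shape (frames,) or (frames, channels)."""
    return audio if audio.ndim == 1 else audio.mean(axis=1)


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample by linear interpolation (assumption: for illustration only)."""
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, audio)


def peak_normalize(audio: np.ndarray, peak: float = 0.95) -> np.ndarray:
    """Scale so the loudest sample has magnitude `peak` (assumed scheme)."""
    max_abs = np.max(np.abs(audio))
    return audio if max_abs == 0 else audio * (peak / max_abs)


def segment(audio: np.ndarray, timestamps, sr: int = TARGET_SR):
    """Cut out one array per (start_s, end_s) speaker turn."""
    return [audio[int(start * sr):int(end * sr)] for start, end in timestamps]


# Synthetic 2-second, two-channel 8 kHz signal standing in for a call recording
t = np.arange(2 * 8000) / 8000
call = np.stack([np.sin(2 * np.pi * 440 * t), 0.5 * np.sin(2 * np.pi * 220 * t)], axis=1)

clean = peak_normalize(resample_linear(to_mono(call), orig_sr=8000))
turns = segment(clean, [(0.0, 0.5), (0.5, 2.0)])
print(len(clean), [len(turn) for turn in turns])  # 32000 [8000, 24000]
```

In practice a dedicated resampler (for example from a DSP or audio library) would be preferable to linear interpolation, which does not apply any anti-aliasing or reconstruction filtering.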

Evaluation

On a held-out Swedish telephonic test set of 207 manually transcribed speech segments:

Model | WER
Fine-tuned (this model) | 0.170
Base model (Whisper) | 0.888
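For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference words, so 0.170 corresponds to roughly 17 errors per 100 reference words. The sketch below scores a single pair. The text normalization used for this evaluation, and whether scores were pooled or averaged across the 207 segments, are not stated here, so `word_error_rate` is only an illustrative helper and not the evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word in a five-word reference
print(word_error_rate("jag vill ändra min beställning",
                      "jag vill ändra din beställning"))  # 0.2
```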

Key strengths:

  • Accurate for long utterances and colloquial phrasing
  • Handles domain-specific terminology (e.g., retail, service, logistics)
  • Transcribes speaker shifts and hesitations more clearly than baseline

Intended Use

This model is ideal for:

  • Customer Support Transcription
  • Call Center Quality Assurance
  • Compliance Monitoring
  • Sentiment & Conversation Analytics
  • Conversational AI Pipelines (Swedish)

Limitations

  • Language: Primarily Swedish, minimal English support
  • Input: Optimized for telephony; less accurate for studio or noisy recordings
  • Format: Non-8kHz audio must be resampled before inference

Access and Licensing

Access is gated for commercial use. To request access:

Manual access review is typically completed within 24–48 hours.

Commercial Access

This model is available on AWS Marketplace with scalable infrastructure:

AWS Marketplace

Usage Example

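import librosa
import soundfile as sf
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_name = "WMRNORDIC/swedish-telephonic-asr"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Load the call; stereo files are returned as (frames, channels), so mix down to mono
audio, sr = sf.read("your_call_audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Whisper-based processors expect 16 kHz input (the rate used in training),
# so 8 kHz telephony audio is upsampled before feature extraction
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)
    sr = 16000

inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
output_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)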

License

This model is proprietary and licensed by WMR Nordic.
Commercial deployment requires approval and licensing.
