SPTK-2

SPTK-2 is an open multilingual automatic speech recognition (ASR) model developed by SVECTOR.
After revision, it supports 96 languages and offers better accuracy, more precise timestamps, and higher energy efficiency than previous models.

📄 Read the paper: SPTK: A Framework for Universal Multilingual ASR (2025)


🧪 Example Usage

import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model.eval()

# Load audio; torchaudio returns a (channels, samples) tensor and the file's sample rate
audio, sr = torchaudio.load("your_audio_file.mp3")
# Use the first channel and hand it to the processor as a NumPy array
inputs = processor(audio[0].numpy(), sampling_rate=sr, return_tensors="pt")

# Generate transcription; unpacking passes whichever input key the processor produced
with torch.no_grad():
    predicted_ids = model.generate(**inputs)

# Decode output
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
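The example above passes the file's native sampling rate straight to the processor. Many speech processors expect mono audio at a fixed rate, so a preprocessing step may be needed. The sketch below shows one way to downmix and resample in plain Python. The 16 kHz target is an assumption (it is not confirmed for SPTK-2; check the processor configuration), and `to_mono` and `resample_linear` are illustrative helper names, not part of any library. Linear interpolation is used only to keep the sketch dependency-free; it is not band-limited, so in practice a proper resampler such as `torchaudio.functional.resample` is the better choice.

```python
from typing import List, Sequence

TARGET_SR = 16_000  # assumption: not confirmed for SPTK-2; check the processor config


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length channels sample by sample."""
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


def resample_linear(
    samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR
) -> List[float]:
    """Resample by linear interpolation (simple, not band-limited)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

With these helpers, a stereo 48 kHz recording would be reduced with `to_mono(...)` and then passed through `resample_linear(mono, 48_000)` before being handed to the processor with `sampling_rate=TARGET_SR`.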

📦 Model Details

  • Model type: Encoder-decoder
  • Architecture: E-Branchformer + Sparse MoE decoder
  • Languages: 99+
  • Tasks: transcription, translation, timestamps
  • Released: April 2025
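Since the model lists timestamp support, a common next step is turning timed segments into subtitles. The sketch below formats segments as SRT text. The `(start_seconds, end_seconds, text)` tuple layout is a hypothetical stand-in, because the card does not document how SPTK-2 returns timestamps; adapt the unpacking to whatever structure the model actually produces.

```python
from typing import Iterable, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

Calling `to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")])` returns a string that can be written directly to a `.srt` file.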

📜 License

This model is licensed under the SVECTOR Proprietary License.
For research or commercial use, please contact [email protected].


🔗 Related

Model size: 809M params (Safetensors, tensor type F16)
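The 809M parameter count and F16 tensor type listed for this model give a rough lower bound on the memory needed just to hold the weights. The sketch below does that arithmetic; it treats 809M as exactly 809,000,000 (the card gives a rounded figure) and ignores activations, decoding caches, and framework overhead, so real usage will be higher. `weight_memory_bytes` is an illustrative helper, not a library function.

```python
def weight_memory_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to store the weights alone (F16 uses 2 bytes per parameter)."""
    return n_params * bytes_per_param


N_PARAMS = 809_000_000  # "809M params" from the model card, treated as exact here

f16_bytes = weight_memory_bytes(N_PARAMS, bytes_per_param=2)
f32_bytes = weight_memory_bytes(N_PARAMS, bytes_per_param=4)

print(f"F16 weights: {f16_bytes / 1e9:.2f} GB ({f16_bytes / 2**30:.2f} GiB)")
print(f"F32 weights: {f32_bytes / 1e9:.2f} GB ({f32_bytes / 2**30:.2f} GiB)")
```

Under these assumptions the F16 weights come to roughly 1.6 GB, and about twice that if the model is upcast to F32.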