SPTK-2
SPTK-2 is an open multilingual automatic speech recognition (ASR) model developed by SVECTOR.
After revision, it supports 96 languages and offers improved accuracy, timestamp precision, and energy efficiency compared to previous models.
Read the paper: SPTK: A Framework for Universal Multilingual ASR (2025)
Example Usage
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("SVECTOR-CORPORATION/SPTK-2")
# Load and preprocess audio
audio, sr = torchaudio.load("your_audio_file.mp3")
inputs = processor(audio[0], sampling_rate=sr, return_tensors="pt")
# Generate transcription
with torch.no_grad():
predicted_ids = model.generate(inputs.input_values)
# Decode output
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
Model Details
- Model type: Encoder-decoder
- Architecture: E-Branchformer + Sparse MoE decoder
- Languages: 99+
- Tasks: transcription, translation, timestamps
- Released: April 2025
License
This model is licensed under the SVECTOR Proprietary License.
For research or commercial use, please contact [email protected].