Fine-tuned model: FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT
This model is a fine-tuned version of JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k.
Description:
This model was fine-tuned by researchers from the Informatics Engineering program (Teknik Informatika), Institut Teknologi Sumatera (ITERA). Fine-tuning used the scripts available in the project's GitHub repository. The model was trained on a custom dataset of Indonesian-language vocal audio mixed with a variety of noise types.
Fine-tuning config:
# Configuration used during fine-tuning
data:
  root: "data/processed/"
  sample_rate: 8000
  segment_seconds: 4
  num_workers: 4
training:
  project_name: "itera-speech-separation-ft"
  model_name: "ConvTasNet-ITERA-FT" # Name used during training
  epochs: 50
  batch_size: 8
  learning_rate: 0.0005
  gradient_clip_val: 0.5
  precision: "16-mixed"
  early_stopping_patience: 5
model:
  freeze_encoder_decoder: false
remix:
  dynamic: true
  snr_low: 0.0
  snr_high: 10.0
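The `remix` block suggests that training mixtures are regenerated on the fly, with the noise level drawn between `snr_low` and `snr_high`. The project's actual data pipeline is not shown in this card, so the following is only a minimal sketch under stated assumptions: the SNR is sampled uniformly in dB, it is measured against the sum of the clean speech sources, and each example is a random crop of `segment_seconds * sample_rate` samples. The function names `mix_at_snr` and `dynamic_remix` are hypothetical.

```python
import numpy as np

# Values copied from the fine-tuning config above.
SAMPLE_RATE = 8000
SEGMENT_SECONDS = 4
SNR_LOW, SNR_HIGH = 0.0, 10.0

# Samples per training segment: 8000 * 4 = 32000.
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power down (or up) to speech_power / 10^(snr_db/10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def dynamic_remix(sources: np.ndarray, noise: np.ndarray, rng: np.random.Generator):
    """Hypothetical dynamic remix: crop random segments and draw a fresh SNR.

    sources: clean speech, shape (n_src, T) with T >= SEGMENT_SAMPLES (assumption).
    noise:   noise signal, shape (T_noise,) with T_noise >= SEGMENT_SAMPLES.
    """
    start = rng.integers(0, sources.shape[1] - SEGMENT_SAMPLES + 1)
    segs = sources[:, start:start + SEGMENT_SAMPLES]
    n_start = rng.integers(0, noise.shape[0] - SEGMENT_SAMPLES + 1)
    noise_seg = noise[n_start:n_start + SEGMENT_SAMPLES]
    # Assumption: SNR drawn uniformly (in dB) from [snr_low, snr_high].
    snr_db = rng.uniform(SNR_LOW, SNR_HIGH)
    mixture = mix_at_snr(segs.sum(axis=0), noise_seg, snr_db)
    # The model would see `mixture` as input and `segs` as separation targets.
    return mixture, segs, snr_db
```

Drawing a new crop and SNR every time a sample is requested means the model rarely sees the exact same mixture twice, which is the usual motivation for dynamic mixing over a fixed, pre-mixed dataset.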
Results
Evaluation on our internal test set gives the following results:
si_sdr:
  baseline_score: -30.2842
  fine_tuned_score: -24.9016
  improvement: +5.3826
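SI-SDR (scale-invariant signal-to-distortion ratio) is expressed in dB, and `improvement` is simply `fine_tuned_score - baseline_score`. The evaluation script itself is not included in this card, so the sketch below assumes the common zero-mean SI-SDR formulation for a single 1-D estimate/reference pair; the project's own code may differ, for example by averaging over sources with permutation-invariant matching. The function name `si_sdr` is illustrative.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB for 1-D signals (zero-mean convention)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return 10 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    )


# Arithmetic check of the reported improvement (values from the results above).
baseline_score = -30.2842
fine_tuned_score = -24.9016
improvement = fine_tuned_score - baseline_score  # 5.3826 dB
```

Because the reference is rescaled by `alpha` before the ratio is taken, multiplying an estimate by a constant does not change its score, which is why SI-SDR is preferred over plain SDR for comparing separation models.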
License Notice
This work, "[NAMA_USERNAME_ANDA]/itera-informatics-convtasnet-ft", is a derivative of JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k. The original work is a derivative of:
- LibriSpeech ASR corpus by Vassil Panayotov, used under CC BY 4.0;
- The WSJ0 Hipster Ambient Mixtures dataset by Whisper.ai, used under CC BY-NC 4.0.
The original work is licensed under Attribution-ShareAlike 3.0 Unported by Joris Cosentino.
This derivative work is licensed under the MIT License by the project authors at Institut Teknologi Sumatera.