---
license: mit
language:
- id
- en
library_name: pytorch
tags:
- audio-source-separation
- speech-separation
- convtasnet
- asteroid
- itera
datasets:
- librimix
- custom-indonesian-noisy-speech
metrics:
- si-sdr
base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
pipeline_tag: audio-to-audio
---

## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)

This model is a fine-tuned version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).

### Description:
This model was fine-tuned by researchers from **Informatics Engineering (Teknik Informatika), Institut Teknologi Sumatera (ITERA)**. Fine-tuning was done with the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset of Indonesian-language speech mixed with a variety of noise types.

### Fine-tuning config:
```yaml
# Configuration used during fine-tuning
data:
  root: "data/processed/"
  sample_rate: 8000
  segment_seconds: 4
  num_workers: 4

training:
  project_name: "itera-speech-separation-ft"
  model_name: "ConvTasNet-ITERA-FT" # Name used during training
  epochs: 50
  batch_size: 8
  learning_rate: 0.0005
  gradient_clip_val: 0.5
  precision: "16-mixed"
  early_stopping_patience: 5

model:
  freeze_encoder_decoder: false

remix:
  dynamic: true
  snr_low: 0.0
  snr_high: 10.0
```
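The `remix` block enables dynamic remixing, i.e. building a fresh mixture for each training example instead of reusing fixed pre-mixed files. The sketch below shows one way this could work, assuming `snr_low`/`snr_high` are bounds in dB for a speech-to-noise ratio drawn uniformly per example, and that each example is a `segment_seconds × sample_rate` crop (4 × 8000 = 32,000 samples). The helpers `power`, `remix_at_snr`, and `dynamic_remix` are hypothetical and are not taken from the project's scripts, which may mix sources differently.

```python
import math
import random

SAMPLE_RATE = 8000              # data.sample_rate
SEGMENT_SECONDS = 4             # data.segment_seconds
SNR_LOW, SNR_HIGH = 0.0, 10.0   # remix.snr_low / remix.snr_high (assumed to be dB)

# Number of samples in one training crop: 8000 * 4 = 32000
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS


def power(x):
    """Mean power of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def remix_at_snr(speech, noise, snr_db):
    """Hypothetical helper: rescale `noise` so that the speech-to-noise
    power ratio equals `snr_db`, then add it to `speech`."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # Choose gain g so that 10 * log10(p_speech / (g**2 * p_noise)) == snr_db
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def dynamic_remix(speech, noise, rng=random):
    """Draw a new SNR for every example (dynamic remixing) and build the mixture."""
    snr_db = rng.uniform(SNR_LOW, SNR_HIGH)
    mixture, scaled_noise = remix_at_snr(speech, noise, snr_db)
    return mixture, scaled_noise, snr_db


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "speech" (a 440 Hz tone) and Gaussian noise, one segment long
    speech = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SEGMENT_SAMPLES)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(SEGMENT_SAMPLES)]
    mixture, scaled_noise, snr = dynamic_remix(speech, noise, rng)
    achieved = 10 * math.log10(power(speech) / power(scaled_noise))
    print(f"requested SNR {snr:.2f} dB, achieved {achieved:.2f} dB")
```

Drawing a new SNR each time an example is loaded means the same utterance is heard at different noise levels across epochs, which typically acts as data augmentation.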

## Results

Evaluation on our internal test set gave the following results (SI-SDR in dB; higher is better):
```yaml
si_sdr:
    baseline_score: -30.2842
    fine_tuned_score: -24.9016
    improvement: +5.3826
```
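The `improvement` field is the difference between the two scores: -24.9016 - (-30.2842) = +5.3826 dB. For reference, the sketch below implements the standard SI-SDR definition in pure Python: both signals are made zero-mean, the reference s is optimally rescaled by α = ⟨ŝ, s⟩ / ‖s‖², and the score is 10·log10(‖αs‖² / ‖ŝ - αs‖²). It is an illustration only; the project's evaluation may rely on a library implementation (for example Asteroid's metrics) and, for multi-source outputs, on permutation-invariant matching of estimates to references. The function names `si_sdr` and `improvement` are hypothetical.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR (dB) between two equal-length signals (lists of floats)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Remove the DC offset from both signals
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


def improvement(baseline_db, fine_tuned_db):
    """SI-SDR improvement of the fine-tuned model over the baseline, in dB."""
    return fine_tuned_db - baseline_db


if __name__ == "__main__":
    print(round(improvement(-30.2842, -24.9016), 4))  # 5.3826
```

Because the reference is rescaled before comparison, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why SI-SDR is preferred over plain SNR for separation outputs whose overall level is arbitrary.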

### License Notice

This work, "[NAMA_USERNAME_ANDA]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k). The original work is a derivative of:
> * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
> * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
>
> The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.

This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.