Commit 6fba24a by holvan · 1 parent: 5b26876

update model

README.md ADDED
@@ -0,0 +1,297 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - speaker-recognition
+ language: multilingual
+ datasets:
+ - cnceleb
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 SPK model
+
+ ### `resnet34`
+
+ This model was trained by holvan using the cnceleb recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout d3db63621b5ea1d7b450d768f97c1a6bdf3cf8be
+ pip install -e .
+ cd egs2/cnceleb/spk1
+ ./run.sh --skip_data_prep false --skip_train true --download_model resnet34
+ ```
+
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2025-06-17 13:11:43.386422
+
+ - python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ - espnet version: 202402
+ - pytorch version: 2.3.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | -1.0307 | 0.1501 |
+ | Non-target | -1.3465 | 0.0693 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/train_resnet34 | 7.119 | 0.34991 |
+
+ ## SPK config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_resnet34.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_resnet34_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 8
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 33749
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 512
+ valid_batch_size: 40
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb_train_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb1_valid/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial_label
+     - spk_labels
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.1
+     momentum: 0.9
+     weight_decay: 0.0001
+ scheduler: exponentialdecaywarmup
+ scheduler_conf:
+     max_lr: 0.1
+     min_lr: 5.0e-05
+     total_steps: 140720
+     warmup_steps: 17590
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/cnceleb_train_sp/spk2utt
+ spk_num: 8379
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: melspec_torch
+ frontend_conf:
+     preemp: true
+     n_fft: 512
+     log: true
+     win_length: 400
+     hop_length: 160
+     n_mels: 80
+     normalize: mn
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ encoder: resnet
+ encoder_conf:
+     resnet_type: resnet34
+ pooling: stats
+ pooling_conf: {}
+ projector: rawnet3
+ projector_conf:
+     output_size: 256
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202402'
+ distributed: true
+ ```
+
+ </details>
+
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
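
Reviewer note on the frontend settings in the README's config: the sample-based values are easier to read once converted to time units. The sketch below only uses numbers copied from the config (`sample_rate`, `win_length`, `hop_length`, `n_mels`, `target_duration`); the constant names are illustrative. The frame count is an approximation, since the exact value depends on how the STFT pads the signal, which the config does not state.

```python
# Values copied from the config added in this commit.
SAMPLE_RATE = 16000    # sample_rate (Hz)
WIN_LENGTH = 400       # frontend_conf.win_length (samples)
HOP_LENGTH = 160       # frontend_conf.hop_length (samples)
N_MELS = 80            # frontend_conf.n_mels
TARGET_DURATION = 3.0  # target_duration (seconds)

# Convert sample counts to milliseconds.
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE

# Length of one training crop in samples.
num_samples = int(TARGET_DURATION * SAMPLE_RATE)

# Approximate number of frames per crop: one frame per hop.
# Assumption: edge padding can change this by a few frames.
approx_frames = num_samples // HOP_LENGTH

print(f"window: {window_ms} ms, hop: {hop_ms} ms")
print(f"crop: {num_samples} samples, about {approx_frames} frames x {N_MELS} mel bins")
```

Under these assumptions, each 3-second crop reaches the ResNet encoder as a log-mel feature map of roughly 300 frames by 80 bins.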
exp/spk_train_resnet34_raw_sp/30epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c19248fca2ac38fdd4b26d3662edec4337c9b4e7b472a5dbe69a0cf152c07451
+ size 52554348
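
Reviewer note: the three lines above are a Git LFS pointer, not the checkpoint itself. A minimal sketch of checking a downloaded `30epoch.pth` against that pointer follows. It assumes the pointer uses the usual one `key value` pair per line layout; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of ESPnet or the Git LFS tooling.

```python
import hashlib
from pathlib import Path

# Pointer text copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c19248fca2ac38fdd4b26d3662edec4337c9b4e7b472a5dbe69a0cf152c07451
size 52554348
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a large checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

After downloading the model, `matches_pointer("exp/spk_train_resnet34_raw_sp/30epoch.pth", pointer)` would report whether the local file is the one this commit refers to.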
exp/spk_train_resnet34_raw_sp/RESULTS.md ADDED
@@ -0,0 +1,17 @@
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2025-06-17 13:11:43.386422
+
+ - python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ - espnet version: 202402
+ - pytorch version: 2.3.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | -1.0307 | 0.1501 |
+ | Non-target | -1.3465 | 0.0693 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/train_resnet34 | 7.119 | 0.34991 |
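
Reviewer note: for readers unfamiliar with the EER column, the sketch below shows one common way to compute an equal error rate from target and non-target trial scores. It is not the ESPnet scoring code, and `compute_eer` is a hypothetical helper. The demo scores are synthetic, drawn from Gaussians with the means and standard deviations in the table above purely for illustration; the real score distributions need not be Gaussian, so the printed value is not expected to reproduce the reported 7.119%.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold), assuming a higher score means 'same speaker'."""
    tgt = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.sort(np.concatenate([tgt, non]))
    # False rejection rate: fraction of target trials scoring below the threshold.
    frr = np.searchsorted(tgt, thresholds, side="left") / tgt.size
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(frr - far)))
    return (frr[idx] + far[idx]) / 2.0, thresholds[idx]


# Synthetic scores only; the Gaussian shape is an assumption for the demo.
rng = np.random.default_rng(0)
demo_target = rng.normal(-1.0307, 0.1501, size=5000)
demo_nontarget = rng.normal(-1.3465, 0.0693, size=5000)
demo_eer, demo_threshold = compute_eer(demo_target, demo_nontarget)
print(f"synthetic EER: {100 * demo_eer:.2f}% at threshold {demo_threshold:.4f}")
```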
exp/spk_train_resnet34_raw_sp/config.yaml ADDED
@@ -0,0 +1,206 @@
+ config: conf/train_resnet34.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_resnet34_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 8
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 33749
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 512
+ valid_batch_size: 40
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb_train_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb1_valid/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial_label
+     - spk_labels
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.1
+     momentum: 0.9
+     weight_decay: 0.0001
+ scheduler: exponentialdecaywarmup
+ scheduler_conf:
+     max_lr: 0.1
+     min_lr: 5.0e-05
+     total_steps: 140720
+     warmup_steps: 17590
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/cnceleb_train_sp/spk2utt
+ spk_num: 8379
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: melspec_torch
+ frontend_conf:
+     preemp: true
+     n_fft: 512
+     log: true
+     win_length: 400
+     hop_length: 160
+     n_mels: 80
+     normalize: mn
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ encoder: resnet
+ encoder_conf:
+     resnet_type: resnet34
+ pooling: stats
+ pooling_conf: {}
+ projector: rawnet3
+ projector_conf:
+     output_size: 256
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202402'
+ distributed: true
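
Reviewer note on `scheduler_conf`: the sketch below shows what a warm-up followed by exponential decay looks like with the four values from this config. It is an illustration, not ESPnet's `exponentialdecaywarmup` implementation, which may differ in detail (for example in the learning rate used at step 0). It assumes a linear warm-up from zero to `max_lr` and a decay that reaches `min_lr` exactly at `total_steps`. The per-epoch arithmetic further assumes that `total_steps` spans all `max_epoch` epochs.

```python
# Values copied from scheduler_conf and max_epoch in this commit.
MAX_LR = 0.1
MIN_LR = 5.0e-05
TOTAL_STEPS = 140720
WARMUP_STEPS = 17590
MAX_EPOCH = 40


def lr_at(step):
    """Learning rate at a given optimizer step under the assumptions stated above."""
    if step < WARMUP_STEPS:
        # Assumed linear warm-up from 0 to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, capped at 1 after TOTAL_STEPS.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    # Exponential interpolation from MAX_LR down to MIN_LR.
    return MAX_LR * (MIN_LR / MAX_LR) ** progress


# Assumption: TOTAL_STEPS covers exactly MAX_EPOCH epochs.
steps_per_epoch = TOTAL_STEPS / MAX_EPOCH
warmup_epochs = WARMUP_STEPS / steps_per_epoch

print(f"steps per epoch: {steps_per_epoch}, warm-up epochs: {warmup_epochs}")
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

Read this way, the numbers are consistent with 3518 steps per epoch and a warm-up covering the first 5 of 40 epochs.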
exp/spk_train_resnet34_raw_sp/images/backward_time.png ADDED
exp/spk_train_resnet34_raw_sp/images/clip.png ADDED
exp/spk_train_resnet34_raw_sp/images/eer.png ADDED
exp/spk_train_resnet34_raw_sp/images/forward_time.png ADDED
exp/spk_train_resnet34_raw_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/spk_train_resnet34_raw_sp/images/grad_norm.png ADDED
exp/spk_train_resnet34_raw_sp/images/iter_time.png ADDED
exp/spk_train_resnet34_raw_sp/images/loss.png ADDED
exp/spk_train_resnet34_raw_sp/images/loss_scale.png ADDED
exp/spk_train_resnet34_raw_sp/images/mindcf.png ADDED
exp/spk_train_resnet34_raw_sp/images/n_trials.png ADDED
exp/spk_train_resnet34_raw_sp/images/nontrg_mean.png ADDED
exp/spk_train_resnet34_raw_sp/images/nontrg_std.png ADDED
exp/spk_train_resnet34_raw_sp/images/optim0_lr0.png ADDED
exp/spk_train_resnet34_raw_sp/images/optim_step_time.png ADDED
exp/spk_train_resnet34_raw_sp/images/train_time.png ADDED
exp/spk_train_resnet34_raw_sp/images/trg_mean.png ADDED
exp/spk_train_resnet34_raw_sp/images/trg_std.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202402'
+ files:
+   model_file: exp/spk_train_resnet34_raw_sp/30epoch.pth
+ python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ timestamp: 1750139904.197759
+ torch: 2.3.1
+ yaml_files:
+   train_config: exp/spk_train_resnet34_raw_sp/config.yaml
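
Reviewer note: the `timestamp` field in `meta.yaml` looks like seconds since the Unix epoch. Assuming that reading is right, it can be turned into a readable UTC time with the standard library alone; the variable names below are illustrative.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml in this commit.
META_TIMESTAMP = 1750139904.197759

# Assumption: the value is seconds since 1970-01-01 UTC.
packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

The result falls on 2025-06-17, the same day as the `date` recorded in `RESULTS.md`, although that file does not say which time zone its own date uses.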