holvan committed on
Commit 4d3c19a · 1 Parent(s): ab12bb9

update model

README.md ADDED
@@ -0,0 +1,298 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - speaker-recognition
+ language: multilingual
+ datasets:
+ - cnceleb
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 SPK model
+
+ ### `resnet221`
+
+ This model was trained by holvan using cnceleb recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout d3db63621b5ea1d7b450d768f97c1a6bdf3cf8be
+ pip install -e .
+ cd egs2/cnceleb/spk1
+ ./run.sh --skip_data_prep false --skip_train true --download_model resnet221
+ ```
+
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2025-06-17 13:11:19.656417
+
+ - python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ - espnet version: 202402
+ - pytorch version: 2.3.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | -1.0174 | 0.1565 |
+ | Non-target | -1.3613 | 0.0660 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/train_resnet221 | 6.003 | 0.29403 |
+
+ ## SPK config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_resnet221.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_resnet221_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 8
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 54437
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: 40
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb_train_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb1_valid/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial_label
+     - spk_labels
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.01
+     momentum: 0.9
+     weight_decay: 0.0001
+ scheduler: exponentialdecaywarmup
+ scheduler_conf:
+     max_lr: 0.1
+     min_lr: 5.0e-05
+     total_steps: 590800
+     warmup_steps: 73850
+     warm_from_zero: true
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/cnceleb_train_sp/spk2utt
+ spk_num: 8379
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: melspec_torch
+ frontend_conf:
+     preemp: true
+     n_fft: 512
+     log: true
+     win_length: 400
+     hop_length: 160
+     n_mels: 80
+     normalize: mn
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ encoder: resnet
+ encoder_conf:
+     resnet_type: resnet221
+ pooling: stats
+ pooling_conf: {}
+ projector: rawnet3
+ projector_conf:
+     output_size: 128
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202402'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
exp/.DS_Store ADDED
Binary file (6.15 kB)
exp/spk_train_resnet221_raw_sp/35epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:645259b7dd7a89288e82ac7c7f9cdf8d5cf1920fcd36081ee4e2c52831dc8c87
+ size 98827571
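
The three added lines are a Git LFS pointer rather than the checkpoint itself: `oid` carries the SHA-256 digest of the real file and `size` its length in bytes. A minimal sketch, assuming the checkpoint has already been downloaded to a local path, of how a download could be checked against this pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any library.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:645259b7dd7a89288e82ac7c7f9cdf8d5cf1920fcd36081ee4e2c52831dc8c87
size 98827571
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk: int = 1 << 20) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(Path("exp/spk_train_resnet221_raw_sp/35epoch.pth"), pointer)` on a local copy would then report whether the download is complete and uncorrupted.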
exp/spk_train_resnet221_raw_sp/RESULTS.md ADDED
@@ -0,0 +1,17 @@
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2025-06-17 13:11:19.656417
+
+ - python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ - espnet version: 202402
+ - pytorch version: 2.3.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | -1.0174 | 0.1565 |
+ | Non-target | -1.3613 | 0.0660 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/train_resnet221 | 6.003 | 0.29403 |
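
The EER column reports the operating point at which the false-acceptance rate on non-target trials equals the false-rejection rate on target trials. A minimal sketch of that definition follows. The scores are made-up toy values, not this model's trial scores, and `equal_error_rate` is an illustrative helper rather than ESPnet's implementation.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy scores only: higher means "more likely the same speaker".
targets = [-0.80, -0.95, -1.00, -1.10, -1.35]
nontargets = [-1.25, -1.30, -1.36, -1.40, -1.45]

eer, thr = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.1%} at threshold {thr}")
```

With real trial lists the sweep is usually done on sorted scores or an interpolated ROC curve; the brute-force loop here is kept only because it mirrors the definition directly.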
exp/spk_train_resnet221_raw_sp/config.yaml ADDED
@@ -0,0 +1,207 @@
+ config: conf/train_resnet221.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_resnet221_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 8
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 54437
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: 40
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb_train_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cnceleb1_valid/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/cnceleb1_valid/trial_label
+     - spk_labels
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.01
+     momentum: 0.9
+     weight_decay: 0.0001
+ scheduler: exponentialdecaywarmup
+ scheduler_conf:
+     max_lr: 0.1
+     min_lr: 5.0e-05
+     total_steps: 590800
+     warmup_steps: 73850
+     warm_from_zero: true
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/cnceleb_train_sp/spk2utt
+ spk_num: 8379
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: melspec_torch
+ frontend_conf:
+     preemp: true
+     n_fft: 512
+     log: true
+     win_length: 400
+     hop_length: 160
+     n_mels: 80
+     normalize: mn
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ encoder: resnet
+ encoder_conf:
+     resnet_type: resnet221
+ pooling: stats
+ pooling_conf: {}
+ projector: rawnet3
+ projector_conf:
+     output_size: 128
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202402'
+ distributed: true
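
The `scheduler_conf` values describe a learning rate that ramps up to `max_lr` over `warmup_steps` and then decays towards `min_lr` by `total_steps`. The sketch below is one plausible reading of those parameters: linear warmup from zero (`warm_from_zero: true`), then an exponential decay that reaches `min_lr` exactly at `total_steps`. That shape is an assumption made for illustration; it is not ESPnet's `exponentialdecaywarmup` code, whose exact formula may differ.

```python
import math

# Values copied from scheduler_conf in config.yaml.
MAX_LR = 0.1
MIN_LR = 5.0e-05
TOTAL_STEPS = 590800
WARMUP_STEPS = 73850


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup from 0, then exponential decay."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    # Interpolate between MAX_LR and MIN_LR in log space.
    return MAX_LR * math.exp(progress * math.log(MIN_LR / MAX_LR))


for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
    print(step, f"{lr_at(step):.6f}")
```

If `total_steps` is meant to span the whole run, then with `max_epoch: 40` it corresponds to 14,770 steps per epoch, and `warmup_steps: 73850` to a warmup of 5 epochs.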
exp/spk_train_resnet221_raw_sp/images/backward_time.png ADDED
exp/spk_train_resnet221_raw_sp/images/clip.png ADDED
exp/spk_train_resnet221_raw_sp/images/eer.png ADDED
exp/spk_train_resnet221_raw_sp/images/forward_time.png ADDED
exp/spk_train_resnet221_raw_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/spk_train_resnet221_raw_sp/images/grad_norm.png ADDED
exp/spk_train_resnet221_raw_sp/images/iter_time.png ADDED
exp/spk_train_resnet221_raw_sp/images/loss.png ADDED
exp/spk_train_resnet221_raw_sp/images/loss_scale.png ADDED
exp/spk_train_resnet221_raw_sp/images/mindcf.png ADDED
exp/spk_train_resnet221_raw_sp/images/n_trials.png ADDED
exp/spk_train_resnet221_raw_sp/images/nontrg_mean.png ADDED
exp/spk_train_resnet221_raw_sp/images/nontrg_std.png ADDED
exp/spk_train_resnet221_raw_sp/images/optim0_lr0.png ADDED
exp/spk_train_resnet221_raw_sp/images/optim_step_time.png ADDED
exp/spk_train_resnet221_raw_sp/images/train_time.png ADDED
exp/spk_train_resnet221_raw_sp/images/trg_mean.png ADDED
exp/spk_train_resnet221_raw_sp/images/trg_std.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202402'
+ files:
+   model_file: exp/spk_train_resnet221_raw_sp/35epoch.pth
+ python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
+ timestamp: 1750139929.052598
+ torch: 2.3.1
+ yaml_files:
+   train_config: exp/spk_train_resnet221_raw_sp/config.yaml
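
The `timestamp` field in `meta.yaml` appears to be a Unix epoch value in seconds. Assuming that interpretation, a short sketch for turning it into a readable UTC date:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
TIMESTAMP = 1750139929.052598

packed_at = datetime.fromtimestamp(TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat(timespec="seconds"))  # 2025-06-17T05:58:49+00:00
```

The resulting date falls on the same day as the `date:` line in RESULTS.md. The hours differ, which would be consistent with the results being generated in a local time zone or at a different moment than packaging.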