csukuangfj committed cd47bbf (parent: f1bd141)

Add README.

Files changed (1): `README.md` (added, +176 -0)
---
language: "en"
tags:
- icefall
- k2
- transducer
- librispeech
- ASR
- stateless transducer
- PyTorch
- RNN-T
- pruned RNN-T
- speech recognition
license: "apache-2.0"
datasets:
- librispeech
metrics:
- WER
---

# Introduction

This repo contains a pre-trained model trained with the code from
<https://github.com/k2-fsa/icefall/pull/248>.

It is trained on the full LibriSpeech dataset using the pruned RNN-T loss from [k2](https://github.com/k2-fsa/k2).

## How to clone this repo

```bash
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12

cd icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12
git lfs pull
```

**Caution**: You have to run `git lfs pull`. Otherwise, large files such as the model checkpoint in `exp/` remain small Git LFS pointer files, and loading them later will fail.
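
If you are not sure whether `git lfs pull` actually fetched everything, one way to check is to look for files that are still pointers. The bash sketch below assumes the Git LFS pointer format, in which a pointer file starts with the line `version https://git-lfs.github.com/spec/v1`; `is_lfs_pointer` and `check_lfs_files` are hypothetical helper names, not part of this repo or of icefall.

```bash
# Hypothetical helper: succeed if FILE still looks like a Git LFS pointer
# (assumes the pointer format whose first line is the spec URL below).
is_lfs_pointer() {
  local file="$1" header
  # Read only the first bytes; drop NUL bytes so binary files cause no warnings.
  header=$(head -c 64 "$file" | tr -d '\000')
  case "$header" in
    "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report files under DIR (ignoring .git) that are still
# LFS pointers; returns 1 if any are found, 0 otherwise.
check_lfs_files() {
  local dir="$1" status=0 f
  while IFS= read -r -d '' f; do
    if is_lfs_pointer "$f"; then
      echo "still an LFS pointer: $f" >&2
      status=1
    fi
  done < <(find "$dir" -type f -not -path '*/.git/*' -print0)
  return "$status"
}
```

For example, `check_lfs_files icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12` prints every file that is still a pointer and returns a non-zero status if it finds any; if it does, run `git lfs pull` again inside the repo.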

The model in this repo was trained with `icefall` at commit `1603744469d167d848e074f2ea98c587153205fa`.
You can use the following commands to download `icefall` and check out that commit:

```bash
git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout 1603744469d167d848e074f2ea98c587153205fa
```

The decoder architecture is modified from
[RNN-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419):
a Conv1d layer is placed right after the input embedding layer.

-----

## Description

This repo provides a pre-trained transducer model with a Conformer encoder for the LibriSpeech dataset,
trained using [icefall][icefall]. There are no RNNs in the decoder: the decoder is stateless
and contains only an embedding layer and a Conv1d layer.

The commands for training are:

```bash
# Run from the root of the icefall checkout
cd egs/librispeech/ASR/
./prepare.sh

# Train on 8 GPUs
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

. path.sh

./pruned_transducer_stateless/train.py \
  --world-size 8 \
  --num-epochs 60 \
  --start-epoch 0 \
  --exp-dir pruned_transducer_stateless/exp \
  --full-libri 1 \
  --max-duration 300 \
  --prune-range 5 \
  --lr-factor 5 \
  --lm-scale 0.25
```

The TensorBoard training log can be found at
<https://tensorboard.dev/experiment/WKRFY5fYSzaVBHahenpNlA/>

The commands for decoding are:

```bash
epoch=42
avg=11
sym=1

# greedy search
./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method greedy_search \
  --beam-size 4 \
  --max-sym-per-frame $sym

# modified beam search
# (the WERs in the table below for this method use --epoch 39 --avg 15)
./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method modified_beam_search \
  --beam-size 4

# beam search (not recommended)
# (the WERs in the table below for this method use --epoch 39 --avg 15)
./pruned_transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir ./pruned_transducer_stateless/exp \
  --max-duration 100 \
  --decoding-method beam_search \
  --beam-size 4
```

You can find the decoding logs for the above commands in the `log` folder of this repo.

The WERs (%) on the LibriSpeech test sets are:

| decoding method                     | test-clean | test-other | comment                                  |
|-------------------------------------|------------|------------|------------------------------------------|
| greedy search (max sym per frame 1) | 2.62       | 6.37       | --epoch 42, --avg 11, --max-duration 100 |
| greedy search (max sym per frame 2) | 2.62       | 6.37       | --epoch 42, --avg 11, --max-duration 100 |
| greedy search (max sym per frame 3) | 2.62       | 6.37       | --epoch 42, --avg 11, --max-duration 100 |
| modified beam search (beam size 4)  | 2.56       | 6.27       | --epoch 39, --avg 15, --max-duration 100 |
| beam search (beam size 4)           | 2.57       | 6.27       | --epoch 39, --avg 15, --max-duration 100 |
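
The greedy-search rows in the table differ only in `--max-sym-per-frame` (1, 2 and 3). As a sketch, the bash functions below rerun the greedy-search command from the decoding section once per value; `run`, `DRY_RUN` and `decode_greedy_sweep` are hypothetical names introduced here, and the sweep assumes it is started from `egs/librispeech/ASR` in your icefall checkout.

```bash
# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: greedy search once per --max-sym-per-frame value,
# with the other flags copied from the greedy-search command above.
# Stops at the first command that fails.
decode_greedy_sweep() {
  local epoch="$1" avg="$2" sym
  for sym in 1 2 3; do
    run ./pruned_transducer_stateless/decode.py \
      --epoch "$epoch" \
      --avg "$avg" \
      --exp-dir ./pruned_transducer_stateless/exp \
      --max-duration 100 \
      --decoding-method greedy_search \
      --beam-size 4 \
      --max-sym-per-frame "$sym" || return 1
  done
}
```

Running `DRY_RUN=1 decode_greedy_sweep 42 11` prints the three commands without executing them; `decode_greedy_sweep 42 11` then decodes. Wrapping the call in `run` keeps the preview and the real run on the same argument list, so they cannot drift apart.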


# File description

- [log][log]: the decoding logs and decoding results
- [test_wavs][test_wavs]: wave files for testing the pre-trained model
- [data][data]: files generated by [prepare.sh][prepare]
- [exp][exp]: contains only one file, `pretrained.pt`

`exp/pretrained.pt` is generated by the following command:

```bash
epoch=42
avg=11

./pruned_transducer_stateless/export.py \
  --exp-dir ./pruned_transducer_stateless/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --epoch $epoch \
  --avg $avg
```

**HINT**: To use `pretrained.pt` to compute the WERs for test-clean and test-other,
copy it into the icefall experiment directory under the name `epoch-999.pt`:

```bash
cp icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/pruned_transducer_stateless/exp/epoch-999.pt
```

Then pass `--epoch 999 --avg 1` to `pruned_transducer_stateless/decode.py`.
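
Our reading of `pruned_transducer_stateless/decode.py` at the commit above (an assumption, so please check the script) is that `--epoch E --avg A` averages the checkpoints `epoch-(E-A+1).pt` through `epoch-E.pt` in `--exp-dir`, and that `--avg 1` loads `epoch-E.pt` alone. That is why saving `pretrained.pt` as `epoch-999.pt` and passing `--epoch 999 --avg 1` makes `decode.py` use it directly. The bash helpers below, `list_averaged_checkpoints` and `check_checkpoints` (hypothetical names), list the files this convention expects and report any that are missing before you start decoding. Rejecting a range that starts below epoch 0 is a choice made in this sketch, not necessarily what `decode.py` does.

```bash
# Hypothetical helper: list the checkpoints assumed to be averaged for
# --epoch EPOCH --avg AVG, i.e. epoch-(EPOCH-AVG+1).pt ... epoch-EPOCH.pt.
list_averaged_checkpoints() {
  local epoch="$1" avg="$2" i
  local start=$((epoch - avg + 1))
  if [ "$start" -lt 0 ]; then
    echo "error: --epoch $epoch --avg $avg starts before epoch 0" >&2
    return 1
  fi
  for ((i = start; i <= epoch; i++)); do
    echo "epoch-$i.pt"
  done
}

# Hypothetical helper: check that every expected checkpoint exists in EXP_DIR;
# prints the missing ones and returns 1 if any are absent.
check_checkpoints() {
  local exp_dir="$1" epoch="$2" avg="$3" list f missing=0
  list=$(list_averaged_checkpoints "$epoch" "$avg") || return 1
  while IFS= read -r f; do
    if [ ! -f "$exp_dir/$f" ]; then
      echo "missing: $exp_dir/$f" >&2
      missing=1
    fi
  done <<< "$list"
  return "$missing"
}
```

For example, `list_averaged_checkpoints 42 11` prints `epoch-32.pt` through `epoch-42.pt`, and `check_checkpoints ./pruned_transducer_stateless/exp 999 1` succeeds once `epoch-999.pt` has been copied into place.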


[icefall]: https://github.com/k2-fsa/icefall
[prepare]: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
[exp]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/tree/main/exp
[data]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/tree/main/data
[test_wavs]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/tree/main/test_wavs
[log]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12/tree/main/log