Commit 383378f · Parent: e7c2018
Allow single quotes "'" and hyphens "-"
Remove the single quote `'` (id 6) and the hyphen `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
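To illustrate why the listing matters, here is a minimal sketch of what suppression does to greedy decoding. It assumes suppression is implemented by masking the logits of the listed ids to negative infinity; the helper names `suppress` and `greedy_pick`, the toy 16-id vocabulary, and the logit values are all hypothetical and not taken from the Whisper or `transformers` code.

```python
import math

def suppress(logits, suppress_tokens):
    """Return a copy of `logits` with every suppressed id masked to -inf."""
    masked = list(logits)
    for token_id in suppress_tokens:
        masked[token_id] = -math.inf
    return masked

def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy logits over 16 ids (assumption): id 6 scores highest, id 3 second.
logits = [0.0] * 16
logits[6] = 5.0
logits[3] = 2.0

# Shortened versions of the list before and after this commit.
old_list = [1, 2, 6, 7, 8, 9, 10, 12, 14]
new_list = [1, 2, 7, 8, 9, 10, 14]

picked_before = greedy_pick(suppress(logits, old_list))  # id 6 is masked out
picked_after = greedy_pick(suppress(logits, new_list))   # id 6 can be chosen
print(picked_before, picked_after)
```

With the old list the decoder in this sketch can never emit id 6, however likely it is; with the new list it can.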
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large")
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
**Print Output:**
```
'
-
```
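The edit itself can be sketched as a small filter over the list. This is an illustration only: `config_text` is a hypothetical excerpt containing just the ids visible in the diff hunk, not the full `suppress_tokens` list, and `allow_tokens` is a helper name introduced here.

```python
import json

# Hypothetical excerpt of config.json, limited to the ids shown in the diff hunk.
config_text = '{"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}'

# Ids to stop suppressing: 6 is the single quote, 12 is the hyphen.
TOKENS_TO_ALLOW = {6, 12}

def allow_tokens(config, token_ids):
    """Return a copy of `config` with `token_ids` removed from suppress_tokens."""
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config["suppress_tokens"] if t not in token_ids
    ]
    return updated

config = json.loads(config_text)
new_config = allow_tokens(config, TOKENS_TO_ALLOW)
print(json.dumps(new_config))
```

The filter preserves the order of the remaining ids, so the resulting change is exactly two deleted lines, matching the `+0 -2` summary of the commit.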
config.json CHANGED (+0 -2)

@@ -46,12 +46,10 @@
   "suppress_tokens": [
     1,
     2,
-    6,
     7,
     8,
     9,
     10,
-    12,
     14,
     25,
     26,