Update README.md
README.md
CHANGED
@@ -58,7 +58,7 @@ This model was trained on the following datasets:
 
 This model was trained in two main phases:
 - Knesset based pre-training - over all ~4700h of data - 3 epochs, ~54h run
-- Mixed post-training over all crowd-transcribe-v5 (
+- Mixed post-training over all crowd-transcribe-v5 (300h), crowd-recital-whisper-training (50h) and highest-quality filtered knessets data (150h) - 1 epoch
 - Interleaving of datasets with sampling probs: (0.9, 0.025, 0.075) respectively
 - Note that crowd-transcribe-v5 has about 5x shorter samples on average thus the over-sampling.
 
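
The over-sampling note in the changed README lines can be checked with a little arithmetic. The sketch below is not the training code. It assumes that "5x shorter" means the other two datasets have samples about five times longer than crowd-transcribe-v5, and that their mean lengths are equal to each other. The label `knesset-filtered` and both function names are hypothetical, chosen only for illustration.

```python
import random

# Per-sample sampling probabilities quoted in the README, in listed order.
SAMPLING_PROBS = {
    "crowd-transcribe-v5": 0.9,
    "crowd-recital-whisper-training": 0.025,
    "knesset-filtered": 0.075,  # hypothetical label for the filtered Knesset data
}

# ASSUMPTION: relative mean sample length, with crowd-transcribe-v5 as the unit.
# The README only says its samples are "about 5x shorter on average".
RELATIVE_MEAN_LENGTH = {
    "crowd-transcribe-v5": 1.0,
    "crowd-recital-whisper-training": 5.0,
    "knesset-filtered": 5.0,
}


def expected_audio_share(probs, rel_len):
    """Return each dataset's expected share of sampled audio duration."""
    weighted = {name: probs[name] * rel_len[name] for name in probs}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


def draw_sources(probs, n, seed=0):
    """Draw n dataset names, one per sample, according to probs."""
    rng = random.Random(seed)
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    shares = expected_audio_share(SAMPLING_PROBS, RELATIVE_MEAN_LENGTH)
    for name, share in shares.items():
        print(f"{name}: {share:.3f}")
    print(draw_sources(SAMPLING_PROBS, 10))
```

Under these assumptions the expected audio shares come out at roughly 64%, 9% and 27%, close to the 300h / 50h / 150h split (60%, 10%, 30%) stated in the new line. That would be consistent with the README's explanation that the 0.9 probability compensates for shorter samples rather than over-weighting that dataset in hours.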