akhanriz committed on
Commit ed74ac8 · verified · 1 Parent(s): e112632

updated readme

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -83,7 +83,7 @@ Audio should be readable by `soundfile`/`librosa`; sample rate is normalized to
 
 ### 1) Fine-tune Whisper-small
 ```bash
-python scripts/finetune_whisper_small.py --csv /absolute/path/to/data_for_whisper.csv --out_dir models/whisper-small-finetuned --batch 4 --num_workers 4
+python scripts/finetune_whisper_small.py --csv /absolute/path/to/sample_stt.csv --out_dir models/whisper-small-finetuned --batch 4 --num_workers 4
 ```
 
 - The trainer uses **lazy transforms** to avoid OOM with large datasets (100k+ files ok).
@@ -96,7 +96,7 @@ ct2-transformers-converter --model models/whisper-small-finetuned --output_d
 
 ### 2) Train the text classifier
 ```bash
-python scripts/train_text_classifier.py --csv /absolute/path/to/data_for_text.csv --out models/text_cls.joblib --word_ngrams 1,2 --char_ngrams 3,6
+python scripts/train_text_classifier.py --csv /absolute/path/to/sample_text_labels.csv --out models/text_cls.joblib --word_ngrams 1,2 --char_ngrams 3,6
 ```
 The script prints precision/recall/F1 on a held-out split and writes `models/text_cls.joblib`.
 
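
Note on the "lazy transforms" line in the first hunk: the README does not show how the trainer implements this, so the following is only a minimal sketch of the general idea, not the code in `scripts/finetune_whisper_small.py`. The class `LazyAudioDataset`, the `path`/`text` column names, and `fake_feature_extractor` are all hypothetical. The point is that only the CSV rows stay in memory, and the expensive transform runs per item at access time. Hugging Face `datasets` offers the same pattern through `Dataset.with_transform`, though whether the script uses it is not stated here.

```python
from typing import Callable, Dict, List, Sequence


class LazyAudioDataset:
    """Hypothetical dataset that keeps only lightweight rows in memory.

    The transform runs inside __getitem__, so features exist only for
    the items that are currently being batched.
    """

    def __init__(
        self,
        rows: Sequence[Dict[str, str]],
        transform: Callable[[Dict[str, str]], Dict[str, object]],
    ) -> None:
        self.rows = list(rows)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, object]:
        # Nothing is precomputed: the transform is applied on access.
        return self.transform(self.rows[index])


calls: List[str] = []  # records which paths were actually processed


def fake_feature_extractor(row: Dict[str, str]) -> Dict[str, object]:
    # Stand-in (assumption) for decoding audio and computing features.
    calls.append(row["path"])
    return {"path": row["path"], "n_chars": len(row["text"])}


# Assumed CSV layout: one audio path and one transcript per row.
rows = [{"path": f"clip_{i}.wav", "text": "hello" * (i + 1)} for i in range(1000)]
dataset = LazyAudioDataset(rows, fake_feature_extractor)

batch = [dataset[i] for i in range(4)]  # only these four rows are transformed
print(len(dataset), len(calls))
```

With an eager approach, all 1000 rows would be transformed before training starts; here the transform count tracks the number of items actually requested, which is what keeps memory flat as the CSV grows.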