---
library_name: transformers
license: apache-2.0
datasets:
- StofEzz/dataset_c_voice0.2
metrics:
- wer
base_model:
- openai/whisper-tiny
---

## Training Details

This model is openai/whisper-tiny fine-tuned for French speech transcription on the StofEzz/dataset_c_voice0.2 dataset. The Hydra-style configuration used for training is reproduced below; `train`/`test` appear to be the dataset split percentages.

#### Training Hyperparameters

```yaml
defaults:
  - _self_

dataset:
  name: "StofEzz/dataset_c_voice0.2"
  audio_sampling_rate: 16000
  num_proc_preprocessing: 4
  num_proc_dataset_map: 2
  train: 80
  test: 20

model:
  name: "openai/whisper-tiny"
  language: "french"
  task: "transcribe"

text_preprocessing:
  chars_to_ignore_regex: "[\\,\\?\\.\\!\\-\\;\\:\\ğ\\ź\\…\\ø\\ắ\\î\\´\\ŏ\\ę\\ź\\&\\'\\v\\ï\\ū\\ė\\ō\\ń\\ø\\…\\σ\\$\\ă\\ß\\ž\\ṯ\\ý\\ℵ\\đ\\ł\\ś\\ň\\ạ\\=\\_\\»\\ċ\\の\\\"\\ぬ\\ễ\\ż\\ć\\ů\\ʿ\\ș\\ı\\ñ\\(\\ò\\ř\\ä\\–\\ş\\«\\š\\ጠ\\°\\ℤ\\~\\\"\\ī\\ț\\č\\ả\\—\\)\\ā\\/\\½\"]"

training_args:
  _target_: transformers.Seq2SeqTrainingArguments
  output_dir: ./models
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 1
  learning_rate: 1e-5
  warmup_steps: 500
  max_steps: 6250
  gradient_checkpointing: true
  fp16: true
  evaluation_strategy: "steps"
  per_device_eval_batch_size: 8
  predict_with_generate: true
  generation_max_length: 225
  save_steps: 2000
  eval_steps: 100
  logging_steps: 25
  load_best_model_at_end: true
  metric_for_best_model: "wer"
  greater_is_better: false
  push_to_hub: false
```

#### Metrics

Word error rate (WER): 0.46
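The `chars_to_ignore_regex` in the config suggests a character-filtering normalization step applied to transcripts before training. The card does not include the actual preprocessing code, so the sketch below is an assumption based on common Whisper fine-tuning recipes: the regex (unescaped from its YAML form) is applied with `re.sub`, followed by lowercasing and trimming. The `normalize_text` helper name is hypothetical.

```python
import re

# Character filter from the card's text_preprocessing config, unescaped from YAML.
# ASSUMPTION: the lowercasing/stripping steps below are not stated in the card;
# they mirror typical Whisper fine-tuning preprocessing.
CHARS_TO_IGNORE_REGEX = r'[\,\?\.\!\-\;\:\ğ\ź\…\ø\ắ\î\´\ŏ\ę\ź\&\'\v\ï\ū\ė\ō\ń\ø\…\σ\$\ă\ß\ž\ṯ\ý\ℵ\đ\ł\ś\ň\ạ\=\_\»\ċ\の\"\ぬ\ễ\ż\ć\ů\ʿ\ș\ı\ñ\(\ò\ř\ä\–\ş\«\š\ጠ\°\ℤ\~\"\ī\ț\č\ả\—\)\ā\/\½"]'

def normalize_text(text: str) -> str:
    """Drop ignored characters, then lowercase and trim (assumed pipeline)."""
    return re.sub(CHARS_TO_IGNORE_REGEX, "", text).lower().strip()

print(normalize_text("Bonjour, comment ça va ?"))  # bonjour comment ça va
```

Two details of the regex worth noting: it removes apostrophes (`\'`), so French elisions like "c'est" become "cest"; and inside a character class `\v` is the vertical-tab escape rather than a literal letter "v", which may be unintended in the original pattern.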
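The reported metric, word error rate, is the word-level edit distance between reference and hypothesis divided by the number of reference words, so 0.46 means roughly 46 errors per 100 reference words. Training presumably used a library implementation of the `wer` metric named in `metric_for_best_model`; the from-scratch sketch below is illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    Minimal illustrative implementation; assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between the reference prefix processed so
    # far and the first j hypothesis words (single-row dynamic programming).
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]
            d[j] = min(d[j] + 1,         # deletion
                       d[j - 1] + 1,     # insertion
                       prev + (r != h))  # substitution (free if words match)
            prev = cur
    return d[-1] / len(ref)

print(wer("le chat dort", "le chien dort"))  # 0.3333333333333333
```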