Training Details

Training Hyperparameters

defaults:
  - _self_
dataset:
  name: "StofEzz/dataset_c_voice0.2"
  audio_sampling_rate: 16000
  num_proc_preprocessing: 4
  num_proc_dataset_map: 2
  train: 80
  test: 20

model:
  name: "openai/whisper-tiny"
  language: "french"
  task: "transcribe"

text_preprocessing:
  chars_to_ignore_regex: "[\\,\\?\\.\\!\\-\\;\\:\\ğ\\ź\\…\\ø\\ắ\\î\\´\\ŏ\\ę\\ź\\&\\'\\v\\ï\\ū\\ė\\ō\\ń\\ø\\…\\σ\\$\\ă\\ß\\ž\\ṯ\\ý\\ℵ\\đ\\ł\\ś\\ň\\ạ\\=\\_\\»\\ċ\\の\\\"\\ぬ\\ễ\\ż\\ć\\ů\\ʿ\\ș\\ı\\ñ\\(\\ò\\ř\\ä\\–\\ş\\«\\š\\ጠ\\°\\ℤ\\~\\\"\\ī\\ț\\č\\ả\\—\\)\\ā\\/\\½\"]"

training_args:
  _target_: transformers.Seq2SeqTrainingArguments
  output_dir: ./models
  per_device_train_batch_size: 16
  gradient_accumulation_steps: 1
  learning_rate: 1e-5
  warmup_steps: 500
  max_steps: 6250
  gradient_checkpointing: true
  fp16: true
  evaluation_strategy: "steps"
  per_device_eval_batch_size: 8
  predict_with_generate: true
  generation_max_length: 225
  save_steps: 2000
  eval_steps: 100
  logging_steps: 25
  load_best_model_at_end: true
  metric_for_best_model: "wer"
  greater_is_better: false
  push_to_hub: false
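
The `chars_to_ignore_regex` value is a single character class of punctuation and rare symbols that are stripped from the transcripts. The card does not include the preprocessing code. The sketch below is a minimal Python version that assumes matched characters are deleted with `re.sub` and an empty replacement; `normalize_transcript` is a hypothetical name. YAML double-quoted strings turn each `\\` into one backslash, so the raw string below holds the pattern the YAML value decodes to. In Python's `re`, `\v` is the vertical-tab escape, not the letter "v".

```python
import re

# Pattern as decoded from the YAML double-quoted string in the config.
# Every character is escaped; "\v" therefore matches a vertical tab (\x0b).
CHARS_TO_IGNORE_REGEX = (
    r'[\,\?\.\!\-\;\:\ğ\ź\…\ø\ắ\î\´\ŏ\ę\ź\&\'\v\ï\ū\ė\ō\ń\ø\…\σ\$\ă\ß'
    r'\ž\ṯ\ý\ℵ\đ\ł\ś\ň\ạ\=\_\»\ċ\の\"\ぬ\ễ\ż\ć\ů\ʿ\ș\ı\ñ\(\ò\ř\ä\–\ş\«'
    r'\š\ጠ\°\ℤ\~\"\ī\ț\č\ả\—\)\ā\/\½"]'
)
_IGNORE = re.compile(CHARS_TO_IGNORE_REGEX)


def normalize_transcript(text: str) -> str:
    """Hypothetical helper: delete every character matched by the class."""
    return _IGNORE.sub("", text)


print(repr(normalize_transcript("Bonjour, ça va ?")))  # 'Bonjour ça va '
print(normalize_transcript("l'île"))                   # lle
```

The configured class includes the apostrophe, `î` and `ï`, so French words such as *l'île* or *naïf* lose those characters. This follows from the pattern itself, not from the sketch.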
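
The config does not set `lr_scheduler_type`. Assuming the Trainer default (`"linear"`), the learning rate ramps from 0 to 1e-5 over the first 500 steps and then decays linearly to 0 at step 6250. The plain-Python sketch below reproduces that multiplier and counts the evaluation and checkpoint steps implied by `eval_steps` and `save_steps`. It assumes a single device because the card does not state how many GPUs were used. The function and variable names are hypothetical.

```python
TRAIN_BATCH = 16      # per_device_train_batch_size
GRAD_ACCUM = 1        # gradient_accumulation_steps
LEARNING_RATE = 1e-5  # learning_rate
WARMUP_STEPS = 500    # warmup_steps
MAX_STEPS = 6250      # max_steps
SAVE_STEPS = 2000     # save_steps
EVAL_STEPS = 100      # eval_steps
NUM_DEVICES = 1       # assumption: device count is not stated in the card


def linear_warmup_decay(step: int) -> float:
    """LR multiplier: linear warmup to 1.0, then linear decay to 0.0."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (MAX_STEPS - step) / max(1, MAX_STEPS - WARMUP_STEPS))


def learning_rate_at(step: int) -> float:
    return LEARNING_RATE * linear_warmup_decay(step)


# Training examples consumed over the whole run.
samples_seen = MAX_STEPS * TRAIN_BATCH * GRAD_ACCUM * NUM_DEVICES
# Steps that are multiples of eval_steps / save_steps.
eval_points = [s for s in range(1, MAX_STEPS + 1) if s % EVAL_STEPS == 0]
save_points = [s for s in range(1, MAX_STEPS + 1) if s % SAVE_STEPS == 0]

print(samples_seen)          # 100000
print(len(eval_points))      # 62
print(save_points)           # [2000, 4000, 6000]
print(learning_rate_at(250)) # 5e-06
print(learning_rate_at(500)) # 1e-05
```

`save_steps` (2000) is a multiple of `eval_steps` (100). `load_best_model_at_end` requires this when both evaluation and saving are step-based.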

Metrics

WER: 0.46
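
Word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A WER of 0.46 therefore corresponds to roughly 46 word errors per 100 reference words. Because insertions count, WER can exceed 1. The card does not say which implementation or text normalization produced the reported figure. The following is a minimal pure-Python sketch of the standard definition with hypothetical function names. It includes a corpus-level variant that divides total errors by total reference words instead of averaging per-utterance scores.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance, splitting words on whitespace."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over all pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    return errors / words


print(round(word_error_rate("le chat dort", "le chien dort"), 3))  # 0.333
```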

Model Details

Model size: 37.8M parameters (Safetensors, F32 tensors)

Model tree for smeoni/whisper-tiny-fr

Fine-tuned from openai/whisper-tiny.

Dataset used to train smeoni/whisper-tiny-fr: StofEzz/dataset_c_voice0.2