---
library_name: peft
base_model: openai/whisper-large-v2
datasets:
  - mozilla-foundation/common_voice_16_0
language:
  - ja
metrics:
  - wer
---

# Model Card for openai-whisper-large-v2-LORA-ja

A LoRA adapter for Japanese transcription with Whisper Large V2. Testing is still in progress; the main personal use case is transcribing Japanese comedy.

Inference uses about 9 GB of VRAM with this LoRA (fp16).
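That figure is a rough observation, not a guarantee. One way to check it on your own hardware, as a minimal sketch assuming the loading code from the "How to Get Started" section below has already run:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... load the model and run transcribe(sample) as shown in the
# "How to Get Started" section below ...

# Peak VRAM allocated by tensors during the run:
print(f"{torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
```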

## Model Details

### Model Description

openai-whisper-large-v2-LORA-ja

- **Developed by:** FZNX
- **Model type:** PEFT LoRA adapter (see the merging sketch after this list)
- **Language(s) (NLP):** Japanese (fine-tuned on Common Voice 16)
- **License:** [More Information Needed]
- **Finetuned from model:** openai/whisper-large-v2
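Because this is an adapter rather than a full checkpoint, the LoRA weights can optionally be merged into the base model for standalone, adapter-free use. A minimal sketch (the output directory name is hypothetical):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "fznx92/openai-whisper-large-v2-ja-transcribe-colab")

# Fold the LoRA weights into the base model; the result behaves like a
# plain WhisperForConditionalGeneration with the fine-tuned weights baked in.
merged = model.merge_and_unload()
merged.save_pretrained("whisper-large-v2-ja-merged")  # hypothetical output directory
```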

## How to Get Started with the Model

```python
import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "insert mp3 file location here"

language = "japanese"
task = "transcribe"

# Load the base model named in the adapter config, then apply the LoRA weights.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.to("cuda").half()

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language=language, task=task
)

pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Transcribe an audio file in 30-second chunks and return the joined text.
def transcribe(audio, return_timestamps=False):
    text = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )["text"]
    return text

transcript = transcribe(sample)
print(transcript)
```
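For timestamped output, call the pipeline directly and read the `chunks` field, since the `transcribe()` helper above only returns the joined text. Using the objects defined above:

```python
# Segment-level timestamps come back in the "chunks" field of the
# pipeline output when return_timestamps=True.
result = pipe(
    sample,
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"language": language, "task": task},
)
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```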

## Training Data

The Japanese subset of Mozilla Common Voice 16.0 (`mozilla-foundation/common_voice_16_0`), loadable as shown below.
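A minimal loading sketch (Common Voice on the Hub is gated, so you may need to accept its terms and be logged in; the split choice is illustrative):

```python
from datasets import load_dataset, Audio

# Japanese subset of Common Voice 16.0.
cv_ja = load_dataset("mozilla-foundation/common_voice_16_0", "ja", split="train")

# Whisper's feature extractor expects 16 kHz audio.
cv_ja = cv_ja.cast_column("audio", Audio(sampling_rate=16_000))
```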

## Training Procedure

Trained on a Google Colab T4 GPU for roughly 6 hours.
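The exact training script isn't published here. A minimal sketch of a typical Whisper LoRA setup with PEFT 0.7.1, where the target modules and hyperparameters are assumptions rather than the values used for this adapter:

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Attention projections are the usual LoRA targets for Whisper;
# r/alpha/dropout here are illustrative, not the values used for this run.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```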

## Evaluation

Evaluation is still in progress; WER (listed in the metadata above) is the intended metric.
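WER can be computed with the `evaluate` library, which compares whitespace-separated tokens; since Japanese has no spaces, text is typically segmented first (or scored with CER instead). A minimal sketch with pre-segmented placeholder strings, not real results from this model:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Illustrative placeholders: reference transcripts vs. model output,
# pre-segmented with spaces so WER counts word-level edits.
references = ["今日 は いい 天気 です ね"]
predictions = ["今日 は いい 天気 です"]

print(wer_metric.compute(references=references, predictions=predictions))
```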

## Framework versions

- PEFT 0.7.1