mpasila committed
Commit e596d36 · verified · 1 Parent(s): 5f60848

Create README.md

Files changed (1)
  1. README.md +89 -0
README.md ADDED
@@ -0,0 +1,89 @@
+ ---
+ library_name: transformers
+ datasets:
+ - reazon-research/reazonspeech
+ - joujiboi/japanese-anime-speech
+ language:
+ - ja
+ - en
+ metrics:
+ - cer
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ This is a faster-whisper/CTranslate2 (ct2) conversion of the original model:
+
+ [spow12/Visual-novel-transcriptor](https://huggingface.co/spow12/Visual-novel-transcriptor)
+
+ # Model Card for Visual-novel-transcriptor
+
+ Fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).
+
+ This model aims to transcribe Japanese audio, especially audio from visual novels.
+
+ # WaifuModel Collections
+
+ - [TTS](https://huggingface.co/spow12/visual_novel_tts)
+ - [Chat](https://huggingface.co/spow12/ChatWaifu_v1.2.1)
+ - [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)
+
+ # Unified Demo
+
+ [WaifuAssistant](https://github.com/yw0nam/WaifuAssistant)
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
+
+ - **Developed by:** spow12 (yw_nam)
+ - **Shared by:** spow12 (yw_nam)
+ - **Model type:** Seq2Seq
+ - **Language(s) (NLP):** Japanese
+ - **Fine-tuned from model:** [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
+
+
+ ## Uses
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+ import librosa
+
+ # Path to the audio file to transcribe (placeholder; replace with your own file).
+ wav_path = "path/to/audio.wav"
+
+ # Load the processor and model, then force Japanese transcription when decoding.
+ processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
+ model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
+ model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")
+
+ # Load the audio resampled to 16 kHz, the rate passed to the processor below.
+ data, _ = librosa.load(wav_path, sr=16000)
+ input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+ print(transcription[0])
+ ```
+
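The snippet in the Uses section loads the original `transformers` checkpoint. Since this repository is described as a faster-whisper/ct2 conversion, the converted weights would normally be loaded with the `faster-whisper` package instead. The following is a minimal sketch under stated assumptions, not the author's documented usage: it assumes `faster-whisper` is installed and that the converted files are available in a local directory. The names `join_segments` and `transcribe_file`, the placeholder paths, and the `device`/`compute_type` values are all assumptions made for illustration.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the text of transcribed segments into one string."""
    # Each segment is expected to expose a `.text` attribute, as the segments
    # yielded by faster-whisper do. Japanese needs no separator between them.
    return "".join(segment.text.strip() for segment in segments)


def transcribe_file(model_dir: str, wav_path: str) -> str:
    """Transcribe one audio file with a CTranslate2-converted Whisper model."""
    # Imported lazily so join_segments can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumed settings: switch to device="cpu", compute_type="int8" without a GPU.
    model = WhisperModel(model_dir, device="cuda", compute_type="float16")

    # transcribe() returns a lazy generator of segments plus run metadata;
    # decoding only happens while the generator is consumed.
    segments, _info = model.transcribe(wav_path, language="ja", task="transcribe")
    return join_segments(segments)
```

A call such as `transcribe_file("path/to/converted-model", "path/to/audio.wav")` would return the transcription as a single string; both paths are placeholders to be replaced with the actual model directory and audio file.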
+ ## Bias, Risks, and Limitations
+
+ This model was trained on Japanese datasets, including visual novel audio that contains NSFW content.
+
+
+ ## Use & Credit
+
+ This model is currently available for non-commercial use only. Also, since I'm not well versed in licensing, I hope you use it responsibly.
+
+ By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).
+
+
+ ## Citation
+
+ ```bibtex
+ @misc{Visual-novel-transcriptor,
+   author    = {YoungWoo Nam},
+   title     = {Visual-novel-transcriptor},
+   year      = {2024},
+   url       = {https://huggingface.co/spow12/Visual-novel-transcriptor},
+   publisher = {Hugging Face}
+ }
+ ```
+