Commit 3de0fd6 by technicalheist · verified · 1 Parent(s): 4889ed5

Update README.md

Files changed (1)
  1. README.md +89 -3
README.md CHANGED
@@ -1,3 +1,89 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - text-to-speech
+ - audio
+ - speech
+ language:
+ - en
+ pipeline_tag: text-to-speech
+ model-index:
+ - name: VibeVoice-1.5B
+ results: []
+ ---
+
+
+ # VibeVoice-1.5B
+
+ VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.
+
+ ## Repository
+
+ Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)
+
+ ## Requirements
+
+ * Python 3.8+
+ * PyTorch (with CUDA support recommended)
+ * [Transformers](https://github.com/huggingface/transformers)
+ * FFmpeg (for audio processing)
+
+ ## Installation
+
+ Clone the repository and install dependencies. The commands below are written for a Google Colab notebook cell (`!` runs a shell command, `%cd` changes the working directory); in a regular shell, drop the `!` prefix and use `cd` in place of `%cd`:
+
+ ```bash
+ # Clone the repository
+ !git clone https://huggingface.co/technicalheist/vibevoice-1.5b
+
+ # Change directory
+ %cd /content/vibevoice-1.5b
+
+ # Install in editable mode
+ !pip install -e .
+
+ # Install ffmpeg for audio handling
+ !apt update && apt install ffmpeg -y
+ ```
+
+ ## Usage
+
+ Run inference using the provided demo script:
+
+ ```bash
+ !python /content/vibevoice-1.5b/demo/inference_from_file.py \
+     --model_path /content/vibevoice-1.5b \
+     --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
+     --speaker_names Alice
+ ```
+
+ ### Arguments
+
+ * `--model_path`: Path to the model directory (local or Hugging Face repo name).
+ * `--txt_path`: Path to a text file containing the input text.
+ * `--speaker_names`: One or more speaker names to use for synthesis, separated by spaces.
+
+ ### Example with multiple speakers
+
+ ```bash
+ !python /content/vibevoice-1.5b/demo/inference_from_file.py \
+     --model_path /content/vibevoice-1.5b \
+     --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
+     --speaker_names Alice Frank
+ ```
+
+ ## Google Colab Notebook
+
+ A ready-to-use Google Colab notebook is available for quick experimentation:
+
+ [Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)
+
+ ## Output
+
+ * Generated audio files will be saved in the output directory specified in the script.
+ * Default output format: `.wav`
+
+ ## License
+
+ The model card metadata declares the MIT license (`license: mit`). Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.
+
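
The README in this diff documents the demo script only through Colab cells. For anyone who wants to call it from a regular Python script instead, here is a minimal sketch that assembles the same command line. It assumes only what the README shows: the script lives at `demo/inference_from_file.py` inside the cloned repository and accepts `--model_path`, `--txt_path`, and `--speaker_names`. The helper names `build_inference_command` and `run_inference` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_inference_command(
    repo_dir: str, txt_path: str, speaker_names: Sequence[str]
) -> List[str]:
    """Build the argv list for the demo script described in the README.

    Hypothetical helper: it only mirrors the three flags shown in the
    README and does not know about any other options the script may have.
    """
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    # Assumed layout, taken from the README examples: <repo>/demo/inference_from_file.py
    script = Path(repo_dir) / "demo" / "inference_from_file.py"
    return [
        sys.executable,
        str(script),
        "--model_path", str(repo_dir),
        "--txt_path", str(txt_path),
        # Multiple speakers are passed as separate space-delimited values.
        "--speaker_names", *speaker_names,
    ]


def run_inference(repo_dir: str, txt_path: str, speaker_names: Sequence[str]) -> None:
    """Run the demo script and raise CalledProcessError if it exits non-zero."""
    subprocess.run(
        build_inference_command(repo_dir, txt_path, speaker_names),
        check=True,
    )
```

Calling `run_inference("/content/vibevoice-1.5b", "/content/vibevoice-1.5b/demo/text_examples/2p_music.txt", ["Alice", "Frank"])` would correspond to the multi-speaker example in the README. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.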