Update README.md
README.md
---
license: mit
tags:
- text-to-speech
- audio
- speech
language:
- en
pipeline_tag: text-to-speech
model-index:
- name: VibeVoice-1.5B
  results: []
---

# VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples for synthesizing speech from text with the pre-trained checkpoints.

## Repository

Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)

## Requirements

* Python 3.8+
* PyTorch (with CUDA support recommended)
* [Transformers](https://github.com/huggingface/transformers)
* FFmpeg (for audio processing)
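
The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the function name `check_requirements` is hypothetical, and it assumes the packages are importable as `torch` and `transformers` and that the FFmpeg executable is named `ffmpeg`.

```python
import importlib.util
import shutil
import sys


def check_requirements(min_python=(3, 8),
                       modules=("torch", "transformers"),
                       tools=("ffmpeg",)):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for name in modules:
        # find_spec locates a top-level package without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not found: {name}")
    for tool in tools:
        # shutil.which searches PATH for the executable.
        if shutil.which(tool) is None:
            problems.append(f"Executable not found on PATH: {tool}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The sketch does not check for CUDA; once PyTorch is installed, `torch.cuda.is_available()` reports whether a GPU can be used.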

## Installation

Clone the repository and install dependencies. The commands below use Google Colab notebook syntax: `!` runs a shell command, `%cd` changes the working directory, and `/content` is Colab's default working directory. In a regular terminal, drop the `!` and `%` prefixes and adjust the paths.

```bash
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
```

## Usage

Run inference using the provided demo script:

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
    --model_path /content/vibevoice-1.5b \
    --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
    --speaker_names Alice
```

### Arguments

* `--model_path`: Path to the model directory (local or Hugging Face repo name).
* `--txt_path`: Path to a text file containing the input text.
* `--speaker_names`: One or more speaker names to use for synthesis, separated by spaces.
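
When the demo script is called from Python rather than from a notebook cell, the three arguments above map onto a plain argument list. The following is a minimal sketch under that assumption: `build_inference_command` and `run_inference` are hypothetical helpers, not part of the repository, and they use only the flags documented here, passing speaker names as separate space-delimited values as in this README's multi-speaker command.

```python
import subprocess
import sys


def build_inference_command(script_path, model_path, txt_path, speaker_names):
    """Build the argument list for the demo script from the documented flags.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, str(script_path),
        "--model_path", str(model_path),
        "--txt_path", str(txt_path),
        # Each speaker name becomes its own argument after the flag.
        "--speaker_names", *speaker_names,
    ]


def run_inference(script_path, model_path, txt_path, speaker_names):
    """Run the demo script; raises CalledProcessError on a non-zero exit status."""
    command = build_inference_command(script_path, model_path, txt_path, speaker_names)
    subprocess.run(command, check=True)


# Example call, using the Colab paths from this README:
# run_inference(
#     "/content/vibevoice-1.5b/demo/inference_from_file.py",
#     "/content/vibevoice-1.5b",
#     "/content/vibevoice-1.5b/demo/text_examples/1p_abs.txt",
#     ["Alice"],
# )
```

Passing a list to `subprocess.run` avoids shell quoting issues when a path contains spaces.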

### Example with multiple speakers

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
    --model_path /content/vibevoice-1.5b \
    --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
    --speaker_names Alice Frank
```

## Google Colab Notebook

A ready-to-use Google Colab notebook is available for quick experimentation:

[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)

## Output

* Generated audio files are saved in the output directory specified in the script.
* Default output format: `.wav`
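
A generated file can be sanity-checked with Python's standard `wave` module. The following is a minimal sketch, assuming the output is a PCM-encoded `.wav` file, which is the encoding the `wave` module reads; `describe_wav` is a hypothetical helper, and the location of the output file depends on the script's settings.

```python
import wave


def describe_wav(path):
    """Return sample rate, channel count, and duration for a PCM .wav file.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    with wave.open(str(path), "rb") as wav_file:
        frame_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
        return {
            "sample_rate_hz": frame_rate,
            "channels": wav_file.getnchannels(),
            # Duration is the number of frames divided by frames per second.
            "duration_s": frame_count / frame_rate,
        }
```

A duration close to zero, or an error when opening the file, suggests that synthesis did not complete or that the file uses a different encoding.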

## License

The model card metadata declares the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.