technicalheist/vibevoice-1.5b


technicalheist committed 6 days ago

Commit 3de0fd6 · verified · 1 Parent(s): 4889ed5

Update README.md

Files changed (1)

README.md +89 -3

README.md CHANGED

@@ -1,3 +1,89 @@
----
-license: mit
----

+---
+license: mit
+tags:
+- text-to-speech
+- audio
+- speech
+language:
+- en
+pipeline_tag: text-to-speech
+model-index:
+- name: VibeVoice-1.5B
+  results: []
+---
+# VibeVoice-1.5B
+VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.
+## Repository
+Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)
+## Requirements
+* Python 3.8+
+* PyTorch (with CUDA support recommended)
+* [Transformers](https://github.com/huggingface/transformers)
+* FFmpeg (for audio processing)
+## Installation
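+Clone the repository and install dependencies. The commands below use Google Colab notebook syntax (`!` runs a shell command, `%cd` changes the working directory); in a regular terminal, drop those prefixes and adjust the `/content` paths: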
+```bash
+# Clone the repository
+!git clone https://huggingface.co/technicalheist/vibevoice-1.5b
+# Change directory
+%cd /content/vibevoice-1.5b
+# Install in editable mode
+!pip install -e .
+# Install ffmpeg for audio handling
+!apt update && apt install ffmpeg -y
+```
+## Usage
+Run inference using the provided demo script (the `/content` paths assume the repository was cloned in Colab as shown under Installation):
+```bash
+!python /content/vibevoice-1.5b/demo/inference_from_file.py \
+  --model_path /content/vibevoice-1.5b \
+  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
+  --speaker_names Alice
+```
+### Arguments
+* `--model_path`: Path to the model directory (local or Hugging Face repo name).
+* `--txt_path`: Path to a text file containing the input text.
+* `--speaker_names`: One or more speaker names to use for synthesis, separated by spaces.
+### Example with multiple speakers
+```bash
+!python /content/vibevoice-1.5b/demo/inference_from_file.py \
+  --model_path /content/vibevoice-1.5b \
+  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
+  --speaker_names Alice Frank
+```
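The two invocations above differ only in the text file and the speaker list, so the command can be assembled by a small wrapper. The following is a minimal sketch, assuming the demo script accepts exactly the three flags listed under Arguments. `build_inference_command` and `run_inference` are hypothetical helper names that are not part of the repository, and the paths are the Colab paths copied from the examples.

```python
import subprocess
import sys
from typing import List, Sequence

# Paths copied from the README examples (Colab layout); adjust for other setups.
REPO_DIR = "/content/vibevoice-1.5b"
SCRIPT = f"{REPO_DIR}/demo/inference_from_file.py"


def build_inference_command(txt_path: str,
                            speaker_names: Sequence[str],
                            model_path: str = REPO_DIR) -> List[str]:
    """Assemble the argument list for the demo script (hypothetical helper)."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, SCRIPT,
        "--model_path", model_path,
        "--txt_path", txt_path,
        # All names follow a single flag, as in the two-speaker example.
        "--speaker_names", *speaker_names,
    ]


def run_inference(txt_path: str,
                  speaker_names: Sequence[str],
                  model_path: str = REPO_DIR) -> None:
    """Run the demo script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_inference_command(txt_path, speaker_names, model_path),
                   check=True)
```

Calling `run_inference(f"{REPO_DIR}/demo/text_examples/2p_music.txt", ["Alice", "Frank"])` would be equivalent to the two-speaker command above. Passing the arguments as a list avoids shell quoting problems if a path contains spaces.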
+## Google Colab Notebook
+A ready-to-use Google Colab notebook is available for quick experimentation:
+[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)
+## Output
+* Generated audio files are saved to the output directory specified in the script.
+* Default output format: `.wav`
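A quick way to sanity-check a generated file is to read its header with Python's standard `wave` module. This is a sketch under assumptions: the README does not name the output directory, so the file path must be supplied by the reader. `describe_wav` is a hypothetical helper, and the `wave` module only reads PCM-encoded WAV data, so it may raise `wave.Error` if the script writes another encoding.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM .wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav(path_to_generated_file)` returns a dictionary whose `duration_s` entry can be compared against the expected length of the input text.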
+## License
+The model card metadata declares the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.