
VibeVoice 1.5B Single-Speaker Fine-tuning Guide

This folder contains all the files needed to fine-tune VibeVoice 1.5B for a single speaker (Elise voice).

Key Improvements

  1. Fixed EOS Token Issue: The modified data_vibevoice.py appends a proper <|endoftext|> token after speech generation so the model stops cleanly instead of repeating/looping (see the sketch after this list)
  2. Single-Speaker Training: Uses voice_prompt_drop_rate=1.0 to train without voice prompts
  3. Audio Quality Filter: Removes training samples with abrupt cutoffs
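
A minimal sketch of the EOS fix, assuming a GPT-style tokenizer that exposes <|endoftext|>; the real change lives in the modified data_vibevoice.py, and the field names here are illustrative:

    def append_eos(input_ids, labels, tokenizer):
        """Append <|endoftext|> so the model learns to stop after speech."""
        eos_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")
        # Without a terminal EOS in the training targets, the model never
        # learns to stop and keeps sampling speech tokens (the looping bug).
        return input_ids + [eos_id], labels + [eos_id]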

Files Included

  • data_vibevoice.py - CRITICAL: Modified data collator that adds EOS token (replaces src/data_vibevoice.py)
  • prepare_jinsaryko_elise_dataset.py - Downloads and prepares the Elise dataset
  • detect_audio_cutoffs.py - Detects audio files with abrupt endings
  • finetune_elise_single_speaker.sh - Training script for single-speaker model
  • test_fixed_eos_dummy_voice.py - Test script for inference

Quick Start

  1. Prepare the dataset:

    python prepare_jinsaryko_elise_dataset.py
    
  2. Detect and remove bad audio (optional but recommended; a sketch of the cutoff heuristic follows this list):

    python detect_audio_cutoffs.py
    # Creates an elise_cleaned/ folder containing only the good samples
    
  3. IMPORTANT: Replace the data collator:

    cp data_vibevoice.py ../src/data_vibevoice.py
    
  4. Train the model:

    ./finetune_elise_single_speaker.sh
    
  5. Test the model:

    python test_fixed_eos_dummy_voice.py
    
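
The cutoff detector in step 2 boils down to checking whether a clip still carries high energy at its very end. A minimal sketch of that heuristic, assuming soundfile and numpy are available (the thresholds and exact method in detect_audio_cutoffs.py may differ):

    import numpy as np
    import soundfile as sf

    def ends_abruptly(path, tail_ms=50, rel_threshold=0.5):
        """Flag clips whose final samples are still loud (likely cut off)."""
        audio, sr = sf.read(path)
        if audio.ndim > 1:                  # mix stereo down to mono
            audio = audio.mean(axis=1)
        tail = audio[-int(sr * tail_ms / 1000):]
        tail_rms = np.sqrt(np.mean(tail ** 2))
        full_rms = np.sqrt(np.mean(audio ** 2))
        # A natural ending decays toward silence; a hard cutoff stays loud.
        return tail_rms > rel_threshold * full_rms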

Training Configuration

Key settings in finetune_elise_single_speaker.sh:

  • voice_prompt_drop_rate 1.0 - Always drops voice prompts (single-speaker mode)
  • learning_rate 2.5e-5 - Conservative learning rate
  • ddpm_batch_mul 2 - Batch multiplier for the diffusion objective
  • diffusion_loss_weight 1.4 - Weight on the diffusion (acoustic) loss
  • ce_loss_weight 0.04 - Weight on the cross-entropy (text token) loss
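
The two loss weights combine the text and acoustic objectives into a single training loss. A hedged sketch of how such a weighted sum is typically formed (variable names are illustrative, not VibeVoice internals):

    ce_loss_weight = 0.04          # matches the setting above
    diffusion_loss_weight = 1.4

    def combined_loss(ce_loss, diffusion_loss):
        """Weighted sum of the token (CE) and acoustic (diffusion) losses.

        The small CE weight keeps the text objective from dominating,
        while the diffusion term drives most of the acoustic learning.
        """
        return ce_loss_weight * ce_loss + diffusion_loss_weight * diffusion_loss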

How It Works

  1. The model learns to associate the "Speaker 0:" prefix with Elise's voice (illustrated after this list)
  2. No voice sample is needed at inference time
  3. The proper EOS token ensures clean endings without repetition
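
A minimal illustration of the prompt convention at inference time (this helper is hypothetical, not part of test_fixed_eos_dummy_voice.py):

    def build_prompt(text: str) -> str:
        """Prefix inference text the same way the training data was formatted."""
        # The fine-tuned model maps "Speaker 0:" to Elise's voice,
        # so no reference audio clip is needed.
        return f"Speaker 0: {text.strip()}"

    build_prompt("Hello there.")   # -> "Speaker 0: Hello there."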

Dataset Format

The training data should be a JSONL file with one entry per line in this format:

{"text": "Speaker 0: Hello, this is a test.", "audio": "/path/to/audio.wav"}

Note: The "Speaker 0:" prefix is REQUIRED for all text entries.
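
A minimal sketch for writing (and validating) entries in this format, assuming a list of (text, wav_path) pairs; the helper below is hypothetical and not part of the included scripts:

    import json

    def write_manifest(pairs, out_path="train.jsonl"):
        """Write (text, wav_path) pairs as one JSON object per line."""
        with open(out_path, "w", encoding="utf-8") as f:
            for text, wav_path in pairs:
                if not text.startswith("Speaker 0:"):
                    # The prefix is required; without it the model has no
                    # speaker tag to associate with Elise's voice.
                    text = f"Speaker 0: {text}"
                f.write(json.dumps({"text": text, "audio": wav_path}) + "\n")

    write_manifest([("Hello, this is a test.", "/path/to/audio.wav")])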
