@hexgrad on Hugging Face: "Wanted: Peak Data. I'm collecting audio data to train another TTS model: + AVM…"

hexgrad

posted an update Feb 7

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no lo-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and that costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy

Siddharth

Feb 12

Hi, where do I send this?

holooo

Feb 15

He also mentioned the link to Discord at the bottom.

nxym

Feb 19

Hello, I have a bunch of high-quality Greek audio from video production, with male and female voices that I hold the rights to. With only 10 minutes of audio from this set, I was able to create a small clone model of a male voice, and I have a ton of hours of these sets for professional training. My language is complicated, so it was a surprising result. Most models out there use one robotic set and have bad datasets for creating a human-sounding Greek voice. Only ElevenLabs has better sets and open. I'll be glad to help your project, because it is for sure the most promising out-of-the-box system for natural, fast, on-the-fly real-time voice processing, even on systems with low resources.
