
Permissively licensed transcription to International Phonetic Alphabet (IPA) in Python

#50
by davidmezzetti - opened

First off, this is such a great project, congratulations!

I'm working to integrate this model as the primary TTS model for txtai. I notice that tokenization is handled by espeak-ng, which is GPL licensed. Additionally, it seems like installing espeak isn't straightforward for everyone.

With that, I've added support to ttstokenizer for transcribing text to the International Phonetic Alphabet. This is a drop-in replacement for espeak (for English) and generates token ids that can be consumed by this model.
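For reference, a minimal sketch of how that might plug into ONNX inference, assuming ttstokenizer's IPATokenizer is callable on raw text and returns a token id array; the model path and input name below are illustrative placeholders, not this model's actual export.

```python
# A minimal sketch: text -> IPA-based token ids -> ONNX TTS inference.
# "model.onnx" and the "input_ids" input name are placeholders.
import numpy as np
import onnxruntime
from ttstokenizer import IPATokenizer

tokenizer = IPATokenizer()

# Transcribe English text to token ids with no espeak dependency
tokens = tokenizer("Say something interesting")

# Run the ONNX TTS model on the token sequence (batch dimension added)
session = onnxruntime.InferenceSession("model.onnx")
outputs = session.run(None, {"input_ids": np.expand_dims(tokens, axis=0)})
```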

A provider implementation where you can choose the phonemizer would be a good solution, but I won't drop espeak from the library because it's the only phonemizer that's actually usable for multilingual purposes, which may fit a lot of people's needs.

That's your call. If the eSpeak GPL license works for you, that's great. If you're building commercial software and don't intend to open source your work, then it could be problematic.

Hi, I'm aware of the GPL-ness of espeak-ng. More importantly, its performance can sometimes be lacking.

To that end, the next version of the model will use https://github.com/hexgrad/misaki for English: a simple, dictionary-first + fallback approach to G2P.

The fallback there is still espeak-ng, but one of the TODOs in that repo is:

Fallbacks: Train seq2seq fallback models on dictionaries using this notebook.

You can also find a demo at https://hf.co/spaces/hexgrad/Misaki-G2P; anything rated gold/silver (or diamond) is NOT hitting the espeak fallback.

In general, the G2P method seems relatively flexible and can be airdropped into a model, as long as you continue training on the new G2P scheme.
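To illustrate the dictionary-first + fallback idea in general terms, here is a toy sketch; the lexicon entries and fallback function are hypothetical stand-ins, not misaki's actual data or API.

```python
# A toy sketch of dictionary-first G2P with a fallback for out-of-vocabulary
# words. Lexicon and fallback are hypothetical, not misaki's actual API.
LEXICON = {
    "hello": "həˈloʊ",
    "world": "ˈwɝld",
}

def fallback_g2p(word: str) -> str:
    # Stand-in for an espeak-ng or seq2seq fallback model
    return f"<oov:{word}>"

def g2p(text: str) -> str:
    phonemes = []
    for word in text.lower().split():
        # Dictionary lookup first; fall back only when the word is missing
        if word in LEXICON:
            phonemes.append(LEXICON[word])
        else:
            phonemes.append(fallback_g2p(word))
    return " ".join(phonemes)

print(g2p("hello world"))    # həˈloʊ ˈwɝld
print(g2p("hello kokoro"))   # həˈloʊ <oov:kokoro>
```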

@hexgrad Glad to hear it!

I'll keep an eye on this project. The ttstokenizer package I mentioned is a fork of g2p_en, which includes the model trained with the notebook you mentioned. It might be of some use to you.
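For context, g2p_en exposes a simple callable interface that returns ARPAbet phonemes for English text; a minimal usage sketch (the printed output is illustrative):

```python
# A minimal sketch using the upstream g2p_en package, which ttstokenizer forks.
from g2p_en import G2p

g2p = G2p()
phonemes = g2p("Text to speech")
print(phonemes)  # e.g. ['T', 'EH1', 'K', 'S', 'T', ' ', 'T', 'UW1', ...]
```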

Ultimately, someone has to do the work of collecting dictionary datasets for these languages and then building an out-of-vocabulary model. Sounds like you're signing up for the task!
