CHILDES IPA Tokenizers

Tokenizers for each language in IPA-CHILDES, used to train the cross-lingual phoneme LLMs in our papers:

Scripts for creating the tokenizers can be found here. Scripts for training models using these tokenizers can be found here.

To load a tokenizer:

from transformers import AutoTokenizer

# Each language's tokenizer lives in its own subfolder of the repository.
dutch_tokenizer = AutoTokenizer.from_pretrained('phonemetransformers/ipa-childes-tokenizers', subfolder='Dutch')
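To load several languages at once, the same call can be wrapped in a small helper. This is only a sketch: `load_tokenizers` and its `loader` parameter are hypothetical names introduced here, and the only subfolder name confirmed above is 'Dutch', so any other language name is an assumption to check against the repository.

```python
REPO_ID = 'phonemetransformers/ipa-childes-tokenizers'


def load_tokenizers(languages, loader=None):
    """Return a dict mapping each language name to its tokenizer.

    `languages` holds subfolder names in the repository. Only 'Dutch' is
    confirmed by this README; other names are assumptions.
    `loader` is a hypothetical hook for swapping in another loading
    function; it defaults to AutoTokenizer.from_pretrained.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without transformers.
        from transformers import AutoTokenizer
        loader = AutoTokenizer.from_pretrained
    return {lang: loader(REPO_ID, subfolder=lang) for lang in languages}


# Example (downloads the tokenizer files from the Hub on first use):
# tokenizers = load_tokenizers(['Dutch'])
# dutch_tokenizer = tokenizers['Dutch']
```

Keeping the tokenizers in a dict keyed by language makes it harder to pair a model with the wrong tokenizer when working cross-lingually.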