Models trained on IPA-CHILDES and evaluated for phonological knowledge using a word segmentation task linked to child language acquisition.
Language Modeling with Phonemes
AI & ML interests
tokenization, CHILDES, word segmentation, phonemes, BabyLM
Collections
3
The IPA-CHILDES dataset along with the models and tokenizers used for phoneme-based language modeling for the 31 languages in CHILDES.
- IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling (Paper • 2504.03036)
- phonemetransformers/IPA-CHILDES (Viewer • 12.5M • 775 • 1)
- phonemetransformers/ipa-childes-tokenizers
- phonemetransformers/ipa-childes-models
Models
36
- phonemetransformers/ipa-childes-models-tiny
- phonemetransformers/ipa-childes-models-small
- phonemetransformers/ipa-childes-models-medium
- phonemetransformers/ipa-childes-models-large
- phonemetransformers/ipa-childes-tokenizers
- phonemetransformers/ipa-childes-english-size-comparison • 160
- phonemetransformers/ipa-childes-models
- phonemetransformers/babble-tokenizers
- phonemetransformers/childes-phoneme-tokenizers
- phonemetransformers/GPT2-85M-BPE-TXT • 4.3k