Tweeties in a Tweety World

AI & ML interests

Multilingual and Low-Resource NLP

Members: Pieter Delobelle, François Remy, Giuseppe Attanasio, Alfiya Khabibullina, Miryam de Lhoneux, Avetisyan, Jessa Bekker
Organization Card

The Tweeties are a series of foundation models, each incorporating a native tokenizer for its target language to improve understanding and generation of text in that language. The models are adapted from existing models using trans-tokenization and then further pre-trained on existing corpora.
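
As a rough illustration of the idea behind trans-tokenization, the sketch below initializes each target-language token embedding as a weighted average of the embeddings of aligned source-language tokens. This is a toy under stated assumptions, not the authors' implementation: the function name `trans_tokenize_embeddings`, the two-dimensional vectors, the alignment weights, and the fallback to the mean source embedding for unaligned tokens are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def trans_tokenize_embeddings(
    source_embeddings: Dict[str, List[float]],
    alignment: Dict[str, Dict[str, float]],
) -> Dict[str, List[float]]:
    """Initialize target-token embeddings from aligned source-token embeddings.

    `alignment` maps each target token to {source_token: weight}. The weights
    are hypothetical here; in practice they would be derived from some
    translation resource.
    """
    dim = len(next(iter(source_embeddings.values())))
    n_source = len(source_embeddings)
    # Assumed fallback for unaligned tokens: the mean of all source embeddings.
    mean_vector = [
        sum(vec[i] for vec in source_embeddings.values()) / n_source
        for i in range(dim)
    ]

    target_embeddings: Dict[str, List[float]] = {}
    for target_token, weights in alignment.items():
        # Keep only source tokens that exist and carry positive weight.
        known = {
            src: w
            for src, w in weights.items()
            if src in source_embeddings and w > 0
        }
        total = sum(known.values())
        if total == 0:
            target_embeddings[target_token] = list(mean_vector)
            continue
        # Weighted average, normalized so the weights sum to one.
        target_embeddings[target_token] = [
            sum(w * source_embeddings[src][i] for src, w in known.items()) / total
            for i in range(dim)
        ]
    return target_embeddings


# Toy data: three English source tokens and three Dutch-like target tokens.
source_embeddings = {
    "house": [1.0, 0.0],
    "home": [0.0, 1.0],
    "cat": [1.0, 1.0],
}
alignment = {
    "huis": {"house": 3.0, "home": 1.0},  # aligned to two source tokens
    "kat": {"cat": 1.0},                  # aligned to one source token
    "xyz": {},                            # no alignment: uses the fallback
}
target_embeddings = trans_tokenize_embeddings(source_embeddings, alignment)
```

In this toy setup, `huis` lands three quarters of the way toward `house` and one quarter toward `home`, while `kat` simply copies the `cat` vector. The further pre-training step mentioned above is not covered by this sketch.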

Collections 1

Papers on Trans-Tokenization
  • Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

    Paper • 2408.04303 • Published Aug 8, 2024 • 23

models 6

Tweeties/tweety-7b-dutch-v24a

Text Generation • 7B • Updated May 11 • 351 • 13

Tweeties/tweety-tatar-hydra-mt-7b-v24a

Text Generation • 7B • Updated Aug 9, 2024 • 16

Tweeties/tweety-tatar-hydra-base-7b-v24a

Text Generation • 7B • Updated Aug 9, 2024 • 22

Tweeties/tweety-7b-tatar-v24a

Text Generation • 7B • Updated Aug 9, 2024 • 7 • 11

Tweeties/tweety-7b-armenian-v24a

Text Generation • 7B • Updated May 27, 2024 • 5 • 1

Tweeties/tweety-7b-italian-v24a

Text Generation • 7B • Updated May 13, 2024 • 4 • 2

datasets 0

None public yet