t5-v1_1-small pretrained with an MLM task on:

• kbd (custom Latin script), 835K lines: text scraped from news sites, books, etc.

• ru, 3M lines: Wikipedia corpus from OPUS
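For T5-family models the MLM objective is usually span corruption: masked spans are replaced with sentinel tokens in the input, and the target reconstructs them. A minimal sketch (function name, span choice, and sample text are illustrative, not taken from this model's training code):

```python
def corrupt_spans(tokens, spans):
    """T5-style span corruption sketch.

    tokens: list of words; spans: list of (start, end) index pairs to mask.
    Returns (corrupted input, reconstruction target) as strings.
    """
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp += tokens[prev:start] + [sentinel]   # replace span with sentinel
        tgt += [sentinel] + tokens[start:end]    # target spells the span out
        prev = end
    inp += tokens[prev:]
    tgt.append(f"<extra_id_{len(spans)}>")       # closing sentinel
    return " ".join(inp), " ".join(tgt)

inp, tgt = corrupt_spans("the quick brown fox jumps".split(), [(1, 3)])
print(inp)  # the <extra_id_0> fox jumps
print(tgt)  # <extra_id_0> quick brown <extra_id_1>
```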

tokenizer: SentencePiece unigram, 8K shared vocabulary (kbd + ru)
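At encode time a unigram tokenizer picks the segmentation with the highest total piece log-probability, typically via Viterbi search. A toy sketch of that search (the vocabulary and its scores here are made up for illustration, not drawn from the real 8K vocab):

```python
import math

# Toy piece log-probabilities (assumed values for illustration only).
vocab = {"un": -2.0, "i": -3.0, "gram": -2.5, "u": -4.0, "n": -4.0,
         "ig": -3.5, "ram": -3.0, "g": -4.0, "r": -4.0, "a": -4.0, "m": -4.0}

def segment(text):
    """Viterbi segmentation: best[i] holds (score, pieces) for text[:i]."""
    best = [(0.0, [])] + [(-math.inf, None)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 4), i):  # 4 = longest piece in the toy vocab
            piece = text[j:i]
            if piece in vocab and best[j][0] + vocab[piece] > best[i][0]:
                best[i] = (best[j][0] + vocab[piece], best[j][1] + [piece])
    return best[len(text)][1]

print(segment("unigram"))  # ['un', 'i', 'gram']
```

A shared vocabulary like the one above is trained on the concatenation of both corpora, so the same piece inventory covers kbd and ru text.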
