---
language: tg
license: mit
tags:
- fasttext
- tajik
- word-embeddings
- nlp
---
# Tajik FastText Word Embedding Model

This repository contains a pretrained **FastText** word-embedding model for the Tajik language, saved in Gensim format.

- **Training data**: Tokenized Tajik corpus
- **Corpus size**: 21,171,522 tokens
- **Vocabulary size**: 316,637 words
- **Model type**: FastText (word vectors plus character n-gram subword vectors)

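The subword information works like this: FastText wraps each word in the boundary markers `<` and `>` and splits it into overlapping character n-grams, so a rare or unseen form (for example an inflected noun) still gets a vector assembled from n-gram vectors learned on other words. The plain-Python sketch below shows only the n-gram extraction step. It assumes Gensim's default lengths `min_n=3` and `max_n=6`; the values used for this model are not stated in this card and can be checked with `model.wv.min_n` and `model.wv.max_n`. Gensim additionally hashes each n-gram into one of `model.wv.bucket` rows of the n-gram matrix, which this simplified sketch leaves out.

```python
from __future__ import annotations


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Return the character n-grams of `word`, FastText style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield n-grams distinct from the same letters in mid-word position.
    min_n/max_n default to Gensim's defaults (assumed, not confirmed
    for this particular model).
    """
    extended = f"<{word}>"
    ngrams = []
    # Lengths run from min_n up to max_n, capped at the wrapped word's length.
    for n in range(min_n, min(len(extended), max_n) + 1):
        for start in range(len(extended) - n + 1):
            ngrams.append(extended[start:start + n])
    return ngrams


if __name__ == "__main__":
    base = char_ngrams("Точикистон")
    inflected = char_ngrams("Точикистонро")  # same noun with the object suffix "-ро"
    shared = set(base) & set(inflected)
    print(len(base), len(shared))  # 34 30
```

The inflected form shares 30 of the base word's 34 n-grams, which is why FastText tends to place such word forms close together even when one of them is rare in the corpus.
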
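## Files Included

| File | Description |
|------|-------------|
| `tajik_fasttext.model` | Gensim model file (the one passed to `FastText.load`) |
| `tajik_fasttext.model.wv.vectors_ngrams.npy` | Subword (character n-gram) vectors |
| `tajik_fasttext.model.wv.vectors_vocab.npy` | Vectors for in-vocabulary words |

All three files are required to load the model with Gensim, and they must be kept together in the same directory: `FastText.load` reads the two `.npy` arrays from alongside the `.model` file.
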
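Gensim stores large arrays in separate `.npy` files whose names are prefixed with the `.model` file name, and it looks for them in the same directory at load time, so a missing or renamed file makes the load fail. A small stdlib check such as the one below can report which files are absent before calling `FastText.load`. The helper `missing_model_files` is hypothetical and not part of Gensim.

```python
from __future__ import annotations

from pathlib import Path

# File names as listed in the table of included files.
REQUIRED_FILES = (
    "tajik_fasttext.model",
    "tajik_fasttext.model.wv.vectors_ngrams.npy",
    "tajik_fasttext.model.wv.vectors_vocab.npy",
)


def missing_model_files(model_dir: str | Path) -> list[str]:
    """Hypothetical helper: list the required files not found in model_dir."""
    directory = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    # Check the current directory; an empty list means the model can be loaded.
    missing = missing_model_files(".")
    print("Missing:", missing if missing else "none")
```
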
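## Usage

```python
from gensim.models import FastText

# The two .npy files are loaded automatically from the same directory.
model = FastText.load("tajik_fasttext.model")
# Returns a NumPy array; unseen words get a vector built from character n-grams.
vector = model.wv["Точикистон"]
```
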
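Word vectors are usually compared by cosine similarity: Gensim's `model.wv.similarity(w1, w2)` returns the cosine similarity of two words, and `model.wv.most_similar(word, topn=10)` ranks the nearest words by it. The dependency-free sketch below spells out that formula for two plain Python lists; it is illustrative only, and with the real model the Gensim methods are the practical choice.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors for illustration only; they do not come from this model.
    print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```

Values range from -1 to 1, and higher values mean the two words are used in more similar contexts.
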
## Citation

If you use this model, please cite the repository:

> ArabovMK, *Tajik FastText Model*, Hugging Face (`ArabovMK/tajik-fasttext-model`), 2025-05-08