ArabovMK committed · Commit 94935a7 · verified · 1 Parent(s): 5837fd5

Upload README.md with huggingface_hub

  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@

---
language: tg
license: mit
tags:
- fasttext
- tajik
- word-embeddings
- nlp
---

# Tajik FastText Word Embedding Model

This repository contains a pretrained **FastText** model for the Tajik language.

- **Training data**: Tokenized Tajik corpus
- **Total tokens**: 21,171,522
- **Vocabulary size**: 316,637
- **Model type**: FastText (with subword information)

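The "subword information" in the model type refers to FastText's character n-grams: each word is wrapped in `<` and `>` boundary markers and split into substrings, and the vectors of those substrings contribute to the word's vector. The sketch below illustrates only the splitting step. The `char_ngrams` helper is hypothetical, and the 3 to 6 length range is an assumption (it matches FastText's usual defaults, but this README does not state the values used in training).

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, FastText-style.

    min_n and max_n are assumed values; the README does not state
    which range this model was trained with.
    """
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


# Illustrative call on a short Tajik word, limited to lengths 3 and 4.
print(char_ngrams("китоб", min_n=3, max_n=4))
```

Because an unseen or misspelled word usually shares many of its n-grams with words in the vocabulary, a model of this type can still produce a vector for it.
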
## Files Included

| File | Description |
|------|-------------|
| `tajik_fasttext.model` | Main Gensim model file |
| `tajik_fasttext.model.wv.vectors_ngrams.npy` | Subword (n-gram) vectors |
| `tajik_fasttext.model.wv.vectors_vocab.npy` | Full-word (vocabulary) vectors |

Gensim needs all three files to load the model. Keep the two `.npy` files in the same directory as `tajik_fasttext.model`.

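Since loading depends on all three files being present, a quick check before loading can give a clearer error than a failed load. This is a minimal sketch using only the standard library; the `missing_model_files` helper is hypothetical, and the file names are the ones listed in the table above.

```python
from pathlib import Path

# Suffixes appended to the main model path; "" stands for the main file itself.
REQUIRED_SUFFIXES = ("", ".wv.vectors_ngrams.npy", ".wv.vectors_vocab.npy")


def missing_model_files(model_path):
    """Return the names of required model files that are not on disk."""
    base = Path(model_path)
    candidates = [base.with_name(base.name + suffix) for suffix in REQUIRED_SUFFIXES]
    return [path.name for path in candidates if not path.is_file()]


# Assumes the files were downloaded into the current working directory.
missing = missing_model_files("tajik_fasttext.model")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files are present.")
```
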
## Usage

```python
from gensim.models import FastText

# The two .npy files must sit next to the .model file.
model = FastText.load("tajik_fasttext.model")
vector = model.wv["Тоҷикистон"]  # Example word ("Tajikistan")
```

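A vector obtained from `model.wv[...]` is usually compared with other vectors by cosine similarity. Gensim exposes this on loaded models through `model.wv.similarity` and `model.wv.most_similar`; the sketch below only shows the underlying computation. The `cosine_similarity` helper is hypothetical, and the short lists are toy stand-ins for real vectors, not values taken from this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (illustrative values only).
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]   # same direction as v1
v3 = [-3.0, 0.0, 1.0]  # orthogonal to v1

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.
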
## Citation

If you use this model, please cite the repository:

> ArabovMK, Tajik FastText Model, Hugging Face, 2025-05-08