	---
	language: tg
	license: mit
	tags:
	- fasttext
	- tajik
	- word-embeddings
	- nlp
	---

	# Tajik FastText Word Embedding Model

	This repository contains a pretrained FastText model for the Tajik language.

	- Training data: Tokenized Tajik corpus
	- Total tokens: 21,171,522
	- Vocabulary size: 316,637
	- Model type: FastText (with subword information)
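
To illustrate what "subword information" means, here is a minimal sketch of how FastText-style character n-grams are extracted from a word. The helper name `char_ngrams` is hypothetical, and the 3 to 6 n-gram range is the usual FastText default, assumed here because the settings used for this model are not stated. Real implementations also hash the n-grams into a fixed number of buckets, which this sketch omits.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, FastText style.

    min_n/max_n are assumed defaults, not values confirmed for this model.
    """
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes from suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# Trigrams only, for the Tajik word "китоб" (book)
print(char_ngrams("китоб", min_n=3, max_n=3))
# ['<ки', 'кит', 'ито', 'тоб', 'об>']
```

Because a word vector is built from the vectors of these pieces, the model can produce a vector even for a word that never appeared in the training corpus.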

	## Files Included

	| File | Description |
	|------|-------------|
	| `tajik_fasttext.model` | Gensim model file |
	| `tajik_fasttext.model.wv.vectors_ngrams.npy` | Subword (n-gram) vectors |
	| `tajik_fasttext.model.wv.vectors_vocab.npy` | Word vectors |

	All three files are required. Keep them in the same directory: Gensim loads the two `.npy` arrays from alongside the `.model` file.
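
A quick way to confirm the download is complete before loading is to check that each file is present. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and it assumes the files were saved under their original names.

```python
from pathlib import Path

# File names as listed in the table above
REQUIRED_FILES = (
    "tajik_fasttext.model",
    "tajik_fasttext.model.wv.vectors_ngrams.npy",
    "tajik_fasttext.model.wv.vectors_vocab.npy",
)


def missing_model_files(directory="."):
    """Return the names of required files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_model_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files found.")
```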

	## Usage

	```python
	from gensim.models import FastText

	model = FastText.load("tajik_fasttext.model")
	vector = model.wv["Тоҷикистон"]  # Vector for the word "Tajikistan"
	```
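
Beyond fetching a single vector, the loaded embeddings can be queried for nearest neighbours. The sketch below assumes Gensim 4.x, where `key_to_index`, item lookup, and `most_similar` are available on `model.wv`; the helper name `describe_word` and the example word are illustrative, not part of this repository.

```python
def describe_word(wv, word, topn=5):
    """Summarise what the embeddings hold for `word`.

    `wv` is assumed to be the `model.wv` object from a loaded Gensim 4.x model.
    """
    return {
        "word": word,
        # False means the vector is composed from subword n-grams only
        "in_vocabulary": word in wv.key_to_index,
        "dimensions": len(wv[word]),
        # List of (neighbour, cosine similarity) pairs
        "neighbours": wv.most_similar(word, topn=topn),
    }


# Example call, assuming `model` was loaded with FastText.load(...):
# info = describe_word(model.wv, "Душанбе")
# print(info["in_vocabulary"], info["neighbours"])
```

Checking `key_to_index` rather than `word in wv` matters here, since a FastText model can return a vector for any string, including words outside its vocabulary.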

	## Citation

	If you use this model, please cite the repository:

	> ArabovMK, Tajik FastText Model, Hugging Face, 2025-05-08