Description

This repository contains a pre-trained FastText model for the Zarma language. The model produces word embeddings for Zarma text, capturing semantic and subword-level information for use in downstream NLP tasks.

Tasks

  • Word Embeddings: Generate vector representations for Zarma words.
  • Part-of-Speech (POS) Tagging: Provide embedding features for POS tagging models.
  • Text Classification: Use embeddings as features for sentiment analysis or topic classification (see example 3 under Usage Examples).
  • Semantic Similarity: Compute similarity between Zarma words or phrases.

Usage Examples

1. Word Embeddings

Load the FastText model to get word embeddings for Zarma text.

import fasttext

# Load the pre-trained Zarma model from disk.
model = fasttext.load_model('zarma_fasttext.bin')

# Look up the embedding vector for a single word and preview its first values.
word = "ay"
embedding = model.get_word_vector(word)
print(f"Embedding for '{word}': {embedding[:5]}...")

2. Semantic Similarity

Compute the cosine similarity between the embeddings of two Zarma words.

import fasttext
import numpy as np

model = fasttext.load_model('zarma_fasttext.bin')

# Embed both words.
word1 = "ay"
word2 = "ni"
vec1 = model.get_word_vector(word1)
vec2 = model.get_word_vector(word2)

# Cosine similarity; the small epsilon guards against division by zero.
similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2) + 1e-8)
print(f"Similarity between '{word1}' and '{word2}': {similarity:.4f}")

How to Use

1. Install the FastText Python bindings: pip install fasttext
2. Download zarma_fasttext.bin from this repository.
3. Use the code snippets above to integrate the model into your NLP pipeline.
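As a quick sanity check after downloading, you can confirm that the model loads and inspect its vocabulary size and embedding dimension (a minimal sketch; the filename zarma_fasttext.bin matches the snippets above):

import fasttext

model = fasttext.load_model('zarma_fasttext.bin')
print(f"Vocabulary size: {len(model.get_words())}")
print(f"Embedding dimension: {model.get_dimension()}")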

How to cite

If you use this model in your work, please cite:

@misc{zarma_fasttext,
  title     = {Pre-trained FastText Embeddings for Zarma},
  author    = {Mamadou K. Keita and Christopher Homan},
  year      = {2025},
  howpublished = {\url{https://huggingface.co/27Group/zarma_fasttext}}
}