ZamAI-Translator-Pashto-EN

Model Description

Specialized translation model for accurate Pashto-English and English-Pashto translation.

This model is part of the ZamAI (زمای) project - an advanced Afghan AI assistant designed to understand and communicate in Pashto, English, and other Afghan languages.

Key Features

  • Bidirectional translation
  • Cultural context preservation
  • Technical term handling
  • High translation accuracy, measured by BLEU
  • Domain adaptation capabilities
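
For readers unfamiliar with BLEU, the metric named in the list above, here is a minimal sketch of how a sentence-level score can be computed. It is a simplified illustration (single reference, whitespace tokenization, no smoothing), not the evaluation procedure used for this model, which the card does not describe. The function names are hypothetical, and a maintained tool such as sacreBLEU is the usual choice for reporting scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram count by how often it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Penalize hypotheses that are shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "I have a question about Afghanistan"
print(sentence_bleu("I have a question about Afghanistan", reference))  # 1.0
print(round(sentence_bleu("I have a question", reference), 4))  # 0.6065
```

The second call shows the brevity penalty at work: every n-gram in the short hypothesis matches, but the score drops because two reference words are missing.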

Use Cases

  • Document translation
  • Real-time conversation translation
  • Educational content translation
  • Business communication
  • Literature translation

Model Architecture

  • Base Model: facebook/nllb-200-3.3B
  • Architecture: NLLB (encoder-decoder Transformer)
  • Task: translation
  • Languages: Pashto (ps), English (en)
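
The two-letter codes above are not the identifiers that NLLB-200 tokenizers expect. The base model uses FLORES-200 style codes, so a small lookup can bridge the two. The sketch below assumes this fine-tune keeps the base model's codes and that Pashto maps to the Southern Pashto entry (`pbt_Arab`); the helper name `nllb_pair` is hypothetical.

```python
# Assumed mapping from the codes listed in this card to NLLB-200 codes.
NLLB_CODES = {
    "ps": "pbt_Arab",  # Southern Pashto, Arabic script
    "en": "eng_Latn",  # English, Latin script
}


def nllb_pair(src: str, tgt: str) -> tuple[str, str]:
    """Return (source, target) NLLB codes for a translation direction."""
    try:
        return NLLB_CODES[src], NLLB_CODES[tgt]
    except KeyError as err:
        raise ValueError(f"Unsupported language code: {err.args[0]!r}") from None


print(nllb_pair("ps", "en"))  # ('pbt_Arab', 'eng_Latn')
print(nllb_pair("en", "ps"))  # ('eng_Latn', 'pbt_Arab')
```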

Usage

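from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load model and tokenizer. NLLB is an encoder-decoder model, so it is loaded
# with AutoModelForSeq2SeqLM rather than a causal-LM class.
# Assumption: the fine-tune keeps the NLLB-200 language codes of its base model
# (pbt_Arab = Southern Pashto, eng_Latn = English).
model_name = "tasal9/zamai-translator-pashto-en"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="pbt_Arab")
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Translate Pashto to English
# Input text: "Hello! I have a question about Afghanistan:"
text = "سلام! زه د افغانستان په اړه پوښتنه لرم:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        # Force the decoder to start in the target language
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_length=200,
        num_beams=4  # beam search gives more stable translations than sampling
    )

translation = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(translation)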
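
Because the model is described as bidirectional, the same checkpoint should serve both directions by swapping the source and target language codes. The following is a minimal sketch under stated assumptions: the checkpoint loads as an NLLB-style seq2seq model, and it keeps the base model's language codes (`pbt_Arab`, `eng_Latn`). The helper names `load_translator` and `translate` are hypothetical, not part of the ZamAI project.

```python
def load_translator(model_name="tasal9/zamai-translator-pashto-en"):
    """Load tokenizer and seq2seq model.

    The import is local so that `translate` can be used or tested
    without importing transformers at module load time.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(text, tokenizer, model, src_lang, tgt_lang, max_length=200):
    """Translate `text` from `src_lang` to `tgt_lang` (NLLB language codes)."""
    # Tell the tokenizer which language tag to attach to the input.
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        # Force the first generated token to be the target-language tag.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]


# Usage (downloads the checkpoint, so it is left commented out):
# tokenizer, model = load_translator()
# print(translate("Hello, how are you?", tokenizer, model, "eng_Latn", "pbt_Arab"))
# print(translate("سلام!", tokenizer, model, "pbt_Arab", "eng_Latn"))
```

Passing the direction as arguments keeps a single loaded model in memory for both Pashto-to-English and English-to-Pashto requests.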

Training Details

  • Dataset: ZamAI Pashto Dataset (tasal9/ZamAI_Pashto_Dataset)
  • Training Method: Fine-tuning on parallel corpora
  • Epochs: 5
  • Batch Size: 4
  • Learning Rate: 3e-5
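
To make the hyperparameters above concrete, the sketch below records them in a small config object and derives the number of optimizer steps they imply. It assumes a single device and no gradient accumulation, neither of which the card specifies. The class name `FineTuneConfig` and the corpus size of 10,000 sentence pairs are hypothetical, chosen only to show the arithmetic.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in the Training Details section."""
    epochs: int = 5
    batch_size: int = 4
    learning_rate: float = 3e-5

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for one device without gradient accumulation."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


config = FineTuneConfig()
# 10_000 is a hypothetical corpus size, not the size of the ZamAI dataset.
print(config.total_steps(10_000))  # 12500 (2500 steps per epoch x 5 epochs)
```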

Performance

The model has been trained on conversational Pashto data and shows strong performance in:

  • Natural conversation flow
  • Cultural context understanding
  • Mixed language handling (Code-switching)
  • Afghan cultural knowledge

Limitations

  • Primary focus on Pashto and English
  • May require further fine-tuning for specific domains
  • Performance may vary with complex technical terminology

Ethical Considerations

This model is designed to respect Afghan and Islamic values, promoting positive and constructive conversations while avoiding harmful or inappropriate content.

Citation

@misc{zamai_zamai_translator_pashto_en_2024,
  title={ZamAI-Translator-Pashto-EN: Advanced Pashto Language Model},
  author={ZamAI Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tasal9/zamai-translator-pashto-en}
}

Contact

For questions, suggestions, or collaboration opportunities, please reach out through the ZamAI project.


Built with ❤️ for the Afghan community
