Pocket Polyglot Mzansi 50M (4 languages)
Pocket Polyglot Mzansi is a small 50M-parameter machine translation model for South African languages. It is part of an ongoing research project that aims to develop a small (<50M parameters) machine translation model that matches or exceeds the accuracy of NLLB-200-600M on South African languages. The current version is >90% smaller than NLLB-200-600M but sacrifices only 6.3% in accuracy in terms of chrF++.
Model Details
Model Description
- Developed by: Stefan Strydom
- Model type: Small 50M parameter translation model for four South African languages built using the architecture from NLLB-200.
- Language(s) (NLP): Afrikaans (afr_Latn), English (eng_Latn), isiXhosa (xho_Latn), isiZulu (zul_Latn)
- License: CC BY-NC 4.0.
Model Sources
- Repository: Coming soon
- Paper: Deep Learning IndabaX South Africa 2025 slides
- Demo: Demo app | Demo repo
Intended use
Pocket Polyglot Mzansi is a research model. The intended use is deployment on edge devices for offline machine translation. It supports single-sentence translation between any pair of the four languages.
How to Get Started with the Model
Use the code below to get started with the model.
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("stefan7/pocket_polyglot_mzansi_50M_4langs")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("stefan7/pocket_polyglot_mzansi_50M_4langs")
>>> tokenizer.src_lang = "eng_Latn"
>>> text = "How was your day?"
>>> inputs = tokenizer(text, return_tensors="pt")
>>> translated_tokens = model.generate(
... **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("xho_Latn")
... )
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
'Wawunjani umhla wakho?'
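To translate in a different direction, set tokenizer.src_lang to the source language code and pass the corresponding target language code to forced_bos_token_id.

Since the intended use is offline translation on edge devices, it can also be useful to check the model's size when loaded in 16-bit precision. The snippet below is a small, optional sketch; get_memory_footprint() is a standard transformers utility, but the exact method behind the 0.09 GB figure reported under Results is an assumption.

# Optional sketch: load the model in float16 and report its in-memory size.
# Expect roughly 0.1 GB for ~49M parameters at 2 bytes each.
import torch
from transformers import AutoModelForSeq2SeqLM

model_fp16 = AutoModelForSeq2SeqLM.from_pretrained(
    "stefan7/pocket_polyglot_mzansi_50M_4langs", torch_dtype=torch.float16
)
print(f"{model_fp16.get_memory_footprint() / 1e9:.2f} GB")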
Training Details
Training Data
The model was trained on data from WMT22-African.
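As a rough, non-authoritative sketch, the WMT22-African parallel data can be pulled from the Hugging Face Hub; the dataset id and pair-config name below are assumptions about the Hub mirror, not a description of the preprocessing used for this model.

# Illustrative sketch only: loading one WMT22-African language pair from the Hub.
# The "allenai/wmt22_african" id and "eng-zul" config are assumptions; available
# configs and field names may differ.
from datasets import load_dataset

pairs = load_dataset("allenai/wmt22_african", "eng-zul", split="train")
print(pairs[0])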
Training Procedure
- Batch size of 128 sentences
- Trained on 30M sentences (230,000 update steps)
- 1cycle policy scheduler (see the sketch after this list) with:
  - two phases
  - max_lr = 1e-3
  - pct_start = 0.25
  - anneal_strategy = 'cos'
  - div_factor = 25.0
  - final_div_factor = 1e5
- Adam optimizer with mom=0.9, sqr_mom=0.98, eps=1e-6
- No dropout or weight decay (not considered/tuned yet for this work)
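The mom/sqr_mom naming above follows fastai conventions. As a rough, non-authoritative sketch, the same optimizer and schedule can be expressed with PyTorch's Adam and OneCycleLR as below; the model load is only there to make the snippet self-contained, and this is not the actual training script.

# Illustrative sketch (assumptions, not the actual training script): the listed
# hyperparameters mapped onto torch.optim.Adam and OneCycleLR.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("stefan7/pocket_polyglot_mzansi_50M_4langs")

optimizer = torch.optim.Adam(
    model.parameters(),
    betas=(0.9, 0.98),   # mom / sqr_mom above
    eps=1e-6,
    weight_decay=0.0,    # no weight decay, as stated above
)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,
    total_steps=230_000,   # 230,000 update steps
    pct_start=0.25,        # warm-up covers the first 25% of steps
    anneal_strategy="cos",
    div_factor=25.0,       # initial lr = max_lr / 25
    final_div_factor=1e5,  # final lr = initial lr / 1e5
    three_phase=False,     # two phases: warm-up, then cosine anneal
)
# scheduler.step() is called once per update step, after optimizer.step()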
Evaluation
Testing Data
Tested on the Flores200 devtest split.
Metrics
Following the approach used by the NLLB-200 project, the model was evaluated using spBLEU and chrF++, metrics widely adopted by the machine translation community.
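For reference, below is a minimal sketch of how these metrics are commonly computed with sacrebleu; the exact evaluation setup behind the numbers in this card is an assumption. chrF++ is chrF with word bigrams added, and spBLEU is BLEU computed over the Flores-200 sentencepiece tokenizer (available in sacrebleu >= 2.2).

# Minimal sketch, assuming sacrebleu >= 2.2; not the exact evaluation script
# behind the scores below. Replace the placeholders with real outputs/references.
from sacrebleu.metrics import BLEU, CHRF

hypotheses = ["<model output sentence>"]
references = [["<reference translation>"]]  # one reference stream

spbleu = BLEU(tokenize="flores200")  # spBLEU
chrfpp = CHRF(word_order=2)          # chrF++

print(spbleu.corpus_score(hypotheses, references))
print(chrfpp.corpus_score(hypotheses, references))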
Results
Results for the original model translating four South African languages (12 translation directions):
| | Our 50M model | NLLB-200-600M | % difference |
|---|---|---|---|
| Number of parameters | 49,260,544 | 615,073,792 | -92.0% |
| Memory footprint in 16-bit (GB) | 0.09 | 1.15 | -91.9% |
| chrF++ | 48.8 | 52.1 | -6.3% |
| spBLEU | 25.1 | 29.5 | -14.7% |
chrF++ scores by language direction (all 12 directions for the original four languages):
| Source language | Target language | Our 50M model | NLLB-200-600M | Difference |
|---|---|---|---|---|
| isiXhosa | isiZulu | 44.3 | 45.5 | -1.3 |
| isiXhosa | Afrikaans | 43.0 | 46.1 | -3.0 |
| isiXhosa | English | 48.3 | 55.7 | -7.5 |
| isiZulu | isiXhosa | 42.9 | 42.6 | 0.3 |
| isiZulu | Afrikaans | 44.5 | 47.1 | -2.5 |
| isiZulu | English | 49.3 | 57.3 | -8.0 |
| Afrikaans | isiXhosa | 41.6 | 44.3 | -2.8 |
| Afrikaans | isiZulu | 44.9 | 47.6 | -2.7 |
| Afrikaans | English | 65.9 | 73.5 | -7.7 |
| English | isiXhosa | 46.2 | 47.3 | -1.2 |
| English | isiZulu | 49.5 | 51.6 | -2.1 |
| English | Afrikaans | 62.1 | 63.3 | -1.2 |
Compute Infrastructure & Environmental Impact
- All experiments ran on a single NVIDIA A5000 (24GB) or A6000 (48GB) GPU
- Total training time for a single model: 10 hours on an A5000 ($4.40 using Jarvis Labs instances @ $0.44/hour)
- Estimated carbon emissions for a single training run: 1.43 kg CO2eq (estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019))