---
library_name: transformers
base_model: facebook/mms-tts
tags:
- text-to-speech
- vits
- mms
- multilingual
- Open-Source
- Mali
- Bambara
language:
- bm
language_bcp47:
- bm-ML
model-index:
- name: bambara-tts
  results:
  - task:
      name: text-to-speech
      type: speech-synthesis
    metrics:
    - name: Subjective Quality
      type: MOS
      value: "N/A"
pipeline_tag: text-to-speech
license: cc-by-nc-4.0
---

# Bambara TTS

A text-to-speech (TTS) synthesis model for Bambara (Bamanankan), a language spoken by over 14 million people, primarily in Mali.

## Technical Specifications

- **Architecture**: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Base Model**: Meta (Facebook) MMS, `facebook/mms-tts`
- **Size**: 145 MB
- **Format**: PyTorch
- **Sampling Rate**: 16 kHz
- **Language**: Bambara (bm-ML)
- **Performance**: Optimized for CPU inference (4 GB RAM recommended)
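
The 16 kHz sampling rate fixes how sample counts map to time: a waveform of N samples lasts N / 16000 seconds. As a rough illustration, the sketch below computes clip duration and converts float samples to 16-bit PCM using only the standard library. The helper names `duration_seconds` and `to_pcm16` are hypothetical and are not part of this model or of `transformers`; the assumption that samples lie in [-1, 1] is also ours.

```python
import struct

SAMPLE_RATE = 16_000  # Hz, as listed in the specifications


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the playback time of `num_samples` samples at `sample_rate`."""
    return num_samples / sample_rate


def to_pcm16(samples) -> bytes:
    """Pack float samples (assumed to lie in [-1.0, 1.0]) as little-endian 16-bit PCM."""
    # Clip first so out-of-range values cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

For example, `duration_seconds(32000)` gives 2.0 seconds at the default rate.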

## Installation

```bash
pip install transformers torch soundfile
```

## Usage

```python
import soundfile as sf
import torch
from transformers import AutoTokenizer, VitsModel

# Load model and tokenizer
model = VitsModel.from_pretrained("sudoping01/bambara-tts")
tokenizer = AutoTokenizer.from_pretrained("sudoping01/bambara-tts")

# Prepare text and generate speech
text = "An filɛ ni ye yɔrɔ minna ni an ye an sigi ka a layɛ yala an bɛ ka baara min kɛ ɛsike a kɛlen don ka Ɲɛ wa ?"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs).waveform

# Save output as a WAV file at the model's sampling rate
waveform = output.squeeze().cpu().numpy()
sample_rate = model.config.sampling_rate
sf.write("bambara_output.wav", waveform, sample_rate)
```
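
For longer passages, one option is to synthesize sentence by sentence and concatenate the resulting waveforms. The sketch below covers only the splitting step and uses the standard library. The `split_sentences` helper and its `max_chars` limit are hypothetical choices for illustration, not something this model requires or documents.

```python
import re


def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split after sentence-ending punctuation, then wrap overlong pieces on spaces."""
    # Break wherever '.', '!' or '?' is followed by whitespace.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]
    chunks = []
    for piece in pieces:
        current = ""
        for word in piece.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                # Adding this word would exceed the limit: close the chunk.
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Each chunk can then be passed through the tokenizer and model as in the usage example, and the waveforms joined (for instance with `numpy.concatenate`) before writing the file.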

## Limitations

- Limited handling of loanwords and code-switching with French
- Variable performance across regional dialects
- Requires standard orthography
- Limited prosody and emotional expression
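
Since the model expects standard orthography, it may help to normalize input text before tokenizing. The following sketch is an assumption about reasonable preprocessing, not a documented requirement of this model: it applies Unicode NFC normalization, so that a letter typed as a base character plus a combining mark is represented consistently, and collapses repeated whitespace. The `normalize_text` name is hypothetical.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace to single spaces."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()
```

This does not correct spelling or convert non-standard spellings; it only removes encoding and spacing inconsistencies.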

## License

CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International)

- Non-commercial use only
- Attribution to the model authors and Meta is required
- Use must respect the Bambara language and culture

## References

```bibtex
@misc{bambara-tts,
  author       = {sudoping01},
  title        = {Text-to-Speech Model for Bambara},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sudoping01/bambara-tts}}
}
```