|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- dhivehi-word-parallel |
|
|
license: mit |
|
|
datasets: |
|
|
- google/smol |
|
|
language: |
|
|
- dv |
|
|
base_model: |
|
|
- google/mt5-base |
|
|
--- |
|
|
# MT5 Dhivehi (gatitos__en_dv fine-tuned) |
|
|
|
|
|
This model is a fine-tuned version of [`google/mt5-small`](https://huggingface.co/google/mt5-small) on the [Google `smol` `gatitos__en_dv`](https://huggingface.co/datasets/google/smol) dataset. |
|
|
|
|
|
> ⚠️ This is **not a general-purpose translator**. This fine-tune exists to test MT5 usage on Dhivehi and is not intended for any other use. |
|
|
|
|
|
## Model Summary |
|
|
|
|
|
- **Base model**: `google/mt5-small` |
|
|
- **Task**: Translation (English → Dhivehi) |
|
|
- **Domain**: Unknown. This is an experimental fine-tune, so try single words or short phrases only. |
|
|
- **Dataset**: `google/smol` → `gatitos__en_dv` |
|
|
- **Training framework**: Hugging Face Transformers |
|
|
- **Loss target**: ~0.01 |
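
The training pairs come from the `gatitos__en_dv` configuration of `google/smol`. A minimal sketch of inspecting that data with the Hugging Face `datasets` library is shown below; the split name and the record layout are assumptions, not taken from the original training script.

```python
from datasets import load_dataset

# Split name ("train") is an assumption about the gatitos__en_dv config
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")
print(ds)      # available columns and number of pairs
print(ds[0])   # one English→Dhivehi entry
```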
|
|
|
|
|
## Training Details |
|
|
|
|
|
| Parameter | Value | |
|
|
|------------------------|--------------| |
|
|
| Epochs | 90 | |
|
|
| Batch size | 4 | |
|
|
| Learning rate | 5e-5 (constant) | |
|
|
| Final train loss | 0.3797 | |
|
|
| Gradient norm (last) | 15.72 | |
|
|
| Total steps | 89,460 | |
|
|
| Samples/sec | ~14.24 | |
|
|
| FLOPs | 2.36e+16 | |
|
|
|
|
|
- Training time: ~6.98 hours (25,117 seconds) |
|
|
- Optimizer: AdamW |
|
|
- Scheduler: Constant (no decay) |
|
|
- Logging: Weights & Biases |
|
|
|
|
|
|
|
|
## Example Usage (Gradio) |
|
|
|
|
|
```python |
|
|
from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Prompt with the English→Dhivehi task prefix
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
|
|
``` |
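
For an interactive demo, the same pipeline can be wrapped in a small Gradio app. The sketch below is illustrative rather than the original demo code; the interface layout and labels are assumptions.

```python
import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text: str) -> str:
    # Prepend the task prefix used in the example above
    inputs = tokenizer(f"translate English to Dhivehi: {text}", return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(label="English word or short phrase"),
    outputs=gr.Textbox(label="Dhivehi"),
    title="MT5 Dhivehi (experimental)",
)
demo.launch()
```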
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is meant for: |
|
|
|
|
|
- Research in low-resource translation |
|
|
- Experimentation with Dhivehi-language modeling |
|
|
- Experimentation with the tokenizer (see the sketch below) |
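
For tokenizer experiments, a minimal sketch of inspecting how the mT5 SentencePiece vocabulary segments input text; the example string simply reuses the prompt from the usage example above.

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Show the subword pieces and their vocabulary ids for a sample prompt
text = "translate English to Dhivehi: Hello, how are you?"
pieces = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(pieces)
for piece, idx in zip(pieces, ids):
    print(f"{idx:>8}  {piece}")
```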