---
library_name: transformers
tags:
- dhivehi-word-parallel
license: mit
datasets:
- google/smol
language:
- dv
base_model:
- google/mt5-base
---
# MT5 Dhivehi (gatitos__en_dv fine-tuned)
This model is a fine-tuned version of [`google/mt5-small`](https://huggingface.co/google/mt5-small) on the [Google `smol` `gatitos__en_dv`](https://huggingface.co/datasets/google/smol) dataset.
> ⚠️ This is **not a general-purpose translator**. This fine-tune exists to test MT5 on Dhivehi and is not intended for any other use.
## Model Summary
- **Base model**: `google/mt5-small`
- **Task**: Translation (English → Dhivehi)
- **Domain**: Unknown; this is an experimental fine-tune, so use single words or short phrases only.
- **Dataset**: `google/smol` → `gatitos__en_dv` (see the loading sketch below)
- **Training framework**: Hugging Face Transformers
- **Loss target**: ~0.01
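The sketch below shows one way the `gatitos__en_dv` pairs could be loaded and tokenized. The config name follows this card, but the split name and the column names (`src`, `trgs`) are assumptions; check the `google/smol` dataset card before relying on them.
```python
from datasets import load_dataset
from transformers import T5Tokenizer

# Config name taken from this card; split and column names are assumptions.
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")

def preprocess(example):
    # Prepend the same task prefix used in the usage example below.
    source = "translate English to Dhivehi: " + example["src"]
    target = example["trgs"][0]  # assumed: first reference translation
    model_inputs = tokenizer(source, max_length=64, truncation=True)
    labels = tokenizer(text_target=target, max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, remove_columns=ds.column_names)
```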
## Training Details
| Parameter | Value |
|------------------------|--------------|
| Epochs | 90 |
| Batch size | 4 |
| Learning rate | 5e-5 (constant) |
| Final train loss | 0.3797 |
| Gradient norm (last) | 15.72 |
| Total steps | 89,460 |
| Samples/sec | ~14.24 |
| FLOPs | 2.36e+16 |
- Training time: ~6.98 hours (25,117 seconds)
- Optimizer: AdamW
- Scheduler: Constant (no decay)
- Logging: Weights & Biases
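For reference, a minimal sketch of how the hyperparameters above map onto a Hugging Face `Seq2SeqTrainingArguments` configuration. The values follow the table; the output directory, logging interval, and trainer wiring are placeholders, not the exact training script.
```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",  # placeholder
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",            # no decay
    optim="adamw_torch",                     # AdamW
    report_to="wandb",                       # Weights & Biases logging
    logging_steps=100,                       # placeholder
    predict_with_generate=True,
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized,  # e.g. the tokenized dataset sketched earlier
#     tokenizer=tokenizer,
# )
# trainer.train()
```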
## Example Usage
```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Prefix the input with the translation instruction used during fine-tuning
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
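For an interactive demo, the model can be wrapped in a small Gradio app. This is a minimal sketch assuming `gradio` is installed; the interface labels and settings are illustrative, not part of the original setup.
```python
import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text: str) -> str:
    # Same task prefix as the usage example above
    inputs = tokenizer("translate English to Dhivehi: " + text, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(label="English word or short phrase"),
    outputs=gr.Textbox(label="Dhivehi"),
    title="MT5 Dhivehi (experimental)",
)

if __name__ == "__main__":
    demo.launch()
```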
## Intended Use
This model is meant for:
- Research in low-resource translation
- Experimentation with Dhivehi-language modeling
- Experimentation with the tokenizer on Dhivehi text