---
library_name: transformers
tags:
- dhivehi-word-parallel
license: mit
datasets:
- google/smol
language:
- dv
base_model:
- google/mt5-small
---
# MT5 Dhivehi (gatitos__en_dv fine-tuned)
This model is a fine-tuned version of [`google/mt5-small`](https://huggingface.co/google/mt5-small) on the [Google `smol` `gatitos__en_dv`](https://huggingface.co/datasets/google/smol) dataset.
> ⚠️ This is **not a general-purpose translator**. This fine-tune exists to test MT5 on Dhivehi and is not intended for any other use.
## Model Summary
- **Base model**: `google/mt5-small`
- **Task**: Translation (English → Dhivehi)
- **Domain**: Unknown; this is an experimental fine-tune, so try single words or short phrases only.
- **Dataset**: `google/smol` → `gatitos__en_dv` (loading snippet below)
- **Training framework**: Hugging Face Transformers
- **Loss target**: ~0.01
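To sanity-check the training data, the `gatitos__en_dv` subset can be loaded directly. A minimal sketch; the config name and record fields are assumptions, so check the dataset card if they differ:

```python
from datasets import load_dataset

# Assumed config name; see https://huggingface.co/datasets/google/smol
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")
print(ds[0])  # expect an English source / Dhivehi target pair
```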
## Training Details
| Parameter | Value |
|------------------------|--------------|
| Epochs | 90 |
| Batch size | 4 |
| Learning rate | 5e-5 (constant) |
| Final train loss | 0.3797 |
| Gradient norm (last) | 15.72 |
| Total steps | 89,460 |
| Samples/sec | ~14.24 |
| FLOPs | 2.36e+16 |
- Training time: ~6.98 hours (25,117 seconds)
- Optimizer: AdamW
- Scheduler: Constant (no decay)
- Logging: Weights & Biases
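For reference, these settings roughly correspond to the following `Seq2SeqTrainingArguments`. This is a reconstruction from the table above, not the exact training script; `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",  # placeholder
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",  # no decay
    optim="adamw_torch",           # AdamW
    report_to="wandb",             # Weights & Biases logging
)
```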
## Example Usage
```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its SentencePiece tokenizer
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# T5-style task prefix, then tokenize, generate, and decode
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
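For quick interactive testing, the snippet above can be wrapped in a minimal Gradio demo. A sketch; it assumes `gradio` is installed and reuses the same prompt prefix:

```python
import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text: str) -> str:
    # Same prompt format as the example above
    inputs = tokenizer(f"translate English to Dhivehi: {text}", return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

gr.Interface(fn=translate, inputs="text", outputs="text",
             title="MT5 Dhivehi (experimental)").launch()
```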
## Intended Use
This model is meant for:
- Research in low-resource translation
- Experimentation with Dhivehi-language modeling
- Experimentation with the tokenizer