---
library_name: transformers
tags:
- dhivehi-word-parallel
license: mit
datasets:
- google/smol
language:
- dv
base_model:
- google/mt5-base
---
# MT5 Dhivehi (gatitos__en_dv fine-tuned)

This model is a fine-tuned version of [`google/mt5-small`](https://huggingface.co/google/mt5-small) on the [Google `smol` `gatitos__en_dv`](https://huggingface.co/datasets/google/smol) dataset.

> ⚠️ This is **not a general-purpose translator**. This fine-tune exists to test MT5 usage on Dhivehi and is not intended for any other use.

## Model Summary

- **Base model**: `google/mt5-small`
- **Task**: Translation (English → Dhivehi)
- **Domain**: Unknown; this is an experimental fine-tune, so try single words or short phrases only.
- **Dataset**: `google/smol` → `gatitos__en_dv`
- **Training framework**: Hugging Face Transformers
- **Loss target**: ~0.01
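
The dataset pairs can be loaded and tokenized along these lines. This is a minimal sketch, not the original training script: the column names (`src`, `trg`) and the use of a task prefix are assumptions, so check the `google/smol` dataset card for the actual schema.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Config name taken from this card; column names are assumptions.
dataset = load_dataset("google/smol", "gatitos__en_dv")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

def preprocess(example):
    # Prefix the English source the same way the inference example below does.
    model_inputs = tokenizer(
        "translate English to Dhivehi: " + example["src"],
        max_length=64,
        truncation=True,
    )
    labels = tokenizer(text_target=example["trg"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess)
```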

## Training Details

| Parameter              | Value        |
|------------------------|--------------|
| Epochs                | 90           |
| Batch size            | 4            |
| Learning rate         | 5e-5 (constant) |
| Final train loss      | 0.3797       |
| Gradient norm (last)  | 15.72        |
| Total steps           | 89,460       |
| Samples/sec           | ~14.24       |
| FLOPs                 | 2.36e+16     |

- Training time: ~6.98 hours (25,117 seconds)
- Optimizer: AdamW
- Scheduler: Constant (no decay)
- Logging: Weights & Biases
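
A rough reconstruction of the trainer configuration these numbers imply (90 epochs, batch size 4, constant 5e-5 learning rate, AdamW, W&B logging) is sketched below. It builds on the preprocessing sketch above; the actual training script was not published, so treat the argument choices as illustrative.

```python
from transformers import (
    DataCollatorForSeq2Seq,
    MT5ForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",   # constant schedule, no decay
    optim="adamw_torch",            # AdamW optimizer
    logging_steps=100,
    report_to="wandb",              # Weights & Biases logging
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],  # tokenized dataset from the sketch above
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```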


## Example Usage (Gradio)

```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
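
To match the section title, the snippet above can be wrapped in a small Gradio demo. Gradio is not required to use the model; this is just a convenience sketch reusing the `model` and `tokenizer` objects already loaded.

```python
import gradio as gr

def translate(text):
    inputs = tokenizer("translate English to Dhivehi: " + text, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

gr.Interface(
    fn=translate,
    inputs="text",
    outputs="text",
    title="MT5 Dhivehi (experimental)",
).launch()
```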

## Intended Use

This model is meant for:

- Research in low-resource translation
- Experimentation with Dhivehi-language modeling
- Experimentation with the tokenizer (see the sketch below)
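
For the tokenizer experiments, a quick way to see how the mT5 SentencePiece vocabulary segments Thaana-script input (the example word is illustrative):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

text = "ދިވެހި"  # "Dhivehi" written in Thaana script
print(tokenizer.tokenize(text))      # subword pieces
print(tokenizer(text)["input_ids"])  # corresponding vocabulary ids
```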