|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- dhivehi-word-parallel |
|
|
license: mit |
|
|
datasets: |
|
|
- google/smol |
|
|
language: |
|
|
- dv |
|
|
base_model: |
|
|
- google/mt5-base |
|
|
--- |
|
|
# MT5 Dhivehi (gatitos__en_dv fine-tuned) |
|
|
|
|
|
This model is a fine-tuned version of [`google/mt5-small`](https://huggingface.co/google/mt5-small) on the [Google `smol` `gatitos__en_dv`](https://huggingface.co/datasets/google/smol) dataset. |
|
|
|
|
|
> ⚠️ This is **not a general-purpose translator**. This fine-tune exists to test MT5 usage on Dhivehi and is not intended for any other use. |
|
|
|
|
|
## Model Summary |
|
|
|
|
|
- **Base model**: `google/mt5-small` |
|
|
- **Task**: Translation (English → Dhivehi) |
|
|
- **Domain**: Unknown. This is an experimental fine-tune, so try single words or short phrases only. |
|
|
- **Dataset**: `google/smol` → `gatitos__en_dv` |
|
|
- **Training framework**: Hugging Face Transformers |
|
|
- **Loss target**: ~0.01 |
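
The training pairs come from the `gatitos__en_dv` configuration of `google/smol`. A minimal sketch of inspecting that data with the Hugging Face `datasets` library is shown below; the split name and the record layout are assumptions, not taken from the original training script.

```python
from datasets import load_dataset

# Split name ("train") is an assumption about the gatitos__en_dv config
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")
print(ds)      # available columns and number of pairs
print(ds[0])   # one English→Dhivehi entry
```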
|
|
|
|
|
## Training Details |
|
|
|
|
|
| Parameter | Value | |
|
|
|------------------------|--------------| |
|
|
| Epochs | 90 | |
|
|
| Batch size | 4 | |
|
|
| Learning rate | 5e-5 (constant) | |
|
|
| Final train loss | 0.3797 | |
|
|
| Gradient norm (last) | 15.72 | |
|
|
| Total steps | 89,460 | |
|
|
| Samples/sec | ~14.24 | |
|
|
| FLOPs | 2.36e+16 | |
|
|
|
|
|
- Training time: ~6.98 hours (25,117 seconds) |
|
|
- Optimizer: AdamW |
|
|
- Scheduler: Constant (no decay) |
|
|
- Logging: Weights & Biases |
|
|
|
|
|
|
|
|
## Example Usage (Gradio) |
|
|
|
|
|
```python |
|
|
from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Prompt with the English→Dhivehi task prefix
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
|
|
``` |
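
For an interactive demo, the same pipeline can be wrapped in a small Gradio app. The sketch below is illustrative rather than the original demo code; the interface layout and labels are assumptions.

```python
import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text: str) -> str:
    # Prepend the task prefix used in the example above
    inputs = tokenizer(f"translate English to Dhivehi: {text}", return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(label="English word or short phrase"),
    outputs=gr.Textbox(label="Dhivehi"),
    title="MT5 Dhivehi (experimental)",
)
demo.launch()
```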
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is meant for: |
|
|
|
|
|
- Research in low-resource translation |
|
|
- Experimentation with Dhivehi-language modeling |
|
|
- Experimentation with the tokenizer (see the sketch below) |
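
For tokenizer experiments, a minimal sketch of inspecting how the mT5 SentencePiece vocabulary segments input text; the example string simply reuses the prompt from the usage example above.

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Show the subword pieces and their vocabulary ids for a sample prompt
text = "translate English to Dhivehi: Hello, how are you?"
pieces = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(pieces)
for piece, idx in zip(pieces, ids):
    print(f"{idx:>8}  {piece}")
```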