janisrebekahv/finetuned-colloquial-tamil
Model Overview
This is a fine-tuned version of suriya7/English-to-Tamil, trained to produce colloquial Tamil translations instead of formal Tamil.
- Translates English to colloquial Tamil
- Incorporates slang, informal speech, and real-world phrasing
- Useful for chatbots, conversational AI, and social media applications
Dataset
Custom Dataset Used for Fine-Tuning:
janisrebekahv/colloquial_tamil
This dataset was curated specifically for this model, to improve how accurately it translates English into colloquial Tamil. It combines three sources:
1. jarvisvasu/english-to-colloquial-tamil: a publicly available dataset of informal Tamil translations.
2. YouTube Comments Dataset (custom-created): comments extracted using the YouTube API and manually converted to colloquial Tamil for authenticity.
3. ChatGPT-Generated Data: additional colloquial Tamil phrases aligned with natural speech patterns.
Total dataset size: 16,269 sentence pairs
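The card does not describe how the three sources were merged into one set of sentence pairs. As a rough sketch of one plausible approach, the snippet below concatenates lists of (English, Tamil) pairs and drops blank or exactly duplicated pairs. The function name `merge_sources`, the deduplication step, and the placeholder pairs are all assumptions made for illustration; they are not taken from the author's pipeline.

```python
def merge_sources(*sources):
    """Concatenate lists of (english, tamil) pairs.

    Hypothetical helper: strips whitespace, then skips pairs with an
    empty side and pairs that have already been seen.
    """
    seen = set()
    merged = []
    for source in sources:
        for english, tamil in source:
            pair = (english.strip(), tamil.strip())
            if not pair[0] or not pair[1] or pair in seen:
                continue
            seen.add(pair)
            merged.append(pair)
    return merged


# Placeholder pairs only; the real data lives in janisrebekahv/colloquial_tamil.
public_pairs = [("How are you?", "<colloquial Tamil A>")]
youtube_pairs = [("Nice video!", "<colloquial Tamil B>"),
                 ("How are you? ", "<colloquial Tamil A>")]  # duplicate after strip
generated_pairs = [("See you later", "<colloquial Tamil C>"), ("", "")]

pairs = merge_sources(public_pairs, youtube_pairs, generated_pairs)
print(f"Total sentence pairs: {len(pairs)}")
```

Keeping the pairs as plain tuples makes it easy to count them and compare the count against the total reported above.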
Example Usage
Load and test the model using Hugging Face Transformers:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
model_name = "janisrebekahv/finetuned-colloquial-tamil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Function to translate text
def translate(text):
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example translations
test_sentences = [
    "This is so beautiful",
    "Bro, are you coming or not?",
    "My mom is gonna kill me if I don't reach home now!"
]

for sentence in test_sentences:
    print(f"English: {sentence}")
    print(f"Colloquial Tamil: {translate(sentence)}\n")
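The example above translates one sentence per call. For longer lists it may be faster to pad several sentences into one batch and decode the outputs together. The following is a minimal sketch under that assumption: `translate_batch` and its `batch_size` parameter are hypothetical names, not part of the model card, and the function expects the `tokenizer` and `model` objects to be loaded as shown above.

```python
def translate_batch(sentences, tokenizer, model, batch_size=8, max_length=128):
    """Translate a list of English sentences in padded batches.

    Hypothetical helper: `tokenizer` and `model` are the objects returned
    by AutoTokenizer / AutoModelForSeq2SeqLM `from_pretrained`.
    """
    translations = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # padding=True pads every sentence to the longest one in the batch;
        # truncation=True cuts inputs that exceed the tokenizer's limit.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        outputs = model.generate(**inputs, max_length=max_length)
        # batch_decode turns each generated id sequence back into text.
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

With the model loaded, a call such as `translate_batch(test_sentences, tokenizer, model)` should return one translation per input sentence, in the same order.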
Evaluation results
- BLEU score on janisrebekahv/colloquial_tamil (self-reported): 38.500
- ROUGE score on janisrebekahv/colloquial_tamil (self-reported): 0.720
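The card does not say which tool, tokenization, or ROUGE variant produced these numbers. The BLEU value appears to be on a 0 to 100 scale and the ROUGE value on a 0 to 1 scale. To make the BLEU figure easier to interpret, here is a simplified sketch of corpus-level BLEU: clipped n-gram precision up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing, so its scores will not necessarily match the self-reported value; `corpus_bleu` is an illustrative name, and the inputs below are placeholder strings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter & Counter keeps the minimum count per n-gram,
            # which is the "clipped" match count BLEU uses.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score of outputs shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Placeholder strings; real evaluation would use model outputs and
# reference translations from the dataset.
score = corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"])
print(f"BLEU: {score:.1f}")
```

For reported results, an established implementation is generally preferable to a hand-rolled one, since tokenization and smoothing choices change the score noticeably.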