janisrebekahv/finetuned-colloquial-tamil

📌 Model Overview

This is a fine-tuned version of suriya7/English-to-Tamil, trained to produce colloquial Tamil translations instead of formal Tamil.

✅ Translates English → Colloquial Tamil
✅ Incorporates slang, informal speech, and real-world phrasing
✅ Useful for chatbots, conversational AI, and social media applications


📜 Dataset

🔹 Custom dataset used for fine-tuning:
📂 janisrebekahv/colloquial_tamil
This dataset was curated specifically for this model to improve English-to-colloquial-Tamil translation. It combines three sources:

1️⃣ jarvisvasu/english-to-colloquial-tamil – A publicly available dataset for informal Tamil translations.
2️⃣ YouTube Comments Dataset (Custom-Created) – Extracted using the YouTube API and manually converted to colloquial Tamil for authenticity.
3️⃣ ChatGPT-Generated Data – Additional colloquial Tamil phrases aligned with natural speech patterns.

Total dataset size: 16,269 sentence pairs
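The model card does not describe how the three sources were combined, so the following is only a minimal sketch of one plausible approach: concatenate English/Tamil pairs, skip empty rows, and drop repeated English inputs. The function name `merge_sources` and the sample rows are hypothetical and are not taken from the actual dataset.

```python
from typing import Iterable, List, Tuple

# (English sentence, colloquial Tamil sentence)
Pair = Tuple[str, str]


def merge_sources(*sources: Iterable[Pair]) -> List[Pair]:
    """Concatenate pair sources, skipping empty rows and repeated English inputs."""
    seen = set()
    merged: List[Pair] = []
    for source in sources:
        for english, tamil in source:
            english, tamil = english.strip(), tamil.strip()
            if not english or not tamil:
                continue  # skip rows with a missing side
            key = english.lower()  # case-insensitive duplicate check
            if key in seen:
                continue
            seen.add(key)
            merged.append((english, tamil))
    return merged


# Hypothetical sample rows, for illustration only.
public_pairs = [("How are you?", "எப்படி இருக்க?")]
youtube_pairs = [
    ("This is so beautiful", "இது ரொம்ப அழகா இருக்கு"),
    ("how are you?", "எப்படி இருக்க?"),  # duplicate of the first row
]
generated_pairs = [("Where are you going?", "எங்க போற?"), ("   ", "")]

pairs = merge_sources(public_pairs, youtube_pairs, generated_pairs)
print(f"{len(pairs)} unique sentence pairs")
```

Keeping the first occurrence of each English sentence is an arbitrary choice here; a real pipeline might prefer the manually converted rows over generated ones.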


🔥 Example Usage

Load and test the model using Hugging Face Transformers:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
model_name = "janisrebekahv/finetuned-colloquial-tamil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Function to translate text
def translate(text):
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example translations
test_sentences = [
    "This is so beautiful",
    "Bro, are you coming or not?",
    "My mom is gonna kill me if I don't reach home now!"
]

for sentence in test_sentences:
    print(f"English: {sentence}")
    print(f"Colloquial Tamil: {translate(sentence)}\n")
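
For more than a handful of sentences, translating in padded batches is usually faster than calling the model once per sentence. The sketch below is one possible way to do that, assuming `tokenizer` and `model` are the objects loaded in the example above; the helper name `translate_batch` and its `batch_size` default are illustrative choices, not part of the model card.

```python
from typing import List


def translate_batch(
    texts: List[str],
    tokenizer,
    model,
    batch_size: int = 8,
    max_length: int = 128,
) -> List[str]:
    """Translate a list of sentences in fixed-size batches."""
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # Pad to the longest sentence in the chunk; truncate overly long inputs.
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(**inputs, max_length=max_length)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

With the model loaded as shown earlier, `translate_batch(test_sentences, tokenizer, model)` returns the translations in the same order as the input list.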

Model size: 484M params (Safetensors, tensor type F32)