---
library_name: transformers
tags:
- colloquial
- translation
- SAWiT
- Hackathon
- GUVI
- WOMENtech
license: mit
datasets:
- bajpaideeksha/English_hinglish_colloquial-dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
---

# Hinglish Translation Model

This model translates **English to Hinglish** (a mix of Hindi and English). It is fine-tuned from **GPT-2** with **LoRA (Low-Rank Adaptation)** for efficient, lightweight training.

## Model Description

The model converts English sentences into Hinglish, a colloquial blend of Hindi and English commonly used in informal communication. It is particularly useful for applications like chatbots, social media tools, and language learning platforms.

- **Model type**: GPT-2 fine-tuned with LoRA
- **Languages**: English (input), Hinglish (output)
- **Training data**: 3080 English-Hinglish sentence pairs
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **License**: MIT

## Intended Use

This model is intended for:

- Translating English sentences to Hinglish.
- Generating informal, conversational text.
- Applications in chatbots, social media, and language learning tools.

## How to Use

You can use this model with the Hugging Face `transformers` library. Below is an example of how to load and use the model (see the appendix at the end of this card for a hedged sketch of loading the LoRA adapter directly):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bajpaideeksha/hinglish-translation")
model = AutoModelForCausalLM.from_pretrained("bajpaideeksha/hinglish-translation")

# GPT-2 has no pad token by default; reuse the EOS token
tokenizer.pad_token = tokenizer.eos_token

# Input text
input_text = "Did you prepone the meeting?"

# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate output
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Dataset

The model was fine-tuned on a custom dataset of 3080 English-Hinglish sentence pairs. The dataset includes colloquial phrases, humor, and regional variations. A hedged loading sketch appears in the appendix.

### Training Parameters

- Base model: GPT-2
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Epochs: 5
- Batch size: 2
- Learning rate: 2e-5
- FP16 mixed precision: Enabled

A hedged sketch of a LoRA fine-tuning setup under these settings appears in the appendix.

### Hardware

- GPU: NVIDIA T4 (Google Colab)
- Training time: ~1 hour

## Limitations

- **Small dataset**: Trained on a relatively small dataset (3080 rows), so it may not generalize well to all types of sentences.
- **Complex sentences**: May struggle with complex or highly technical sentences.
- **Regional variations**: Handles some regional variations, but may not capture all dialects of Hinglish.
- **Humor and context**: May not always capture humor or context correctly.

## Ethical Considerations

- **Bias**: The model may inherit biases present in the training data; use with caution in sensitive applications.
- **Misuse**: The model should not be used to generate harmful, offensive, or misleading content.

## License

This model is licensed under the MIT License. See the LICENSE file for more details.

Thank you for using the Hinglish Translation Model! 😊
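## Appendix: Example Sketches

The snippets below are illustrative sketches, not scripts shipped with this repository; anything flagged as an assumption should be verified against the actual files.

### Loading the dataset

A minimal sketch for loading the English-Hinglish dataset listed in the card metadata with the `datasets` library. The split and column names are assumptions; inspect the loaded object to confirm the schema.

```python
from datasets import load_dataset

# Dataset ID comes from this card's metadata
dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")

# Inspect splits and schema before relying on any column names
print(dataset)
print(dataset["train"][0])  # assumes a "train" split exists
```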
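### Loading the LoRA adapter directly

The "How to Use" example loads the repository as a standalone causal LM, which assumes merged weights. If the repository instead stores LoRA adapter weights, they can be applied on top of the GPT-2 base model with the `peft` library; whether the repo ships an adapter or merged weights is an assumption to verify.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Base model named in the card metadata
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
base = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Apply the adapter (assumes the repo contains adapter_config.json and adapter weights)
model = PeftModel.from_pretrained(base, "bajpaideeksha/hinglish-translation")

# Optionally merge the adapter into the base weights for plain transformers inference
model = model.merge_and_unload()
```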
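### Fine-tuning sketch

The card lists hyperparameters but not the training script. Below is a minimal sketch of how such a run could be set up with `transformers` and `peft`, using the stated settings (5 epochs, batch size 2, learning rate 2e-5, FP16). The LoRA rank/alpha/target modules, the prompt format, and the dataset column names (`english`, `hinglish`) are assumptions, not values from the original run.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# LoRA configuration: rank, alpha, and target modules are assumed values
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Assumed prompt format and column names; the original pairing scheme may differ
def tokenize(batch):
    text = [f"{en} => {hi}" for en, hi in zip(batch["english"], batch["hinglish"])]
    return tokenizer(text, truncation=True, max_length=128)

dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")
tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

args = TrainingArguments(
    output_dir="hinglish-lora",
    num_train_epochs=5,              # from the card
    per_device_train_batch_size=2,   # from the card
    learning_rate=2e-5,              # from the card
    fp16=True,                       # from the card
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```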