---
library_name: transformers
tags:
- colloquial
- translation
- SAWiT
- Hackathon
- GUVI
- WOMENtech
license: mit
datasets:
- bajpaideeksha/English_hinglish_colloquial-dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
---
# Hinglish Translation Model

This model translates **English to Hinglish** (a mix of Hindi and English). It is fine-tuned from **GPT-2** using **LoRA (Low-Rank Adaptation)** for efficient, lightweight training.

## Model Description

The model converts English sentences into Hinglish, a colloquial blend of Hindi and English commonly used in informal communication. It is particularly useful for applications such as chatbots, social media tools, and language learning platforms.

- **Model type**: GPT-2 fine-tuned with LoRA
- **Languages**: English (input), Hinglish (output)
- **Training data**: 3,080 English-Hinglish sentence pairs
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **License**: MIT

## Intended Use

This model is intended for:

- Translating English sentences to Hinglish.
- Generating informal, conversational text.
- Applications in chatbots, social media, and language learning tools.

## How to Use

You can use this model with the Hugging Face `transformers` library. Below is an example of how to load and use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bajpaideeksha/hinglish-translation")
model = AutoModelForCausalLM.from_pretrained("bajpaideeksha/hinglish-translation")

# Input text
input_text = "Did you prepone the meeting?"

# Tokenize the input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the output sequence
outputs = model.generate(**inputs, max_length=50)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
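
If the repository hosts only the LoRA adapter weights rather than a fully merged model, loading may additionally require the `peft` library. Here is a minimal sketch under that assumption (attaching the adapter to the GPT-2 base and the optional merge step are illustrative, not confirmed behavior of this repo):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the GPT-2 base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model = PeftModel.from_pretrained(base_model, "bajpaideeksha/hinglish-translation")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

# Optionally fold the adapter into the base weights for faster inference
model = model.merge_and_unload()
```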

## Training Details

### Dataset

The model was fine-tuned on a custom dataset of 3,080 English-Hinglish sentence pairs. The dataset includes colloquial phrases, humor, and regional variations.

### Training Parameters

- Base model: GPT-2
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Epochs: 5
- Batch size: 2
- Learning rate: 2e-5
- FP16 mixed precision: enabled
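
For reference, here is a minimal sketch of how a LoRA fine-tuning run with these hyperparameters could be set up using the `peft` and `transformers` libraries. The LoRA rank and alpha, the prompt format, and the dataset column names (`english`, `hinglish`) are illustrative assumptions, not confirmed training settings:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Wrap GPT-2 with a LoRA adapter; r and lora_alpha are assumed values
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Dataset named in the model card; the column names used below are assumptions
dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")

def tokenize(batch):
    # Format each pair as a single prompt/completion string for causal LM
    texts = [f"English: {en}\nHinglish: {hi}"
             for en, hi in zip(batch["english"], batch["hinglish"])]
    return tokenizer(texts, truncation=True, max_length=128)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

# Hyperparameters from the list above: 5 epochs, batch size 2, lr 2e-5, fp16
training_args = TrainingArguments(
    output_dir="hinglish-lora",
    num_train_epochs=5,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```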

### Hardware

- GPU: NVIDIA T4 (Google Colab)
- Training time: ~1 hour

## Limitations

- **Small dataset**: The model is trained on a relatively small dataset (3,080 sentence pairs), so it may not generalize well to all types of sentences.
- **Complex sentences**: The model may struggle with complex or highly technical sentences.
- **Regional variations**: While the model handles some regional variations, it may not capture all dialects of Hinglish.
- **Humor and context**: The model may not always understand humor or context perfectly.

## Ethical Considerations

- **Bias**: The model may inherit biases present in the training data. Use with caution in sensitive applications.
- **Misuse**: The model should not be used to generate harmful, offensive, or misleading content.

## License

This model is licensed under the MIT License. See the LICENSE file for more details.

Thank you for using the Hinglish Translation Model!