---
library_name: transformers
tags:
- colloquial
- translation
- SAWiT
- Hackathon
- GUVI
- WOMENtech
license: mit
datasets:
- bajpaideeksha/English_hinglish_colloquial-dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
---

# Hinglish Translation Model

This model translates **English to Hinglish** (a mix of Hindi and English). It is fine-tuned from **GPT-2** with **LoRA (Low-Rank Adaptation)** for efficient, lightweight training.

## Model Description

The model converts English sentences into Hinglish, a colloquial blend of Hindi and English commonly used in informal communication. It is particularly useful for applications like chatbots, social media tools, and language learning platforms.

- **Model type**: GPT-2 fine-tuned with LoRA
- **Languages**: English (input), Hinglish (output)
- **Training data**: 3080 English-Hinglish sentence pairs
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **License**: MIT

## Intended Use

This model is intended for:

- Translating English sentences to Hinglish.
- Generating informal, conversational text.
- Applications in chatbots, social media, and language learning tools.

## How to Use

You can use this model with the Hugging Face `transformers` library. Below is an example of how to load and use the model (see the appendix at the end of this card for a hedged sketch of loading the LoRA adapter directly):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bajpaideeksha/hinglish-translation")
model = AutoModelForCausalLM.from_pretrained("bajpaideeksha/hinglish-translation")

# GPT-2 has no pad token by default; reuse the EOS token
tokenizer.pad_token = tokenizer.eos_token

# Input text
input_text = "Did you prepone the meeting?"

# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate output
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Dataset

The model was fine-tuned on a custom dataset of 3080 English-Hinglish sentence pairs. The dataset includes colloquial phrases, humor, and regional variations. A hedged loading sketch appears in the appendix.

### Training Parameters

- Base model: GPT-2
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Epochs: 5
- Batch size: 2
- Learning rate: 2e-5
- FP16 mixed precision: Enabled

A hedged sketch of a LoRA fine-tuning setup under these settings appears in the appendix.

### Hardware

- GPU: NVIDIA T4 (Google Colab)
- Training time: ~1 hour

## Limitations

- **Small dataset**: Trained on a relatively small dataset (3080 rows), so it may not generalize well to all types of sentences.
- **Complex sentences**: May struggle with complex or highly technical sentences.
- **Regional variations**: Handles some regional variations, but may not capture all dialects of Hinglish.
- **Humor and context**: May not always capture humor or context correctly.

## Ethical Considerations

- **Bias**: The model may inherit biases present in the training data; use with caution in sensitive applications.
- **Misuse**: The model should not be used to generate harmful, offensive, or misleading content.

## License

This model is licensed under the MIT License. See the LICENSE file for more details.

Thank you for using the Hinglish Translation Model! 😊
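## Appendix: Example Sketches

The snippets below are illustrative sketches, not scripts shipped with this repository; anything flagged as an assumption should be verified against the actual files.

### Loading the dataset

A minimal sketch for loading the English-Hinglish dataset listed in the card metadata with the `datasets` library. The split and column names are assumptions; inspect the loaded object to confirm the schema.

```python
from datasets import load_dataset

# Dataset ID comes from this card's metadata
dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")

# Inspect splits and schema before relying on any column names
print(dataset)
print(dataset["train"][0])  # assumes a "train" split exists
```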
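### Loading the LoRA adapter directly

The "How to Use" example loads the repository as a standalone causal LM, which assumes merged weights. If the repository instead stores LoRA adapter weights, they can be applied on top of the GPT-2 base model with the `peft` library; whether the repo ships an adapter or merged weights is an assumption to verify.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Base model named in the card metadata
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
base = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Apply the adapter (assumes the repo contains adapter_config.json and adapter weights)
model = PeftModel.from_pretrained(base, "bajpaideeksha/hinglish-translation")

# Optionally merge the adapter into the base weights for plain transformers inference
model = model.merge_and_unload()
```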
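### Fine-tuning sketch

The card lists hyperparameters but not the training script. Below is a minimal sketch of how such a run could be set up with `transformers` and `peft`, using the stated settings (5 epochs, batch size 2, learning rate 2e-5, FP16). The LoRA rank/alpha/target modules, the prompt format, and the dataset column names (`english`, `hinglish`) are assumptions, not values from the original run.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# LoRA configuration: rank, alpha, and target modules are assumed values
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Assumed prompt format and column names; the original pairing scheme may differ
def tokenize(batch):
    text = [f"{en} => {hi}" for en, hi in zip(batch["english"], batch["hinglish"])]
    return tokenizer(text, truncation=True, max_length=128)

dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")
tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

args = TrainingArguments(
    output_dir="hinglish-lora",
    num_train_epochs=5,              # from the card
    per_device_train_batch_size=2,   # from the card
    learning_rate=2e-5,              # from the card
    fp16=True,                       # from the card
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```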