---
library_name: transformers
tags:
- colloquial
- translation
- SAWiT
- Hackathon
- GUVI
- WOMENtech
license: mit
datasets:
- bajpaideeksha/English_hinglish_colloquial-dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
---
# Hinglish Translation Model

This model translates **English to Hinglish** (a mix of Hindi and English). It is fine-tuned from **GPT-2** using **LoRA (Low-Rank Adaptation)** for efficient, lightweight training.

## Model Description

The model converts English sentences into Hinglish, a colloquial blend of Hindi and English commonly used in informal communication. It is particularly useful for applications such as chatbots, social media tools, and language learning platforms.

- **Model type**: GPT-2 fine-tuned with LoRA
- **Languages**: English (input), Hinglish (output)
- **Training data**: 3,080 English-Hinglish sentence pairs
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **License**: MIT

## Intended Use

This model is intended for:

- Translating English sentences to Hinglish.
- Generating informal, conversational text.
- Applications in chatbots, social media, and language learning tools.

## How to Use

You can use this model with the Hugging Face `transformers` library. Below is an example of how to load and use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bajpaideeksha/hinglish-translation")
model = AutoModelForCausalLM.from_pretrained("bajpaideeksha/hinglish-translation")

# Input text
input_text = "Did you prepone the meeting?"

# Tokenize the input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the output sequence
outputs = model.generate(**inputs, max_length=50)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
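
If the repository hosts only the LoRA adapter weights rather than a fully merged model, loading may additionally require the `peft` library. Here is a minimal sketch under that assumption (attaching the adapter to the GPT-2 base and the optional merge step are illustrative, not confirmed behavior of this repo):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the GPT-2 base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model = PeftModel.from_pretrained(base_model, "bajpaideeksha/hinglish-translation")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

# Optionally fold the adapter into the base weights for faster inference
model = model.merge_and_unload()
```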

## Training Details

### Dataset

The model was fine-tuned on a custom dataset of 3,080 English-Hinglish sentence pairs. The dataset includes colloquial phrases, humor, and regional variations.

### Training Parameters

- Base model: GPT-2
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Epochs: 5
- Batch size: 2
- Learning rate: 2e-5
- FP16 mixed precision: enabled
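
For reference, here is a minimal sketch of how a LoRA fine-tuning run with these hyperparameters could be set up using the `peft` and `transformers` libraries. The LoRA rank and alpha, the prompt format, and the dataset column names (`english`, `hinglish`) are illustrative assumptions, not confirmed training settings:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Wrap GPT-2 with a LoRA adapter; r and lora_alpha are assumed values
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Dataset named in the model card; the column names used below are assumptions
dataset = load_dataset("bajpaideeksha/English_hinglish_colloquial-dataset")

def tokenize(batch):
    # Format each pair as a single prompt/completion string for causal LM
    texts = [f"English: {en}\nHinglish: {hi}"
             for en, hi in zip(batch["english"], batch["hinglish"])]
    return tokenizer(texts, truncation=True, max_length=128)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

# Hyperparameters from the list above: 5 epochs, batch size 2, lr 2e-5, fp16
training_args = TrainingArguments(
    output_dir="hinglish-lora",
    num_train_epochs=5,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```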

### Hardware

- GPU: NVIDIA T4 (Google Colab)
- Training time: ~1 hour

## Limitations

- **Small dataset**: The model is trained on a relatively small dataset (3,080 sentence pairs), so it may not generalize well to all types of sentences.
- **Complex sentences**: The model may struggle with complex or highly technical sentences.
- **Regional variations**: While the model handles some regional variations, it may not capture all dialects of Hinglish.
- **Humor and context**: The model may not always understand humor or context perfectly.

## Ethical Considerations

- **Bias**: The model may inherit biases present in the training data. Use with caution in sensitive applications.
- **Misuse**: The model should not be used to generate harmful, offensive, or misleading content.

## License

This model is licensed under the MIT License. See the LICENSE file for more details.

Thank you for using the Hinglish Translation Model!