Transformer Model for Language Translation
Overview
This project implements a Transformer model for language translation between English and Italian. Built from scratch, it aims to provide a deeper understanding of the Transformer architecture, which has become a cornerstone in natural language processing tasks. The project explores key elements of the architecture, such as the attention mechanism, and demonstrates hands-on experience with data preprocessing, model training, and evaluation.
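At the heart of the architecture is scaled dot-product attention. As a minimal PyTorch sketch (illustrative only, not the project's exact `model.py` implementation):

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, d_k)
    d_k = query.size(-1)
    # Similarity between queries and keys, scaled to keep the softmax well-behaved
    scores = (query @ key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Hide padding (and, in the decoder, future tokens) before the softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights over the sequence
    return weights @ value, weights
```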
Learning Objectives
- Understand and implement the Transformer model architecture.
- Explore the attention mechanism and its application in language translation.
- Gain practical experience with data preprocessing, model training, and evaluation in NLP.
Model Card on Hugging Face
You can find and use the pre-trained model on Hugging Face here: Model on Hugging Face
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translation example
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```
Project Structure
- Attention Visualization (`attention_visual.ipynb`): A notebook for visualizing attention maps to understand how the model focuses on different sentence parts during translation.
- Configuration Settings (`config.py`): Includes hyperparameters and other modifiable settings.
- Dataset Processing (`dataset.py`): Handles loading and preprocessing of the English and Italian datasets.
- Model Architecture (`model.py`): Defines the Transformer model architecture.
- Project Documentation (`README.md`): This file, which provides a complete overview of the project.
- Experiment Logs (`runs/`): Logs and outputs from model training sessions.
- Tokenizers (`tokenizer_en.json`, `tokenizer_it.json`): Tokenizers for English and Italian text preprocessing (see the loading sketch after this list).
- Training Script (`train.py`): The script that encapsulates the training process.
- Saved Model Weights (`weights/`): Stores the trained model weights for future use.
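The tokenizer files appear to be Hugging Face `tokenizers` JSON files; assuming that is the case, they can be loaded and inspected like this:

```python
from tokenizers import Tokenizer

# Load the source and target tokenizers (assumed to be Hugging Face `tokenizers` JSON files)
tokenizer_src = Tokenizer.from_file("tokenizer_en.json")
tokenizer_tgt = Tokenizer.from_file("tokenizer_it.json")

ids = tokenizer_src.encode("Hello, how are you?").ids
print(ids)                        # token ids fed to the encoder
print(tokenizer_src.decode(ids))  # round-trip back to text
```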
Installation
To set up and run the project locally, follow these steps:
Clone the Repository:

```bash
git clone https://github.com/amc-madalin/transformer-for-language-translation.git
```

Create a Python Environment: Create a Conda environment:

```bash
conda create --name transformer python=3.x
```

Replace `3.x` with your preferred Python version.

Activate the Environment:

```bash
conda activate transformer
```

Install Dependencies: Install the required packages from `requirements.txt`:

```bash
pip install -r requirements.txt
```

Prepare Data: The dataset is downloaded automatically. Modify the source (`lang_src`) and target (`lang_tgt`) languages in `config.py`, if necessary. The defaults are English (`en`) and Italian (`it`):

```python
"lang_src": "en",
"lang_tgt": "it",
```

Train the Model: Start the training process with:

```bash
python train.py
```

Use the Model: The trained model weights will be saved in the `weights/` directory. Use these weights for inference, evaluation, or further applications; a hedged loading sketch follows below.
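As a rough sketch of that last step, assuming `model.py` exposes a builder function and `train.py` saves standard PyTorch checkpoints (the names below are illustrative; check the actual code), loading the trained weights for inference might look like:

```python
import torch
from model import build_transformer  # hypothetical builder name; check model.py for the real one

# Rebuild the architecture with the same hyperparameters used during training (see config.py)
model = build_transformer(
    src_vocab_size=30000,  # placeholder; use the actual source tokenizer vocabulary size
    tgt_vocab_size=30000,  # placeholder; use the actual target tokenizer vocabulary size
    src_seq_len=350,       # placeholder sequence lengths; match config.py
    tgt_seq_len=350,
)

# Load a checkpoint produced by train.py (file name and dict layout depend on train.py)
checkpoint = torch.load("weights/checkpoint_latest.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```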
Using the Model with Hugging Face
Once trained, the model can be uploaded to Hugging Face for easy access and use.
Uploading the Model to Hugging Face
Use the `huggingface-cli` tool (installed with the `huggingface_hub` package) to log in and push the trained weights:

```bash
huggingface-cli login
huggingface-cli upload your-organization/your-model-name ./weights/
```
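Alternatively, the upload can be done programmatically with the `huggingface_hub` Python API (the repository id below is a placeholder):

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the target model repository if it does not exist yet (placeholder repo id)
api.create_repo("your-organization/your-model-name", repo_type="model", exist_ok=True)
# Push the local weights/ directory to the Hub
api.upload_folder(
    folder_path="./weights",
    repo_id="your-organization/your-model-name",
    repo_type="model",
)
```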
Loading the Model from Hugging Face for Inference
You can easily load the model for translation tasks directly from Hugging Face:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translate text
text = "How are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```
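Decoding options can noticeably affect translation quality; for example, beam search and an explicit length budget can be enabled through standard `transformers` generation arguments:

```python
# Continues the snippet above: enable beam search and cap the output length
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```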
Learning Resources
- YouTube - Coding a Transformer from Scratch on PyTorch
A detailed walkthrough of coding a Transformer model from scratch using PyTorch, including training and inference.
Acknowledgements
Special thanks to Umar Jamil for his guidance and contributions that supported the completion of this project.