Transformer Model for Language Translation
Overview
This project implements a Transformer model for language translation between English and Italian. Built from scratch, it aims to provide a deeper understanding of the Transformer architecture, which has become a cornerstone in natural language processing tasks. The project explores key elements of the architecture, such as the attention mechanism, and demonstrates hands-on experience with data preprocessing, model training, and evaluation.
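At the heart of the architecture is scaled dot-product attention. As a minimal PyTorch sketch (illustrative only, not the project's exact `model.py` implementation):

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, d_k)
    d_k = query.size(-1)
    # Similarity between queries and keys, scaled to keep the softmax well-behaved
    scores = (query @ key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Hide padding (and, in the decoder, future tokens) before the softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights over the sequence
    return weights @ value, weights
```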
Learning Objectives
- Understand and implement the Transformer model architecture.
- Explore the attention mechanism and its application in language translation.
- Gain practical experience with data preprocessing, model training, and evaluation in NLP.
Model Card on Hugging Face
You can find and use the pre-trained model on Hugging Face here: Model on Hugging Face
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translation example
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```
Project Structure
- Attention Visualization (`attention_visual.ipynb`): A notebook for visualizing attention maps to understand how the model focuses on different sentence parts during translation.
- Configuration Settings (`config.py`): Includes hyperparameters and other modifiable settings.
- Dataset Processing (`dataset.py`): Handles loading and preprocessing of the English and Italian datasets.
- Model Architecture (`model.py`): Defines the Transformer model architecture.
- Project Documentation (`README.md`): This file, which provides a complete overview of the project.
- Experiment Logs (`runs/`): Logs and outputs from model training sessions.
- Tokenizers (`tokenizer_en.json`, `tokenizer_it.json`): Tokenizers for English and Italian text preprocessing (see the loading sketch after this list).
- Training Script (`train.py`): The script that encapsulates the training process.
- Saved Model Weights (`weights/`): Stores the trained model weights for future use.
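The tokenizer files appear to be Hugging Face `tokenizers` JSON files; assuming that is the case, they can be loaded and inspected like this:

```python
from tokenizers import Tokenizer

# Load the source and target tokenizers (assumed to be Hugging Face `tokenizers` JSON files)
tokenizer_src = Tokenizer.from_file("tokenizer_en.json")
tokenizer_tgt = Tokenizer.from_file("tokenizer_it.json")

ids = tokenizer_src.encode("Hello, how are you?").ids
print(ids)                        # token ids fed to the encoder
print(tokenizer_src.decode(ids))  # round-trip back to text
```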
Installation
To set up and run the project locally, follow these steps:
Clone the Repository:

```bash
git clone https://github.com/amc-madalin/transformer-for-language-translation.git
```

Create a Python Environment: Create a Conda environment:

```bash
conda create --name transformer python=3.x
```

Replace `3.x` with your preferred Python version.

Activate the Environment:

```bash
conda activate transformer
```

Install Dependencies: Install the required packages from `requirements.txt`:

```bash
pip install -r requirements.txt
```

Prepare Data: The dataset is downloaded automatically. Modify the source (`lang_src`) and target (`lang_tgt`) languages in `config.py`, if necessary. The defaults are English (`en`) and Italian (`it`):

```python
"lang_src": "en",
"lang_tgt": "it",
```

Train the Model: Start the training process with:

```bash
python train.py
```

Use the Model: The trained model weights will be saved in the `weights/` directory. Use these weights for inference, evaluation, or further applications; a hedged loading sketch follows below.
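As a rough sketch of that last step, assuming `model.py` exposes a builder function and `train.py` saves standard PyTorch checkpoints (the names below are illustrative; check the actual code), loading the trained weights for inference might look like:

```python
import torch
from model import build_transformer  # hypothetical builder name; check model.py for the real one

# Rebuild the architecture with the same hyperparameters used during training (see config.py)
model = build_transformer(
    src_vocab_size=30000,  # placeholder; use the actual source tokenizer vocabulary size
    tgt_vocab_size=30000,  # placeholder; use the actual target tokenizer vocabulary size
    src_seq_len=350,       # placeholder sequence lengths; match config.py
    tgt_seq_len=350,
)

# Load a checkpoint produced by train.py (file name and dict layout depend on train.py)
checkpoint = torch.load("weights/checkpoint_latest.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```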
Using the Model with Hugging Face
Once trained, the model can be uploaded to Hugging Face for easy access and use.
Uploading the Model to Hugging Face
Use the `huggingface-cli` tool (installed with the `huggingface_hub` package) to log in and push the trained weights:

```bash
huggingface-cli login
huggingface-cli upload your-organization/your-model-name ./weights/
```
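Alternatively, the upload can be done programmatically with the `huggingface_hub` Python API (the repository id below is a placeholder):

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the target model repository if it does not exist yet (placeholder repo id)
api.create_repo("your-organization/your-model-name", repo_type="model", exist_ok=True)
# Push the local weights/ directory to the Hub
api.upload_folder(
    folder_path="./weights",
    repo_id="your-organization/your-model-name",
    repo_type="model",
)
```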
Loading the Model from Hugging Face for Inference
You can easily load the model for translation tasks directly from Hugging Face:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translate text
text = "How are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```
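Decoding options can noticeably affect translation quality; for example, beam search and an explicit length budget can be enabled through standard `transformers` generation arguments:

```python
# Continues the snippet above: enable beam search and cap the output length
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```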
Learning Resources
- YouTube - Coding a Transformer from Scratch on PyTorch
A detailed walkthrough of coding a Transformer model from scratch using PyTorch, including training and inference.
Acknowledgements
Special thanks to Umar Jamil for his guidance and contributions that supported the completion of this project.