Binary Sentiment Classification Using Transformers
Introduction
This project fine-tunes a pre-trained DistilBERT model for binary sentiment classification on the IMDb dataset, labeling each movie review as negative (0) or positive (1). It uses the Hugging Face Transformers and Datasets libraries with PyTorch to preprocess the data, fine-tune and evaluate the model, and save the result for later use.
Task Description
The assignment includes the following key steps:
Dataset Selection and Preprocessing
- Using the IMDb dataset from Hugging Face, which contains text reviews and their corresponding binary sentiment labels.
- Tokenizing the dataset with a pre-trained DistilBERT tokenizer.
- Splitting the data into training, validation, and test sets.
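The preprocessing steps above might look like the following minimal sketch. The checkpoint name `distilbert-base-uncased`, the 256-token limit, the 10% validation fraction, the seed, and the helper name `load_and_tokenize_imdb` are assumptions chosen for illustration, not values taken from this project. The IMDb dataset on the Hugging Face Hub has no validation split of its own, so one is held out from the training split here.

```python
# Illustrative values only; none of these come from the original project.
CHECKPOINT = "distilbert-base-uncased"
MAX_LENGTH = 256
VALIDATION_FRACTION = 0.1
SEED = 42


def load_and_tokenize_imdb(checkpoint=CHECKPOINT):
    """Return (tokenizer, splits), where splits holds train/validation/test."""
    # Imported inside the function so the heavy dependencies load only when it runs.
    from datasets import DatasetDict, load_dataset
    from transformers import AutoTokenizer

    raw = load_dataset("imdb")  # each example has a "text" and a "label" field
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize(batch):
        # Truncate long reviews; padding is left to the data collator at batch time.
        return tokenizer(batch["text"], truncation=True, max_length=MAX_LENGTH)

    # Hold out part of the training split to use as a validation set.
    held_out = raw["train"].train_test_split(test_size=VALIDATION_FRACTION, seed=SEED)
    splits = DatasetDict(
        {
            "train": held_out["train"],
            "validation": held_out["test"],
            "test": raw["test"],
        }
    )
    return tokenizer, splits.map(tokenize, batched=True)
```

Calling `tokenizer, splits = load_and_tokenize_imdb()` downloads the dataset and tokenizer on first use, so it needs network access.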
Model Selection and Fine-Tuning
- Loading a pre-trained DistilBERT model for sequence classification.
- Fine-tuning the model on the processed dataset using the Hugging Face Trainer API.
- Configuring training parameters, including learning rate, batch size, number of epochs, and evaluation strategy.
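One way to wire these steps together is sketched below. The hyperparameter values, the output directory, the checkpoint name, and the helper name `build_trainer` are hypothetical placeholders rather than the settings used in this project. The sketch also assumes a recent Transformers release, where the evaluation setting is called `eval_strategy`.

```python
# Hypothetical hyperparameters for illustration; tune them for your hardware.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}


def build_trainer(
    train_dataset,
    eval_dataset,
    tokenizer,
    checkpoint="distilbert-base-uncased",  # assumed checkpoint name
    output_dir="./results",                # assumed output location
    compute_metrics=None,
):
    """Create a Trainer for two-class sequence classification."""
    # Imported inside the function so the heavy dependencies load only when it runs.
    from transformers import (
        AutoModelForSequenceClassification,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    # num_labels=2 attaches a freshly initialized binary classification head.
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",  # older Transformers releases call this `evaluation_strategy`
        save_strategy="epoch",
        **HYPERPARAMETERS,
    )

    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # Pads each batch to the length of its longest example.
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
        compute_metrics=compute_metrics,
    )
```

With tokenized training and validation sets in hand, `build_trainer(train_set, validation_set, tokenizer).train()` starts fine-tuning.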
Evaluation
- Evaluating model performance using metrics such as accuracy, F1-score, precision, and recall.
- Analyzing the model's performance on the test set.
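To make the four metrics concrete, here is a small dependency-free sketch that computes them from confusion-matrix counts, treating label 1 (positive) as the positive class. The function names and the toy logits are made up for illustration. In practice, scikit-learn's `accuracy_score` and `precision_recall_fscore_support` would be the more likely choice, given the requirements listed for this project.

```python
def logits_to_labels(logits):
    """Pick the higher-scoring class for each [negative, positive] score pair."""
    return [1 if row[1] > row[0] else 0 for row in logits]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy batch: six reviews with made-up model scores.
labels = [1, 0, 1, 1, 0, 0]
logits = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.9, 0.1]]
metrics = binary_metrics(labels, logits_to_labels(logits))
print(metrics)
```

This toy batch yields 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so all four metrics come out to 2/3.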
Saving the Model
- Saving the fine-tuned model for later use.
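Saving and reloading could be done roughly as follows. The directory name `./fine_tuned_distilbert` and both helper names are assumptions for illustration. The sketch relies on the `save_pretrained` method that Transformers models and tokenizers provide, and on the `pipeline` helper for loading the saved files back.

```python
def save_model(model, tokenizer, output_dir="./fine_tuned_distilbert"):
    """Write the model weights, config, and tokenizer files to one directory."""
    model.save_pretrained(output_dir)
    # Saving the tokenizer alongside the model keeps the directory self-contained.
    tokenizer.save_pretrained(output_dir)
    return output_dir


def load_sentiment_pipeline(model_dir="./fine_tuned_distilbert"):
    """Reload the saved model and tokenizer as a text-classification pipeline."""
    # Imported inside the function so the heavy dependency loads only when it runs.
    from transformers import pipeline

    return pipeline("text-classification", model=model_dir)
```

After saving, `load_sentiment_pipeline()("A wonderful film.")` returns a predicted label with a score. Unless the model config defines custom label names, the labels appear as `LABEL_0` (negative) and `LABEL_1` (positive).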
Requirements
- Python 3.x
- Transformers
- Datasets
- Scikit-Learn
- PyTorch
- (Optional) Google Colab for easy experimentation
Installation
Install the necessary libraries using pip:
pip install -U transformers datasets scikit-learn torch