# IMDb Sentiment Analysis Model

## Model Overview

This model is a fine-tuned DistilBERT (`distilbert-base-uncased`) for sentiment analysis on the IMDb dataset. It classifies movie reviews as positive (1) or negative (0).
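As an optional tweak (not part of the training script below), the integer labels can be made self-describing by attaching an `id2label` mapping when the classification head is created; the mapping shown here is an assumption matching the 0/negative, 1/positive convention above:

```python
from transformers import AutoModelForSequenceClassification

# Optional: attach human-readable names to the 0/1 labels (assumed mapping)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
```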
## Dataset

- Dataset Used: IMDb Movie Reviews
- Source: Hugging Face's `datasets` library (`imdb`)
- Training Samples: 50 (subsampled for fast training)
- Test Samples: 20
## Training Details

- Model Architecture: DistilBERT for Sequence Classification
- Pretrained Model: `distilbert-base-uncased`
- Training Time: ~1 minute
- Number of Epochs: 1
- Batch Size: 1 (for speed)
- Evaluation Strategy: per epoch
## Training Script
```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Reduce dataset size for fast training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    save_strategy="epoch",
    report_to="none",
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save trained model
model.save_pretrained("my_model")
tokenizer.save_pretrained("my_model")
```
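The `Trainer` above evaluates once per epoch but only reports loss. If you also want accuracy, a minimal sketch is to pass a `compute_metrics` callback; the function name and the use of scikit-learn here are illustrative choices, not part of the original script:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative metric callback (assumes scikit-learn is installed)
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # pick the higher-scoring class
    return {"accuracy": accuracy_score(labels, predictions)}

# Passed via: Trainer(..., compute_metrics=compute_metrics)
```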
## How to Use the Model

You can load the trained model and use it for sentiment analysis as follows:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the trained model
tokenizer = AutoTokenizer.from_pretrained("my_model")
model = AutoModelForSequenceClassification.from_pretrained("my_model")
model.eval()  # inference mode: disables dropout

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
print(predict_sentiment("This movie was amazing!"))   # Expected: Positive
print(predict_sentiment("I didn't like this movie.")) # Expected: Negative
```
## Deployment

The trained model can be uploaded to the Hugging Face Hub for inference (requires `huggingface_hub`):

```bash
huggingface-cli login
huggingface-cli upload your-hf-username/my_model ./my_model
```
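Equivalently, `push_to_hub` uploads the model and tokenizer directly from Python; the repo name below is a placeholder:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("my_model")
tokenizer = AutoTokenizer.from_pretrained("my_model")

# Creates (or updates) the Hub repo under your account; name is a placeholder
model.push_to_hub("your-hf-username/my_model")
tokenizer.push_to_hub("your-hf-username/my_model")
```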
## License

MIT License