IMDb Sentiment Analysis Model

Model Overview

This model is a fine-tuned DistilBERT (distilbert-base-uncased) for sentiment analysis on the IMDb dataset. It classifies movie reviews as positive (1) or negative (0).
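
For a quick sanity check once the model has been trained and saved (see the training script below), it can be loaded through the text-classification pipeline. Because the training script does not set id2label, the predicted labels surface as the defaults LABEL_0 (negative) and LABEL_1 (positive); a minimal sketch, assuming the model has been saved to "my_model":

from transformers import pipeline

# Load the fine-tuned model and tokenizer saved by the training script below
classifier = pipeline("text-classification", model="my_model")

print(classifier("A wonderful, heartfelt film."))
# -> [{'label': 'LABEL_1', 'score': ...}]  (LABEL_1 = positive, LABEL_0 = negative)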

Dataset

  • Dataset Used: IMDb Movie Reviews
  • Source: Hugging Face's datasets library (imdb)
  • Training Samples: 50 (for fast training)
  • Test Samples: 20 (see the snippet below for the full split sizes)
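
The full IMDb dataset contains 25,000 labeled training reviews and 25,000 labeled test reviews; only a small subset is used here. A quick way to inspect the splits and the label mapping:

from datasets import load_dataset

dataset = load_dataset("imdb")
print(dataset)                             # train / test / unsupervised splits and their sizes
print(dataset["train"].features["label"])  # ClassLabel(names=['neg', 'pos']): 0 = negative, 1 = positive
print(dataset["train"][0]["text"][:100])   # first 100 characters of a sample review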

Training Details

  • Model Architecture: DistilBERT for Sequence Classification
  • Pretrained Model: distilbert-base-uncased
  • Training Time: ~1 minute
  • Number of Epochs: 1
  • Batch Size: 1 (for speed)
  • Evaluation Strategy: Per epoch

Training Script

from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Reduce dataset size for fast training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    save_strategy="epoch",
    report_to="none"
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

# Train the model
trainer.train()

# Save trained model
model.save_pretrained("my_model")
tokenizer.save_pretrained("my_model")
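
As written, the per-epoch evaluation reports only the loss. To also track accuracy, a metrics hook can be passed to the Trainer; a minimal sketch using NumPy:

import numpy as np

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) pair for the evaluation set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Then construct the Trainer with: Trainer(..., compute_metrics=compute_metrics)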

How to Use the Model

You can load the trained model and use it for sentiment analysis as follows:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the trained model
tokenizer = AutoTokenizer.from_pretrained("my_model")
model = AutoModelForSequenceClassification.from_pretrained("my_model")

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
print(predict_sentiment("This movie was amazing!"))  # Expected: Positive
print(predict_sentiment("I didn't like this movie."))  # Expected: Negative

Deployment

The trained model can be uploaded to the Hugging Face Hub with the huggingface_hub CLI:

huggingface-cli login
huggingface-cli upload your-hf-username/imdb_model my_model .
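
Equivalently, after huggingface-cli login, the model and tokenizer can be pushed directly from Python (your-hf-username/imdb_model is a placeholder repo id):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("my_model")
tokenizer = AutoTokenizer.from_pretrained("my_model")

# Create (or update) the Hub repository and upload both artifacts
model.push_to_hub("your-hf-username/imdb_model")
tokenizer.push_to_hub("your-hf-username/imdb_model")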

License

MIT License
