IMDb Sentiment Analysis Model

Model Overview

This model is a fine-tuned DistilBERT (distilbert-base-uncased) for sentiment analysis on the IMDb dataset. It classifies movie reviews as positive (1) or negative (0).
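
For a quick sanity check once the model has been trained and saved (see the training script below), it can be loaded through the text-classification pipeline. Because the training script does not set id2label, the predicted labels surface as the defaults LABEL_0 (negative) and LABEL_1 (positive); a minimal sketch, assuming the model has been saved to "my_model":

from transformers import pipeline

# Load the fine-tuned model and tokenizer saved by the training script below
classifier = pipeline("text-classification", model="my_model")

print(classifier("A wonderful, heartfelt film."))
# -> [{'label': 'LABEL_1', 'score': ...}]  (LABEL_1 = positive, LABEL_0 = negative)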

Dataset

  • Dataset Used: IMDb Movie Reviews
  • Source: Hugging Face's datasets library (imdb)
  • Training Samples: 50 (for fast training)
  • Test Samples: 20 (see the snippet below for the full split sizes)
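
The full IMDb dataset contains 25,000 labeled training reviews and 25,000 labeled test reviews; only a small subset is used here. A quick way to inspect the splits and the label mapping:

from datasets import load_dataset

dataset = load_dataset("imdb")
print(dataset)                             # train / test / unsupervised splits and their sizes
print(dataset["train"].features["label"])  # ClassLabel(names=['neg', 'pos']): 0 = negative, 1 = positive
print(dataset["train"][0]["text"][:100])   # first 100 characters of a sample review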

Training Details

  • Model Architecture: DistilBERT for Sequence Classification
  • Pretrained Model: distilbert-base-uncased
  • Training Time: ~1 minute
  • Number of Epochs: 1
  • Batch Size: 1 (for speed)
  • Evaluation Strategy: Per epoch

Training Script

from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Reduce dataset size for fast training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    save_strategy="epoch",
    report_to="none"
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

# Train the model
trainer.train()

# Save trained model
model.save_pretrained("my_model")
tokenizer.save_pretrained("my_model")
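
As written, the per-epoch evaluation reports only the loss. To also track accuracy, a metrics hook can be passed to the Trainer; a minimal sketch using NumPy:

import numpy as np

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) pair for the evaluation set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Then construct the Trainer with: Trainer(..., compute_metrics=compute_metrics)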

How to Use the Model

You can load the trained model and use it for sentiment analysis as follows:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the trained model
tokenizer = AutoTokenizer.from_pretrained("my_model")
model = AutoModelForSequenceClassification.from_pretrained("my_model")

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
print(predict_sentiment("This movie was amazing!"))  # Expected: Positive
print(predict_sentiment("I didn't like this movie."))  # Expected: Negative

Deployment

The trained model can be uploaded to the Hugging Face Hub with the huggingface_hub CLI:

huggingface-cli login
huggingface-cli upload your-hf-username/imdb_model my_model .
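
Equivalently, after huggingface-cli login, the model and tokenizer can be pushed directly from Python (your-hf-username/imdb_model is a placeholder repo id):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("my_model")
tokenizer = AutoTokenizer.from_pretrained("my_model")

# Create (or update) the Hub repository and upload both artifacts
model.push_to_hub("your-hf-username/imdb_model")
tokenizer.push_to_hub("your-hf-username/imdb_model")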

License

MIT License
