---
license: apache-2.0
---

# IMDb Sentiment Analysis Model

## Model Overview

This model is a fine-tuned **DistilBERT** (`distilbert-base-uncased`) for **sentiment analysis** on the IMDb dataset. It classifies movie reviews as **positive (1)** or **negative (0)**.

## Dataset

- **Dataset Used**: IMDb Movie Reviews
- **Source**: Hugging Face's `datasets` library (`imdb`)
- **Training Samples**: 50 (subsampled for fast training)
- **Test Samples**: 20

## Training Details

- **Model Architecture**: DistilBERT for Sequence Classification
- **Pretrained Model**: `distilbert-base-uncased`
- **Training Time**: ~1 minute
- **Number of Epochs**: 1
- **Batch Size**: 1 (kept minimal for this quick demo)
- **Evaluation Strategy**: Per epoch

## Training Script

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import load_dataset

# Load the IMDb dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Reduce dataset size for fast training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed to eval_strategy in transformers >= 4.41
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    save_strategy="epoch",
    report_to="none",
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save the trained model and tokenizer
model.save_pretrained("my_model")
tokenizer.save_pretrained("my_model")
```
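As written, the script above only reports evaluation loss each epoch. If you also want accuracy logged, one option is to pass a `compute_metrics` callback to the `Trainer`. The sketch below is an assumed addition, not part of the original script; it uses the separate `evaluate` library (`pip install evaluate scikit-learn`):

```python
# Hypothetical addition: report accuracy at each evaluation.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is (logits, labels); take the argmax over the 2 classes
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# Then wire it into the Trainer from the script above:
# trainer = Trainer(..., compute_metrics=compute_metrics)
```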
## How to Use the Model

You can load the trained model and use it for sentiment analysis as follows:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("my_model")
model = AutoModelForSequenceClassification.from_pretrained("my_model")
model.eval()  # disable dropout for inference

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
print(predict_sentiment("This movie was amazing!"))    # Expected: Positive
print(predict_sentiment("I didn't like this movie."))  # Expected: Negative
```
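Equivalently, the saved checkpoint can be served through the `transformers` `pipeline` helper. This is a minimal sketch assuming the model was saved to `my_model` as above; note that because the training script does not set an `id2label` mapping, the pipeline returns the default `LABEL_0`/`LABEL_1` names:

```python
from transformers import pipeline

# Load the saved checkpoint (model + tokenizer) into a text-classification pipeline
classifier = pipeline("text-classification", model="my_model")

result = classifier("This movie was amazing!")[0]
# Without id2label, labels come back as LABEL_0 (negative) / LABEL_1 (positive)
print(result)  # e.g. {'label': 'LABEL_1', 'score': 0.9...}
```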
## Deployment

The trained model can be pushed to the Hugging Face Hub for inference. The old `transformers-cli upload` command has been removed from `transformers`; use the `huggingface_hub` CLI instead:

```bash
huggingface-cli login
huggingface-cli upload your-hf-username/my_model ./my_model .
```

Alternatively, call `model.push_to_hub("your-hf-username/my_model")` and `tokenizer.push_to_hub("your-hf-username/my_model")` directly from Python.

## License

Apache License 2.0 (as declared in the metadata above).