---
license: apache-2.0
---

# IMDb Sentiment Analysis Model

## Model Overview

This model is a fine-tuned **DistilBERT** (`distilbert-base-uncased`) for **sentiment analysis** on the IMDb dataset. It classifies movie reviews as **positive (1)** or **negative (0)**.

## Dataset

- **Dataset Used**: IMDb Movie Reviews
- **Source**: Hugging Face's `datasets` library (`imdb`)
- **Training Samples**: 50 (subsampled for fast training)
- **Test Samples**: 20

## Training Details

- **Model Architecture**: DistilBERT for Sequence Classification
- **Pretrained Model**: `distilbert-base-uncased`
- **Training Time**: ~1 minute
- **Number of Epochs**: 1
- **Batch Size**: 1 (kept minimal for this quick demo)
- **Evaluation Strategy**: Per epoch

## Training Script

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import load_dataset

# Load the IMDb dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Reduce dataset size for fast training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed to eval_strategy in transformers >= 4.41
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    save_strategy="epoch",
    report_to="none",
)

# Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save the trained model and tokenizer
model.save_pretrained("my_model")
tokenizer.save_pretrained("my_model")
```
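As written, the script above only reports evaluation loss each epoch. If you also want accuracy logged, one option is to pass a `compute_metrics` callback to the `Trainer`. The sketch below is an assumed addition, not part of the original script; it uses the separate `evaluate` library (`pip install evaluate scikit-learn`):

```python
# Hypothetical addition: report accuracy at each evaluation.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is (logits, labels); take the argmax over the 2 classes
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# Then wire it into the Trainer from the script above:
# trainer = Trainer(..., compute_metrics=compute_metrics)
```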
## How to Use the Model

You can load the trained model and use it for sentiment analysis as follows:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("my_model")
model = AutoModelForSequenceClassification.from_pretrained("my_model")
model.eval()  # disable dropout for inference

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
print(predict_sentiment("This movie was amazing!"))    # Expected: Positive
print(predict_sentiment("I didn't like this movie."))  # Expected: Negative
```
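Equivalently, the saved checkpoint can be served through the `transformers` `pipeline` helper. This is a minimal sketch assuming the model was saved to `my_model` as above; note that because the training script does not set an `id2label` mapping, the pipeline returns the default `LABEL_0`/`LABEL_1` names:

```python
from transformers import pipeline

# Load the saved checkpoint (model + tokenizer) into a text-classification pipeline
classifier = pipeline("text-classification", model="my_model")

result = classifier("This movie was amazing!")[0]
# Without id2label, labels come back as LABEL_0 (negative) / LABEL_1 (positive)
print(result)  # e.g. {'label': 'LABEL_1', 'score': 0.9...}
```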
## Deployment

The trained model can be pushed to the Hugging Face Hub for inference. The old `transformers-cli upload` command has been removed from `transformers`; use the `huggingface_hub` CLI instead:

```bash
huggingface-cli login
huggingface-cli upload your-hf-username/my_model ./my_model .
```

Alternatively, call `model.push_to_hub("your-hf-username/my_model")` and `tokenizer.push_to_hub("your-hf-username/my_model")` directly from Python.

## License

Apache License 2.0 (as declared in the metadata above).