Kunalatmosoft committed on
Commit 3ca9351 · verified · 1 Parent(s): 700fbd8

Update README.md

Files changed (1)
  1. README.md +106 -3
README.md CHANGED
@@ -1,3 +1,106 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # IMDb Sentiment Analysis Model
+
+ ## Model Overview
+ This model is a fine-tuned **DistilBERT** (`distilbert-base-uncased`) for **sentiment analysis** on the IMDb dataset. It classifies movie reviews as **positive (1)** or **negative (0)**.
+
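+ For a quick smoke test, the fine-tuned weights can be wrapped in a `pipeline` (a minimal sketch; it assumes the model was saved locally to `my_model` as in the training script below, and since no `id2label` mapping is set, the pipeline reports the generic labels `LABEL_0`/`LABEL_1`):
+
+ ```python
+ from transformers import pipeline
+
+ # Loads both the model and the tokenizer from the saved checkpoint directory
+ classifier = pipeline("text-classification", model="my_model")
+
+ # LABEL_1 corresponds to positive, LABEL_0 to negative
+ print(classifier("A wonderful, heartfelt film."))
+ ```
+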
+ ## Dataset
+ - **Dataset Used**: IMDb Movie Reviews
+ - **Source**: Hugging Face's `datasets` library (`imdb`)
+ - **Training Samples**: 50 (subsampled for fast training; see the snippet below)
+ - **Test Samples**: 20
+
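+ The subsets are drawn from the full IMDb split, which ships with 25,000 labeled training reviews and 25,000 labeled test reviews. A minimal sketch for inspecting the raw data before tokenization:
+
+ ```python
+ from datasets import load_dataset
+
+ # Each example has a "text" field (the review) and a "label" field (0 = negative, 1 = positive)
+ dataset = load_dataset("imdb")
+ print(dataset["train"].num_rows, dataset["test"].num_rows)  # 25000 25000
+ print(dataset["train"][0]["label"], dataset["train"][0]["text"][:100])
+ ```
+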
+ ## Training Details
+ - **Model Architecture**: DistilBERT for Sequence Classification
+ - **Pretrained Model**: `distilbert-base-uncased`
+ - **Training Time**: ~1 minute
+ - **Number of Epochs**: 1
+ - **Batch Size**: 1 (for speed)
+ - **Evaluation Strategy**: Per epoch
+
+ ## Training Script
+ ```python
+ from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
+ from datasets import load_dataset
+
+ # Load IMDb dataset
+ dataset = load_dataset("imdb")
+
+ # Load tokenizer and model (a fresh 2-label classification head is initialized)
+ model_name = "distilbert-base-uncased"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
+
+ # Tokenize dataset
+ def tokenize_function(examples):
+     return tokenizer(examples["text"], padding="max_length", truncation=True)
+
+ tokenized_datasets = dataset.map(tokenize_function, batched=True)
+
+ # Reduce dataset size for fast training
+ train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(50))
+ test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(20))
+
+ # Training arguments
+ training_args = TrainingArguments(
+     output_dir="./results",
+     evaluation_strategy="epoch",  # renamed to `eval_strategy` in recent transformers releases
+     per_device_train_batch_size=1,
+     per_device_eval_batch_size=1,
+     num_train_epochs=1,
+     save_strategy="epoch",
+     report_to="none"
+ )
+
+ # Trainer setup
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset,
+     eval_dataset=test_dataset
+ )
+
+ # Train the model
+ trainer.train()
+
+ # Save trained model and tokenizer
+ model.save_pretrained("my_model")
+ tokenizer.save_pretrained("my_model")
+ ```
+
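+ As written, the per-epoch evaluation reports only the loss. To also track accuracy, a `compute_metrics` function can be passed to the `Trainer`; a minimal sketch (not part of the original script):
+
+ ```python
+ import numpy as np
+
+ # Receives (logits, labels) for the eval set; returns a dict of metric names to values
+ def compute_metrics(eval_pred):
+     logits, labels = eval_pred
+     predictions = np.argmax(logits, axis=-1)
+     return {"accuracy": float((predictions == labels).mean())}
+
+ # Then build the Trainer with the extra argument:
+ # trainer = Trainer(..., compute_metrics=compute_metrics)
+ ```
+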
+ ## How to Use the Model
+ You can load the trained model and use it for sentiment analysis as follows:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ # Load the trained model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("my_model")
+ model = AutoModelForSequenceClassification.from_pretrained("my_model")
+ model.eval()  # inference mode: disables dropout
+
+ # Function to predict sentiment
+ def predict_sentiment(text):
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
+     with torch.no_grad():  # no gradients needed at inference time
+         outputs = model(**inputs)
+     prediction = torch.argmax(outputs.logits, dim=1).item()
+     return "Positive" if prediction == 1 else "Negative"
+
+ # Example usage
+ print(predict_sentiment("This movie was amazing!"))   # Expected: Positive
+ print(predict_sentiment("I didn't like this movie.")) # Expected: Negative
+ ```
+
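+ When a confidence score is useful alongside the label, the logits can be normalized with a softmax (a small variation on the function above; it reuses the `model` and `tokenizer` already loaded there):
+
+ ```python
+ def predict_with_confidence(text):
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
+     with torch.no_grad():
+         logits = model(**inputs).logits
+     probs = torch.softmax(logits, dim=1).squeeze()  # normalize logits to probabilities
+     prediction = int(torch.argmax(probs))
+     label = "Positive" if prediction == 1 else "Negative"
+     return label, probs[prediction].item()
+
+ print(predict_with_confidence("A slow start, but a fantastic ending."))
+ ```
+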
+ ## Deployment
+ The trained model can be uploaded to the Hugging Face Hub for hosted inference. The `transformers-cli upload` command has been retired; the current `huggingface-cli` equivalent is shown below (the repo id `your-hf-username/my_model` is a placeholder):
+
+ ```bash
+ huggingface-cli login
+ huggingface-cli upload your-hf-username/my_model ./my_model
+ ```
+
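+ Alternatively, the weights can be pushed directly from Python with the standard `push_to_hub` API (the repo id is again a placeholder):
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ model = AutoModelForSequenceClassification.from_pretrained("my_model")
+ tokenizer = AutoTokenizer.from_pretrained("my_model")
+
+ # Creates the repo if it does not exist and uploads the model and tokenizer files
+ model.push_to_hub("your-hf-username/my_model")
+ tokenizer.push_to_hub("your-hf-username/my_model")
+ ```
+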
+ ## License
+ Apache 2.0, as declared in the `license: apache-2.0` metadata above.
+