Update README.md
README.md CHANGED
@@ -9,12 +9,12 @@ base_model:
 pipeline_tag: text-classification
 library_name: transformers
 ---
-
+# **Spam Detection with BERT**
 
-This implementation
+This repository contains an implementation of a **Spam Detection** model using **BERT (Bidirectional Encoder Representations from Transformers)** for binary classification (Spam / Ham). The model is trained on the **`prithivMLmods/Spam-Text-Detect-Analysis` dataset** and leverages **Weights & Biases (wandb)** for comprehensive experiment tracking.
 
 ---
-
+## **Summary of Uploaded Files**
 
 | **File Name** | **Size** | **Description** | **Upload Status** |
 |------------------------------------|-----------|-----------------------------------------------------|-------------------|
@@ -49,9 +49,7 @@ Results were obtained using BERT and the provided training dataset:
 - **Precision:** **0.9931**
 - **Recall:** **0.9597**
 - **F1 Score:** **0.9761**
-
 ---
-
 ## **Model Training Details**
 
 ### **Model Architecture:**
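The metrics above are reported without the code that produced them. For reference, here is a minimal sketch of how precision, recall, and F1 are typically computed with scikit-learn (one of the dependencies installed below); `y_true` and `y_pred` are illustrative placeholders, not values from this repository:

```python
# Minimal sketch: computing precision/recall/F1 with scikit-learn.
# y_true and y_pred are placeholders, not this repository's actual data.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1]  # gold labels (1 = Spam, 0 = Ham)
y_pred = [1, 0, 1, 0, 0, 1]  # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```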
@@ -62,74 +60,27 @@ The model uses `bert-base-uncased` as the pre-trained backbone and is fine-tuned
 - **Batch Size:** 16
 - **Epochs:** 3
 - **Loss:** Cross-Entropy
-
 ---
-##
-
-```python
-import gradio as gr
-import torch
-from transformers import BertTokenizer, BertForSequenceClassification
-
-# Load the pre-trained BERT model and tokenizer
-MODEL_PATH = "prithivMLmods/Spam-Bert-Uncased"
-tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
-model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
-
-# Function to predict if a given text is Spam or Ham
-def predict_spam(text):
-    # Tokenize the input text
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
-
-    # Perform inference
-    with torch.no_grad():
-        outputs = model(**inputs)
-    logits = outputs.logits
-    prediction = torch.argmax(logits, axis=-1).item()
-
-    # Map prediction to label
-    if prediction == 1:
-        return "Spam"
-    else:
-        return "Ham"
-
-
-# Gradio UI - Input and Output components
-inputs = gr.Textbox(label="Enter Text", placeholder="Type a message to check if it's Spam or Ham...")
-outputs = gr.Label(label="Prediction")
-
-# List of example inputs
-examples = [
-    ["Win $1000 gift cards now by clicking here!"],
-    ["You have been selected for a lottery."],
-    ["Hello, how was your day?"],
-    ["Earn money without any effort. Click here."],
-    ["Meeting tomorrow at 10 AM. Don't be late."],
-    ["Claim your free prize now!"],
-    ["Are we still on for dinner tonight?"],
-    ["Exclusive offer just for you, act now!"],
-    ["Let's catch up over coffee soon."],
-    ["Congratulations, you've won a new car!"]
-]
-
-# Create the Gradio interface
-gr_interface = gr.Interface(
-    fn=predict_spam,
-    inputs=inputs,
-    outputs=outputs,
-    examples=examples,
-    title="Spam Detection with BERT",
-    description="Type a message in the text box to check if it's Spam or Ham using a pre-trained BERT model."
-)
-
-# Launch the application
-gr_interface.launch()
+## **How to Use the Model**
 
+### **1. Clone the Repository**
+```bash
+git clone <repository-url>
+cd <project-directory>
 ```
-### Train Details
 
+### **2. Install Dependencies**
+Install all necessary dependencies.
+```bash
+pip install -r requirements.txt
+```
+or manually:
+```bash
+pip install transformers datasets wandb scikit-learn
+```
+### **3. Train the Model**
+Assuming you have a script like `train.py`, run:
 ```python
-
 # Import necessary libraries
 from datasets import load_dataset, ClassLabel
 from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
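The new section defers training to `train.py`, whose script this hunk only begins to show. For orientation, a self-contained sketch of a `Trainer` setup matching the stated hyperparameters (batch size 16, 3 epochs; cross-entropy is the loss `BertForSequenceClassification` applies by default) might look like the following; the two-sentence toy dataset is purely illustrative:

```python
# Illustrative Trainer setup matching the hyperparameters stated above.
# The toy dataset stands in for Spam-Text-Detect-Analysis.
from datasets import Dataset
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

raw = Dataset.from_dict({
    "text": ["Win a free prize now!", "See you at lunch tomorrow."],
    "label": [1, 0],  # 1 = Spam, 0 = Ham
})

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=16,  # Batch Size: 16
    num_train_epochs=3,              # Epochs: 3
)

# Cross-entropy is applied internally when labels are provided.
Trainer(model=model, args=args, train_dataset=train_dataset).train()
```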
@@ -235,33 +186,14 @@ def predict(text):
 example_text = "Congratulations! You've won a $1000 Walmart gift card. Click here to claim now."
 print("Prediction:", predict(example_text))
 ```
-
-## **How to Train the Model**
-
-1. **Clone Repository:**
-   ```bash
-   git clone <repository-url>
-   cd <project-directory>
-   ```
-
-2. **Install Dependencies:**
-   Install all necessary dependencies.
-   ```bash
-   pip install -r requirements.txt
-   ```
-   or manually:
-   ```bash
-   pip install transformers datasets wandb scikit-learn
-   ```
-
-3. **Train the Model:**
-   Assuming you have a script like `train.py`, run:
-   ```python
-   from train import main
-   ```
-
 ---
+## **Dataset Information**
+The training dataset comes from **Spam-Text-Detect-Analysis**, available on Hugging Face:
+- **Dataset Link:** [Spam Text Detection Dataset - Hugging Face](https://huggingface.co/datasets/prithivMLmods/Spam-Text-Detect-Analysis)
 
+Dataset size:
+- **5.57k entries**
+---
 ## **Weights & Biases Integration**
 
 ### Why Use wandb?
|
|
| 275 |
import wandb
|
| 276 |
wandb.init(project="spam-detection")
|
| 277 |
```
|
| 278 |
-
|
| 279 |
---
|
| 280 |
-
|
| 281 |
-
## π **Directory Structure**
|
| 282 |
|
| 283 |
The directory is organized to ensure scalability and clear separation of components:
|
| 284 |
|
|
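The snippet above only initializes a run. When training with the Hugging Face `Trainer`, metrics can also be streamed to wandb through `report_to`; a sketch, assuming `wandb login` has already been run and the run name is illustrative:

```python
# Sketch: route Trainer logs to Weights & Biases.
import wandb
from transformers import TrainingArguments

wandb.init(project="spam-detection")

args = TrainingArguments(
    output_dir="./results",
    report_to=["wandb"],             # stream training metrics to wandb
    run_name="bert-spam-detection",  # illustrative run name
    logging_steps=50,
)
```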
@@ -292,14 +222,57 @@ project-directory/
 ├── requirements.txt      # List of dependencies
 └── train.py              # Main script for training the model
 ```
-
 ---
+## **Gradio Interface**
 
-
-The training dataset comes from **Spam-Text-Detect-Analysis** available on Hugging Face:
-- **Dataset Link:** [Spam Text Detection Dataset - Hugging Face](https://huggingface.co/datasets)
+A Gradio interface is provided to test the model interactively. The interface allows users to input text and get predictions on whether the text is **Spam** or **Ham**.
 
-
-
+### **Example Usage**
+```python
+import gradio as gr
+import torch
+from transformers import BertTokenizer, BertForSequenceClassification
+
+# Load the pre-trained BERT model and tokenizer
+MODEL_PATH = "prithivMLmods/Spam-Bert-Uncased"
+tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
+model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
+
+# Function to predict if a given text is Spam or Ham
+def predict_spam(text):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+    with torch.no_grad():
+        outputs = model(**inputs)
+    logits = outputs.logits
+    prediction = torch.argmax(logits, axis=-1).item()
+    return "Spam" if prediction == 1 else "Ham"
 
+# Gradio UI
+inputs = gr.Textbox(label="Enter Text", placeholder="Type a message to check if it's Spam or Ham...")
+outputs = gr.Label(label="Prediction")
+
+examples = [
+    ["Win $1000 gift cards now by clicking here!"],
+    ["You have been selected for a lottery."],
+    ["Hello, how was your day?"],
+    ["Earn money without any effort. Click here."],
+    ["Meeting tomorrow at 10 AM. Don't be late."],
+    ["Claim your free prize now!"],
+    ["Are we still on for dinner tonight?"],
+    ["Exclusive offer just for you, act now!"],
+    ["Let's catch up over coffee soon."],
+    ["Congratulations, you've won a new car!"]
+]
+
+gr_interface = gr.Interface(
+    fn=predict_spam,
+    inputs=inputs,
+    outputs=outputs,
+    examples=examples,
+    title="Spam Detection with BERT",
+    description="Type a message in the text box to check if it's Spam or Ham using a pre-trained BERT model."
+)
+
+gr_interface.launch()
+```
 ---
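For a quick check without launching the Gradio app, the same checkpoint can presumably also be queried through the `transformers` pipeline API; note that the emitted label names depend on the checkpoint's config (e.g. `LABEL_0`/`LABEL_1`):

```python
from transformers import pipeline

# Checkpoint ID taken from the example above.
classifier = pipeline("text-classification", model="prithivMLmods/Spam-Bert-Uncased")
print(classifier("Claim your free prize now!"))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}]  (label names depend on the config)
```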