---
library_name: transformers
base_model:
- google-bert/bert-base-uncased
datasets:
- zefang-liu/phishing-email-dataset
language:
- en
metrics:
- accuracy
tags:
- security
- phishing
---

# PhishMail - BERT Model for Phishing Detection

This repository features a fine-tuned BERT model designed to detect phishing emails. The model is trained to classify emails as either phishing or legitimate by analyzing their body text.

## Author

Jagan Raj ([LinkedIn](https://www.linkedin.com/in/r-jagan-raj/))

## Model Details

- **Model type:** BERT (Bidirectional Encoder Representations from Transformers)
- **Task:** Phishing detection (binary classification: phishing vs. legitimate)
- **Fine-tuning:** The model was fine-tuned on a curated dataset of phishing and legitimate emails, ensuring diversity in email content and structure (see the training sketch after this list).
- **Objective:** Enhance email security by accurately identifying phishing attempts through contextual understanding of email body text.
- **Developed by:** Jagan Raj
- **Base model:** google-bert/bert-base-uncased
- **License:** Free for all
- **Dataset:** zefang-liu/phishing-email-dataset
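
The exact training script is not part of this repository. The snippet below is a minimal sketch of how a comparable fine-tune could be reproduced with the Hugging Face `Trainer` API; it additionally requires `pip install datasets`, and the dataset column names (`Email Text`, `Email Type`), label values, and hyperparameters are assumptions rather than the author's verified settings:

```python
# Minimal fine-tuning sketch (assumed settings, not the author's exact script)
from datasets import load_dataset
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("zefang-liu/phishing-email-dataset", split="train")
dataset = dataset.filter(lambda row: isinstance(row["Email Text"], str))  # drop rows with missing text

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased", num_labels=2
)

def preprocess(batch):
    # Convert email bodies to token IDs and map the label strings to 0/1
    enc = tokenizer(batch["Email Text"], truncation=True, padding="max_length")
    enc["labels"] = [1 if t == "Phishing Email" else 0 for t in batch["Email Type"]]
    return enc

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="phishmail-bert",
    num_train_epochs=3,              # matches the 3 epochs reported under Evaluation
    per_device_train_batch_size=8,   # assumed batch size
    learning_rate=2e-5,              # assumed learning rate
)

Trainer(model=model, args=args, train_dataset=tokenized).train()
```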

## Evaluation

Final training output after 3 epochs (6,297 optimizer steps):

- Training loss: 0.0709
- Training runtime: 5,545 seconds (about 1.5 hours)
- Training samples per second: 9.08
- Training steps per second: 1.14
- Total FLOPs: ~1.32e16
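
Accuracy is the metric tracked for this model, but a held-out score is not included in the output above. The sketch below shows one way to measure accuracy yourself; it requires `pip install datasets`, the column names `Email Text`/`Email Type` and the label mapping are assumptions, and a proper evaluation should use a split the model was not trained on:

```python
# Quick accuracy check sketch (assumed column names and label mapping, not official results)
import torch
from datasets import load_dataset
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("jagan-raj/PhishMail")
tokenizer = BertTokenizer.from_pretrained("jagan-raj/PhishMail")
model.eval()

# Small random sample for a quick sanity check
sample = load_dataset("zefang-liu/phishing-email-dataset", split="train").shuffle(seed=0).select(range(200))

correct = 0
for row in sample:
    inputs = tokenizer(row["Email Text"] or "", return_tensors="pt",
                       truncation=True, padding="max_length")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    label = 1 if row["Email Type"] == "Phishing Email" else 0
    correct += int(pred == label)

print(f"Accuracy on {len(sample)} sampled emails: {correct / len(sample):.3f}")
```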

## How to Use

**Step 1:** Installing dependencies. Use the command below to install all the required libraries:

```bash
pip install transformers torch
```

**Step 2:** Loading the model:

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Specify the Hugging Face model repository name
model_name = 'jagan-raj/PhishMail'

# Load the fine-tuned BERT model for phishing detection
model = BertForSequenceClassification.from_pretrained(model_name)

# Load the corresponding tokenizer for the fine-tuned model
tokenizer = BertTokenizer.from_pretrained(model_name)

# Set the model to evaluation mode for inference
model.eval()
```
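
The class-index-to-label mapping used in Step 3 (1 = phishing, 0 = legitimate) reflects how the labels were encoded during fine-tuning. Continuing from the Step 2 snippet, you can inspect what the checkpoint's config exposes; whether `id2label` carries meaningful names or generic `LABEL_0`/`LABEL_1` placeholders depends on how the model was saved:

```python
# Inspect the label mapping stored in the checkpoint's config
# (id2label may show generic LABEL_0/LABEL_1 names if none were set at training time)
print(model.config.num_labels)  # expected: 2
print(model.config.id2label)    # e.g. {0: 'LABEL_0', 1: 'LABEL_1'} or named labels
```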

**Step 3:** Using the model for predictions:

```python
# Input the email text for classification
email_text = "Your email content here"

# Tokenize and preprocess the input text
# Converts the email text into token IDs, applies truncation/padding, and creates tensors
inputs = tokenizer(
    email_text,
    return_tensors="pt",   # Output tensors in PyTorch format
    truncation=True,       # Truncate the text if it exceeds the maximum length
    padding='max_length'   # Pad the text to the maximum sequence length
)

# Make a prediction using the model
with torch.no_grad():  # Disable gradient calculations for faster inference
    outputs = model(**inputs)                    # Get model outputs
    logits = outputs.logits                      # Extract raw prediction scores (logits)
    predictions = torch.argmax(logits, dim=-1)   # Determine the predicted class (0 or 1)

# Interpret the prediction result
# Map the prediction to its label: 1 for "Phishing", 0 for "Legitimate"
result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."

# Print the prediction result
print(f"Prediction: {result}")
```
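
As an alternative to the manual tokenization and forward pass above, the same checkpoint can also be driven through the `transformers` pipeline helper. This is a convenience sketch rather than the author's documented interface; the label strings it returns depend on the `id2label` mapping stored in the model config and may be the generic `LABEL_0`/`LABEL_1`:

```python
from transformers import pipeline

# Text-classification pipeline wrapping the same fine-tuned checkpoint
classifier = pipeline("text-classification", model="jagan-raj/PhishMail")

print(classifier("Your email content here"))
# Example output shape: [{'label': 'LABEL_1', 'score': 0.99}]
# Interpreting LABEL_1/LABEL_0 as phishing/legitimate follows the convention used in Step 3
```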

## Model Summary

This fine-tuned BERT model is designed to detect phishing emails. Built on the BERT (Bidirectional Encoder Representations from Transformers) architecture, it performs binary classification to label emails as either phishing or legitimate.

The model was fine-tuned on a dataset of phishing and legitimate emails, so it learns the patterns and linguistic cues commonly found in phishing content. By leveraging contextual understanding, it can identify subtle differences in text that distinguish malicious intent from normal communication, making it a useful component of email security and anti-phishing defenses.