BERT Paraphrase Detection (GLUE MRPC)
This model is fine-tuned for paraphrase detection on the GLUE MRPC dataset: given two sentences, it predicts whether they convey the same meaning. This is a binary classification task with the following labels:
- 1: Paraphrase
- 0: Not a paraphrase
Model Overview
- Developer: Parit Kansal
- Model Type: Sequence Classification (Binary)
- Language(s): English
- Pre-trained Model: BERT (bert-base-uncased)
Intended Use
This model is designed to assess whether two sentences convey the same meaning. It can be applied in various scenarios, including:
- Duplicate Question Detection: Identifying similar questions in QA systems.
- Plagiarism Detection: Detecting if content is copied and rephrased.
- Summarization Alignment: Matching sentences from summaries to the original content.
Example Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Parit1/dummy")
tokenizer = AutoTokenizer.from_pretrained("Parit1/dummy")

def make_prediction(text1, text2):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Encode the sentence pair as a single BERT input
    inputs = tokenizer(text1, text2, truncation=True, padding=True, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    model.to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # 1 = paraphrase, 0 = not a paraphrase
    prediction = torch.argmax(logits, dim=-1).item()
    return prediction

# Example usage
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A fast brown fox leaps over a lazy dog."
prediction = make_prediction(text1, text2)
print(f"Prediction: {prediction}")
```
Training Details
Training Data
The model was fine-tuned on the GLUE MRPC dataset, which contains pairs of sentences labeled as either paraphrases or not.
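For reference, the dataset can be loaded and tokenized with the Hugging Face `datasets` library. This is a minimal sketch of a standard MRPC preprocessing pipeline, not necessarily the exact preprocessing used for this model:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# GLUE MRPC: sentence pairs labeled 1 (paraphrase) or 0 (not a paraphrase)
raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Encode both sentences together so BERT sees them as one sequence pair
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

encoded = raw.map(tokenize, batched=True)
print(encoded)  # train / validation / test splits with input_ids, attention_mask, label
```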
Training Procedure
- Number of Epochs: 2
- Metrics Used (a fine-tuning sketch follows this list):
- Accuracy
- Precision
- Recall
- F1 Score
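Below is a minimal fine-tuning sketch with the Hugging Face Trainer, continuing from the data-loading snippet above (`encoded`, `tokenizer`). The epoch count matches the reported run; the batch size and learning rate are assumptions, since they are not documented in this card. Weighted averaging is used for the metrics because the reported recall equals accuracy in the tables below, which is characteristic of weighted averages.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, DataCollatorWithPadding,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    # Accuracy, precision, recall, and F1, as reported in the tables below
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="weighted")
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

args = TrainingArguments(
    output_dir="bert-mrpc",
    num_train_epochs=2,              # matches the reported run
    per_device_train_batch_size=16,  # assumed; not documented in the card
    learning_rate=2e-5,              # assumed; not documented in the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```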
Training Logs (Summary)
| Epoch | Avg Loss | Accuracy | Precision | Recall | F1 Score |
|-------|----------|----------|-----------|--------|----------|
| 1     | 0.5443   | 73.45%   | 72.28%    | 73.45% | 70.83%   |
| 2     | 0.2756   | 89.34%   | 89.25%    | 89.34% | 89.27%   |
Evaluation
Performance Metrics
The model's performance was evaluated using the following metrics:
- Accuracy: Percentage of correct predictions.
- Precision: Proportion of positive identifications that were actually correct.
- Recall: Proportion of actual positives that were correctly identified.
- F1 Score: The harmonic mean of Precision and Recall (see the quick numeric check below).
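As a quick sanity check of the F1 definition, plugging the epoch-2 test precision and recall from the table below into the harmonic-mean formula reproduces the reported F1:

```python
# F1 = 2 * P * R / (P + R), using the epoch-2 test values reported below
precision, recall = 0.8494, 0.8480
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8487, matching the reported 84.87%
```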
Test Set Results
| Epoch | Avg Loss | Accuracy | Precision | Recall | F1 Score |
|-------|----------|----------|-----------|--------|----------|
| 1     | 0.3976   | 82.60%   | 82.26%    | 82.60% | 81.93%   |
| 2     | 0.3596   | 84.80%   | 84.94%    | 84.80% | 84.87%   |