jatinmehra/smolLM-fined-tuned-for-PLAGAIRISM_Detection

Text Classification

text-generation-inference


jatinmehra committed on Apr 3

Commit fb131a3 · verified · 1 Parent(s): 7f810dd

Update README.md

Files changed (1)

README.md +86 -1


@@ -71,7 +71,92 @@ The fine-tuning dataset, the MIT Plagiarism Detection Dataset, provides labeled
   - F1-Score: 0.96
 - **Total Support**: 73,474
+## Hardware
+- GPU: 2 × NVIDIA Tesla T4
+- Time: 9 hours
+## Inference Script
+To use the model for plagiarism detection, load the tokenizer and model, then classify a pair of texts with the following script:
+```python
+import torch
+from torch.utils.data import Dataset  # base class for PlagiarismDataset below
+from transformers import GPT2Tokenizer, LlamaForSequenceClassification
+# Load the tokenizer and model
+model_path = "jatinmehra/smolLM-fined-tuned-for-PLAGAIRISM_Detection"
+tokenizer = GPT2Tokenizer.from_pretrained(model_path)
+model = LlamaForSequenceClassification.from_pretrained(model_path)
+model.eval()
+# Set device
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = model.to(device)
+# Function to preprocess and tokenize the input text
+def preprocess_text(text1, text2):
+    inputs = tokenizer(
+        text1, text2,
+        add_special_tokens=True,
+        max_length=128,
+        padding='max_length',
+        truncation=True,
+        return_tensors="pt"
+    )
+    return inputs
+# Dataset class
+class PlagiarismDataset(Dataset):
+    def __init__(self, text1, text2, tokenizer):
+        self.text1 = text1
+        self.text2 = text2
+        self.tokenizer = tokenizer
+    def __len__(self):
+        return len(self.text1)
+    def __getitem__(self, idx):
+        inputs = preprocess_text(self.text1[idx], self.text2[idx])
+        return {
+            'input_ids': inputs['input_ids'].squeeze(0),
+            'attention_mask': inputs['attention_mask'].squeeze(0)
+        }
+# Function to detect plagiarism using the model
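+def detect_plagiarism(text1, text2):
+    # Wrap each string in a list so the dataset holds one text pair;
+    # indexing a bare string would yield single characters instead.
+    dataset = PlagiarismDataset([text1], [text2], tokenizer)
+    data_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)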
+    predictions = []
+    with torch.no_grad():
+        for batch in data_loader:
+            input_ids = batch['input_ids'].to(device)
+            attention_mask = batch['attention_mask'].to(device)
+            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
+            preds = torch.argmax(outputs.logits, dim=1)
+            predictions.append(preds.item())
+    return predictions[0]
+# Usage
+text1 = input("Text from the first document: ")
+text2 = input("Text from the second document: ")
+result = detect_plagiarism(text1, text2)
+# Display the result
+if result == 1:
+    print("Plagiarism detected!")
+else:
+    print("No plagiarism detected.")
+```
+This script loads the fine-tuned model and tokenizer, then classifies a pair of input texts: a predicted label of 1 means plagiarism was detected, and any other label means it was not.
+## License
 This project is licensed under the MIT License, making it free for both personal and commercial use.
 ## Connect with Me
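Editorial note on the inference script in this change: it returns only the argmax label, with no indication of how confident the model is. The following is a minimal sketch, not part of the commit, of how one row of class logits could be turned into a label plus a softmax probability. The function name `logits_to_prediction` and the example logits are hypothetical; the label convention (1 = plagiarism) follows the script's own usage. It is written in plain Python so it runs without downloading the model.

```python
import math


def logits_to_prediction(logits):
    """Return (label, probability) for one row of raw class logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The argmax over probabilities is the same as the argmax over raw logits.
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical logits for one text pair:
# index 0 = no plagiarism, index 1 = plagiarism (assumed from the script's usage).
label, confidence = logits_to_prediction([-1.2, 2.3])
print(f"label={label}, confidence={confidence:.3f}")
```

With tensors, `torch.softmax(outputs.logits, dim=1)` computes the same probabilities directly. A caller could then treat low-confidence predictions as "uncertain" rather than forcing a yes/no answer.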