Model Card for sunnysingh1011/gibberish-detection
Model ID: sunnysingh1011/gibberish-detection
This model consists of LoRA (Low-Rank Adaptation) adapter weights trained to detect gibberish text. The main objective of this project is to classify user input as either gibberish or non-gibberish.
Version History
- v1.0.0: Initial release.
Model Details
Model Description
This model leverages LoRA (Low-Rank Adaptation) weights specifically developed for the task of identifying gibberish text. The core objective of this project is to accurately classify user input into distinct categories, distinguishing between gibberish and meaningful, coherent text.
Classification Categories:
- Clean sentence: Meaningful, well-formed, and grammatically correct sentences.
  Example: "The quick brown fox jumps over the lazy dog."
- Out of Dictionary Word: Input containing words that are not found in the standard dictionary.
  Example: "There is a busafadkdb in the code."
- Mild gibberish: Text that is somewhat incoherent but may still contain recognizable words or phrases.
  Example: "there bug is in the code"
- Word Salad: A jumble of words that lacks logical structure or coherent meaning.
  Example: "Jumped quick dog over lazy the fox brown."
- Number gibberish: Text primarily composed of random numbers or numerical sequences with no clear linguistic pattern.
  Example: "hello theree, 12345 67890 are you"

Developed by: Sunny Singh
Model type: DistilBERT for sequence classification, with LoRA adapters
Language(s) (NLP): English
License: MIT
Finetuned from model: distilbert-base-uncased
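
The adapter weights are loaded through the PEFT library (see the quickstart below). The exact adapter hyperparameters used for this checkpoint are not documented in this card; the following is a minimal sketch, with illustrative values only, of how a LoRA adapter for sequence classification could be configured on distilbert-base-uncased.

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Illustrative configuration only: r, lora_alpha, lora_dropout, and
# target_modules are assumptions, not the values used for this checkpoint.
base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=5)
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projection layers
)
peft_model = get_peft_model(base, peft_config)
peft_model.print_trainable_parameters()  # only the adapter matrices and classifier head are trainable

With a configuration like this, only the low-rank adapter matrices and the classification head are updated during fine-tuning, which keeps the trainable parameter count small compared to full fine-tuning.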
Validation Metrics
- Evaluation Loss: 0.2458
- Evaluation Accuracy: 0.8985
- Evaluation F1 Score: 0.8972
- Epochs: 5
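
The dataset and split behind these numbers are not published here. As an illustration only, the snippet below shows how accuracy and F1 could be recomputed on your own labeled examples with scikit-learn, reusing label_map and the get_prediction helper defined in the quickstart below; the example texts, their integer labels, and the weighted-F1 averaging are assumptions.

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labeled examples; replace with your own validation data.
eval_texts = [
    "The quick brown fox jumps over the lazy dog.",   # expected: clean sentence (0)
    "Jumped quick dog over lazy the fox brown.",      # expected: word salad (2)
]
eval_labels = [0, 2]

# Map predicted label strings back to integer ids for scoring
str_to_int = {name: idx for idx, name in label_map.items()}
predictions = [
    str_to_int[get_prediction(inference_model, inference_tokenizer, text, intlabel_to_strlabel)["label"]]
    for text in eval_texts
]

print("accuracy:", accuracy_score(eval_labels, predictions))
print("f1 (weighted):", f1_score(eval_labels, predictions, average="weighted"))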
How to Get Started with the Model
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

label_map = {0: "clean sentence", 1: "out of dictionary words", 2: "word salad", 3: "number gibberish", 4: "mild gibberish"}

def intlabel_to_strlabel(label):
    return label_map[label]

def get_prediction(model, tokenizer, text, label_fn):
    # Tokenize the input and move the tensors to the model's device
    infer_inputs = tokenizer(text, return_tensors="pt")
    infer_device = model.device
    infer_inputs = {key: value.to(infer_device) for key, value in infer_inputs.items()}
    with torch.no_grad():
        outputs = model(**infer_inputs)
    logits = outputs.logits
    # Convert logits to class probabilities and pick the most likely label
    probabilities = F.softmax(logits, dim=-1)
    predicted_index = torch.argmax(probabilities, dim=-1).item()
    predicted_prob = probabilities[0][predicted_index].item()
    label = label_fn(predicted_index)
    return {"label": label, "score": predicted_prob}

lora_weights = "sunnysingh1011/gibberish-detection"
tokenizer_path = "sunnysingh1011/gibberish-detection"

# Load the base model, attach the LoRA adapter weights, and load the tokenizer
base_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_map))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inference_model = PeftModel.from_pretrained(base_model, lora_weights).to(device)
inference_tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

text = "Jumped quick dog over lazy the fox brown."
print(get_prediction(inference_model, inference_tokenizer, text, intlabel_to_strlabel))
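
get_prediction returns a dictionary with the predicted label string and its softmax probability, for example {"label": "word salad", "score": 0.97} (the score shown here is illustrative). To sanity-check all five categories, you can loop over one example per class from the list above:

# One illustrative input per category (taken from the examples in this card)
samples = [
    "The quick brown fox jumps over the lazy dog.",
    "There is a busafadkdb in the code.",
    "there bug is in the code",
    "Jumped quick dog over lazy the fox brown.",
    "hello theree, 12345 67890 are you",
]
for sample in samples:
    result = get_prediction(inference_model, inference_tokenizer, sample, intlabel_to_strlabel)
    print(f"{sample!r} -> {result['label']} ({result['score']:.3f})")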