---
license: mit
language:
- code
library_name: transformers
tags:
- text-classification
- code-classification
- vulnerability-detection
- automatic-vulnerability-detection
- secure-coding
---

# Vulnerability Detector for C Code (SARD)

This model is a fine-tuned version of `microsoft/codebert-base` designed to detect vulnerabilities in C source code functions. It was developed as a submission for the AI Grand Challenge (PS-1).

## Model Description

This is a binary text-classification model that takes a C function as input and classifies it as either **Vulnerable** (`LABEL_1`) or **Safe** (`LABEL_0`).

The model was specifically fine-tuned on the [NIST SARD (Software Assurance Reference Dataset)](https://samate.nist.gov/SARD/), focusing on common C vulnerabilities like Memory Leaks, Buffer Overflows, and other CWEs present in the Juliet Test Suite. Due to the clean and structured nature of the SARD dataset, the model achieved a very high accuracy on the validation set.

## Intended Uses & Limitations

This model is intended as a proof-of-concept tool to assist developers in identifying potentially vulnerable code patterns during the development lifecycle.

**Limitations:**

* The model is highly specialized for the types of vulnerabilities found in the SARD dataset. Its performance on real-world, messy, or obfuscated code may be lower.
* It should be used as an assistive tool, not as a replacement for comprehensive security audits or other static analysis tools.
* The model classifies entire functions and may not pinpoint the exact line of code responsible for the vulnerability.

## How to Use

The model can be easily used with the `transformers` library `pipeline`.
```python
from transformers import pipeline

# Load the classifier pipeline
classifier = pipeline("text-classification", model="jacpacd/vuln-detector-codebert-c-sard")

# Example of a vulnerable C function (Memory Leak)
vulnerable_code = """
void CWE401_Memory_Leak__strdup_char_01_bad()
{
    char * data;
    data = NULL;
    {
        char myString[] = "myString";
        /* POTENTIAL FLAW: Allocate memory from the heap */
        data = strdup(myString);
        printLine(data);
    }
    /* POTENTIAL FLAW: No deallocation of memory */
    ;
}
"""

# Example of a safe C function
safe_code = """
void CWE401_Memory_Leak__strdup_char_01_goodB2G()
{
    char * data;
    data = NULL;
    {
        char myString[] = "myString";
        data = strdup(myString);
        printLine(data);
    }
    /* FIX: Deallocate memory */
    free(data);
}
"""

results_vuln = classifier(vulnerable_code)
results_safe = classifier(safe_code)

print(f"Vulnerable Code Prediction: {results_vuln[0]}")
# Expected output: {'label': 'LABEL_1', 'score': 0.99...}

print(f"Safe Code Prediction: {results_safe[0]}")
# Expected output: {'label': 'LABEL_0', 'score': 0.99...}
```
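
The pipeline returns raw label IDs (`LABEL_0`, `LABEL_1`) and a confidence score for the predicted label. The sketch below shows one possible way to turn a single prediction into a readable verdict. The helper name `interpret_prediction`, the `LABEL_NAMES` mapping, and the `threshold` parameter are illustrative assumptions and not part of the model or the `transformers` API. It also assumes the score is a two-class probability, so the probability of the other class is `1 - score`.

```python
# Readable names for the two labels described in the Model Description.
# This mapping is an illustrative assumption, not something shipped with the model.
LABEL_NAMES = {"LABEL_0": "Safe", "LABEL_1": "Vulnerable"}


def interpret_prediction(prediction, threshold=0.5):
    """Convert one pipeline result dict into a verdict.

    `prediction` is expected to look like {'label': 'LABEL_1', 'score': 0.99}.
    `threshold` (hypothetical parameter) is the minimum probability of the
    vulnerable class required to flag the function.
    """
    label = prediction["label"]
    score = float(prediction["score"])
    if label not in LABEL_NAMES:
        raise ValueError(f"Unexpected label: {label!r}")

    # The score belongs to the predicted label; assuming a two-class
    # probability, the vulnerable-class probability is either the score
    # itself or its complement.
    p_vulnerable = score if label == "LABEL_1" else 1.0 - score
    verdict = "Vulnerable" if p_vulnerable >= threshold else "Safe"
    return {"verdict": verdict, "p_vulnerable": p_vulnerable}


# Usage with a hand-written result dict in the same shape the pipeline returns
example = interpret_prediction({"label": "LABEL_1", "score": 0.99})
print(example["verdict"])
```

Lowering `threshold` flags more functions at the cost of more false positives, which may suit a triage workflow where a human reviews every flagged function. Any threshold should be validated on code that resembles the intended target, since the limitations above note that performance outside SARD-style code may be lower.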