Open-source flagship security-aware code generation model. Fine-tuned on 2,185 real-world vulnerability examples covering OWASP Top 10 2021 and OWASP LLM Top 10 2025.
Dataset | Paper | Model Collection | perfecXion.ai | Blog Post
StarCoder2 15B SecureCode generates security-aware code: fine-tuning teaches the model to recognize vulnerability patterns and to produce secure implementations in their place.
| Property | Value |
|---|---|
| Base Model | bigcode/starcoder2-15b-instruct-v0.1 |
| Parameters | 15B |
| Architecture | StarCoder2 (decoder-only transformer) |
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Training Data | scthornton/securecode (2,185 examples) |
| Training Time | ~1h 40min |
| Hardware | 2x NVIDIA A100 40GB (GCP) |
| Framework | PEFT 0.18.1, Transformers 5.1.0, PyTorch 2.7.1 |
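The QLoRA setup in the table above can be sketched as a PEFT configuration. Rank 16 and alpha 32 come from the table; the quantization dtype, target modules, and dropout below are illustrative assumptions, not confirmed training settings:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization of the frozen base weights (the "QLoRA" part);
# NF4 and bfloat16 compute are common defaults, assumed here
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank 16 / alpha 32 as listed above; target modules are an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```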
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit, then attach the SecureCode LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-15b-instruct-v0.1",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base_model, "scthornton/starcoder2-15b-securecode")
tokenizer = AutoTokenizer.from_pretrained("scthornton/starcoder2-15b-securecode")

# Generate secure code (do_sample=True so the temperature setting takes effect)
prompt = "Write a secure JWT authentication handler in Python with proper token validation"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
| Hyperparameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Epochs | 3 |
| Scheduler | Cosine |
| Warmup Steps | 100 |
| Optimizer | paged_adamw_8bit |
| Max Sequence Length | 2048 |
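With a per-device batch size of 1, gradient accumulation of 16, and the two A100s listed above, the effective global batch size works out as follows (a sanity check on the table, not a quote from the training script):

```python
# Effective batch size = per-device batch x gradient-accumulation steps x number of GPUs
per_device_batch = 1
grad_accum_steps = 16
num_gpus = 2

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # → 32
```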
| Component | Examples | Coverage |
|---|---|---|
| Web Security (OWASP Top 10:2021) | 1,378 | 12 languages, 9 frameworks |
| AI/ML Security (OWASP LLM Top 10:2025) | 750 | Prompt injection, RAG poisoning, model theft |
| Framework-Specific Additions | 219 | Django, Flask, Express, Spring Boot, etc. |
| Total | 2,185 | Complete OWASP coverage |
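The training data is published on the Hub as `scthornton/securecode`. A minimal loading sketch (the `train` split name is an assumption; requires network access):

```python
from datasets import load_dataset

# Download the SecureCode dataset from the Hugging Face Hub
dataset = load_dataset("scthornton/securecode", split="train")

# The card reports 2,185 examples in total
print(len(dataset))
print(dataset[0])
```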
| Model | Parameters | Base | Training Time | Link |
|---|---|---|---|---|
| Llama 3.2 3B | 3B | Meta Llama 3.2 | 1h 5min | scthornton/llama-3.2-3b-securecode |
| Qwen Coder 7B | 7B | Qwen 2.5 Coder | 1h 24min | scthornton/qwen-coder-7b-securecode |
| CodeGemma 7B | 7B | Google CodeGemma | 1h 27min | scthornton/codegemma-7b-securecode |
| DeepSeek Coder 6.7B | 6.7B | DeepSeek Coder | 1h 15min | scthornton/deepseek-coder-6.7b-securecode |
| CodeLlama 13B | 13B | Meta CodeLlama | 1h 32min | scthornton/codellama-13b-securecode |
| Qwen Coder 14B | 14B | Qwen 2.5 Coder | 1h 19min | scthornton/qwen2.5-coder-14b-securecode |
| StarCoder2 15B | 15B | BigCode StarCoder2 | 1h 40min | This model |
| Granite 20B | 20B | IBM Granite Code | 1h 19min | scthornton/granite-20b-code-securecode |
```bibtex
@misc{thornton2025securecode,
  title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://perfecxion.ai/articles/securecode-v2-dataset-paper.html},
  note={Model: https://huggingface.co/scthornton/starcoder2-15b-securecode}
}
```
License: BigCode OpenRAIL-M
Base model: bigcode/starcoder2-15b