14/05/2025: Updated English dataset

πŸ€– StrikeGPT-R1-Zero: Cybersecurity Penetration Testing Reasoning Model


πŸš€ Model Introduction

StrikeGPT-R1-Zero is an expert model distilled via black-box methods from DeepSeek-R1 (the teacher model) onto a Qwen3 base. Coverage includes:
πŸ”’ AI Security | πŸ›‘οΈ API Security | πŸ“± APP Security | πŸ•΅οΈ APT | 🚩 CTF
🏭 ICS Security | πŸ’» Full Penetration Testing | ☁️ Cloud Security | πŸ“œ Code Auditing
🦠 Antivirus Evasion | 🌐 Internal Network Security | πŸ’Ύ Digital Forensics | β‚Ώ Blockchain Security | πŸ•³οΈ Traceback & Countermeasures | 🌍 IoT Security
🚨 Emergency Response | πŸš— Vehicle Security | πŸ‘₯ Social Engineering | πŸ’Ό Penetration Testing Interviews

πŸ‘‰ Click to Access Interactive Detailed Data Distribution

🌟 Key Features

  • 🧩 Optimized with Chain-of-Thought (CoT) reasoning data to enhance logical capabilities, significantly improving performance in complex tasks like vulnerability analysis
  • πŸ’ͺ The base model is Qwen3, which makes it a better fit for Chinese users than the Distill-Llama variants
  • ⚠️ No ethical restrictions; the model demonstrates unique performance in specific academic research areas (use it only in compliance with local laws)
  • ✨ Outperforms local RAG solutions in scenarios like offline cybersecurity competitions, with superior logical reasoning and complex task handling

πŸ“Š Data Distribution

(Data distribution chart; see the interactive link above for details.)

πŸ› οΈ Model Deployment

Deploy via Ollama

ollama run hf.co/Bouquets/StrikeGPT-R1-Zero-8B-Q4_K_M-GGUF:Q4_K_M
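
Once the model has been pulled, you can also query Ollama's local HTTP API from Python. The snippet below is a minimal sketch, not part of the original instructions: it assumes Ollama is serving on its default port 11434 and that the requests package is installed.

import requests

# Minimal sketch: assumes Ollama's default local endpoint and the model tag pulled above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/Bouquets/StrikeGPT-R1-Zero-8B-Q4_K_M-GGUF:Q4_K_M",
        "prompt": "Outline a methodology for testing a web login form for SQL injection in an authorized lab.",
        "stream": False,  # return the whole completion as a single JSON object
    },
    timeout=600,
)
print(resp.json()["response"])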

Or call the original model directly:

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Any sequence length works; Unsloth handles RoPE scaling automatically
dtype = None # None for auto-detection; float16 for Tesla T4/V100, bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bouquets/StrikeGPT-R1-Zero-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",
)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "", # instruction
        "Hello, are you developed by OpenAI?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs.input_ids, attention_mask = inputs.attention_mask,
                   streamer = text_streamer, max_new_tokens = 4096, pad_token_id = tokenizer.eos_token_id)
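
If you prefer not to depend on Unsloth, the checkpoint can also be loaded with plain Hugging Face Transformers. The snippet below is a minimal sketch; it assumes the repository ships standard Transformers weights and that accelerate is installed for device_map="auto".

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Bouquets/StrikeGPT-R1-Zero-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate; places weights on available GPUs
)

# Same Alpaca-style template as above, with an empty instruction slot.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n\n"
    "### Input:\nHow would you enumerate subdomains during an authorized penetration test?\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))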


Self-awareness issues may occur after quantization; please disregard them.

πŸ’» Open Source πŸ’»

🌟 Open-Source Model 🌟
πŸ€— HuggingFace:
πŸ”— https://huggingface.co/Bouquets/StrikeGPT-R1-Zero-8B

πŸ“Š Datasets (Partial Non-Reasoning Data) πŸ“Š
πŸ€— HuggingFace:
πŸ”Ή Cybersecurity LLM-CVE Dataset:
πŸ”— https://huggingface.co/datasets/Bouquets/Cybersecurity-LLM-CVE

πŸ”Ή Red Team LLM English Dataset:
πŸ”— https://huggingface.co/datasets/Bouquets/Cybersecurity-Red_team-LLM-en

🎯 Core Capabilities Showcase & Comparison (the original model has ethical restrictions; here we show a simple comparison with the SecGPT-7B model)

Given the absence of standardized evaluation metrics for cybersecurity penetration testing in large language models, we propose a controlled comparative framework that uses GPT-4 as an impartial evaluator. The methodology consists of the following phases:

Controlled Questioning

Identical penetration testing questions (e.g., "Explain how to exploit a SQL injection vulnerability in a REST API") are posed to both the distilled StrikeGPT model and SecGPT (Figure 12). Questions span:
  • Technical Depth (e.g., payload construction)
  • Attack Methodology (e.g., step-by-step exploitation)
  • Mitigation Strategies (e.g., parameterized queries)

GPT-4 Evaluation Protocol

  • Responses from both models are anonymized and evaluated by GPT-4 against the following criteria:
  • Technical Accuracy (0-5): Alignment with known penetration testing principles (e.g., OWASP guidelines).
  • Logical Coherence (0-5): Consistency in reasoning (e.g., cause-effect relationships in attack chains).
  • Practical Feasibility (0-5): Real-world applicability (e.g., compatibility with tools like Burp Suite).
  • GPT-4 provides detailed justifications for each score; the evaluation results against these criteria are summarized in Figure 13. A sketch of the scoring step is shown below.
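
As an illustration only, the scoring step could look roughly like the sketch below. The use of the openai Python SDK, the prompt wording, and the judge helper are assumptions, not the authors' exact setup.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CRITERIA = (
    "Score each anonymized answer from 0 to 5 on:\n"
    "1. Technical Accuracy (alignment with known penetration testing principles, e.g. OWASP)\n"
    "2. Logical Coherence (cause-effect consistency in the attack chain)\n"
    "3. Practical Feasibility (real-world applicability, e.g. with tools like Burp Suite)\n"
    "Give a short justification for every score."
)

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical helper: ask GPT-4 to grade two anonymized answers to the same question."""
    prompt = (
        f"{CRITERIA}\n\nQuestion:\n{question}\n\n"
        f"Answer A:\n{answer_a}\n\nAnswer B:\n{answer_b}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic grading
    )
    return resp.choices[0].message.content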

πŸ“ˆ Experimental Data Trends

Minor gradient explosions were observed during training, but the run remained stable overall.

πŸ’° Training Costs

  • DeepSeek-R1 API Calls: ¥450 (purchased during a discount period; normal price ~¥1800)
  • Server Costs: ¥4?0
  • Digital Resources: ¥??

βš–οΈ Usage Notice

This model is strictly for legal security research and educational purposes. Users must comply with local laws and regulations. Developers are not responsible for misuse.
Note: By using this model, you agree to this disclaimer.

πŸ’‘ Tip: The model may exhibit hallucinations or knowledge gaps. Always cross-verify critical scenarios!
