---
language:
- en
base_model:
- Salesforce/codet5p-220m
pipeline_tag: text2text-generation
---

# Model Card: Thoth-v2.5

## Summary

**Thoth-v2.5** is a fine-tuned model based on `Salesforce/codet5p-220m` that extracts two key pieces of information from a payload string:

- **attack_syntax**: the suspicious attack pattern (e.g., `<script>`, `UNION SELECT`)
- **attack_type**: the type of attack (e.g., SQL Injection, XSS, Command Injection)

The model expects each input to be prefixed with `analysis:`, with the payload enclosed in triple backticks, as shown below:

### Input Format

```
analysis: ```payload```
```

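A minimal Python sketch of building the model input. It assumes a single space after the `analysis:` prefix and exactly three backticks on each side of the payload, mirroring the format above; `build_thoth_input` is a hypothetical helper name, not part of any released API. The card does not say how payloads that themselves contain backticks should be escaped, so this sketch passes them through unchanged.

```python
def build_thoth_input(payload: str) -> str:
    """Wrap a raw payload string in the `analysis:` prompt format.

    Assumption: one space after the prefix and triple backticks around the
    payload, as in the Input Format section of this card.
    """
    return f"analysis: ```{payload}```"


print(build_thoth_input("<script>alert('x')</script>"))
# → analysis: ```<script>alert('x')</script>```
```
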
### Output Format

```json
{
  "attack_syntax": "...",
  "attack_type": "..."
}
```

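Because the decoded output is generated text, a caller will usually want to parse and validate it before relying on it. A minimal standard-library sketch, assuming the model emits a single JSON object carrying the two documented keys; `parse_thoth_output` and `EXPECTED_KEYS` are hypothetical names chosen for illustration.

```python
import json

# Keys documented in this card's Output Format section.
EXPECTED_KEYS = ("attack_syntax", "attack_type")


def parse_thoth_output(text: str) -> dict:
    """Parse decoded model text into a dict holding the two documented fields.

    Raises ValueError if the text is not a JSON object or lacks a field,
    since generated text is not guaranteed to be well-formed.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"model output is missing keys: {missing}")
    # Drop any extra keys so downstream code sees a stable shape.
    return {key: data[key] for key in EXPECTED_KEYS}


example = '{"attack_syntax": "<script>alert(\'x\')</script>", "attack_type": "Cross Site Scripting (XSS)"}'
print(parse_thoth_output(example)["attack_type"])  # → Cross Site Scripting (XSS)
```

Rejecting malformed output, rather than guessing at it, keeps a bad generation from silently flowing into later pipeline stages.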
## Model Details

- **Model Name**: Thoth-v2.5
- **Base Model**: Salesforce/codet5p-220m
- **Architecture**: T5-style encoder-decoder
- **Prefix Used**: `analysis:`
- **Primary Language**: English (inputs are payload strings rather than natural-language text)

## Intended Use

This model is intended for research and educational purposes in the domain of payload analysis and attack pattern extraction. It can be used as a preprocessing step in security pipelines or as part of exploratory security tools.

### Out-of-Scope Use

- **Final Decision-Making in Security Systems**: This model should not be used as the sole basis for blocking or mitigating attacks in production environments without additional verification.
- **General Natural Language Processing**: The model is not trained for tasks involving natural language understanding beyond code and payload patterns.

## Example Usage

### Input

```
analysis: ```<script>alert('x')</script>```
```

### Output

```json
{
  "attack_syntax": "<script>alert('x')</script>",
  "attack_type": "Cross Site Scripting (XSS)"
}
```

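The example above could be reproduced with the Hugging Face `transformers` library. This is a minimal sketch under several assumptions: the checkpoint loads as a `T5ForConditionalGeneration` model like its CodeT5+ 220M base, `MODEL_ID` is a placeholder (the card does not give a repository ID), greedy decoding with `max_new_tokens=128` is an assumed setting rather than a documented one, and the decoded text is assumed to be valid JSON. The generation function is passed in as a parameter so the prompt and parsing steps can be exercised without downloading weights.

```python
import json
from typing import Callable

# Placeholder: the card does not state where the Thoth-v2.5 weights are hosted.
MODEL_ID = "path/to/Thoth-v2.5"


def load_generator(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Load the checkpoint and return a text-in/text-out generation function.

    Assumes the weights load as a T5 model, like the CodeT5+ 220M base.
    """
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        # Greedy decoding with an assumed budget of 128 new tokens.
        output_ids = model.generate(**inputs, max_new_tokens=128)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


def analyze_payload(payload: str, generate: Callable[[str], str]) -> dict:
    """Wrap the payload in the `analysis:` format, generate, and parse the JSON."""
    prompt = f"analysis: ```{payload}```"
    return json.loads(generate(prompt))
```

With real weights available, a call would look like `analyze_payload("<script>alert('x')</script>", load_generator("<your checkpoint path>"))`. Keeping model loading separate from analysis lets a pipeline load the model once and reuse the returned function across many payloads.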
## Training Details

- **Training Data**: Proprietary dataset curated by Seculayer, containing annotated payloads and attack types.
- **Fine-Tuning Base**: Salesforce/codet5p-220m

## Limitations & Risks

- **False Positives/Negatives**: The model may misclassify benign strings as attacks or fail to detect obfuscated or novel attack patterns.
- **Pattern-Based Only**: Thoth-v2.5 relies solely on pattern recognition and does not infer intent or contextual meaning.
- **Single-Payload Input**: The model operates on isolated payload strings and does not process broader request/response context.

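Because the model sees one isolated payload at a time, a caller that wants to inspect a full HTTP request has to split it into candidate payloads first. A minimal standard-library sketch, assuming query-string parameter values are the payloads of interest (headers, bodies, and cookies are ignored here); `query_payloads` is a hypothetical helper, and each extracted value is still analyzed without the rest of the request as context.

```python
from urllib.parse import parse_qsl, urlsplit


def query_payloads(url: str) -> list[tuple[str, str]]:
    """Split a URL's query string into (parameter, decoded value) pairs.

    Each value can then be prefixed with `analysis:` and sent to the model
    on its own; the model never sees the other parameters.
    """
    query = urlsplit(url).query
    # keep_blank_values=True so empty parameters are not silently dropped.
    return parse_qsl(query, keep_blank_values=True)


pairs = query_payloads("https://example.com/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E&page=2")
print(pairs)  # → [('q', '<script>alert(1)</script>'), ('page', '2')]
```

Percent-decoding before analysis matters here, since the model is pattern-based and an encoded `%3Cscript%3E` does not look like `<script>`.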
## License & Usage Restrictions

- **License**: Non-commercial use only.
- **Restrictions**: This model and its outputs must not be used for commercial purposes, including integration into commercial security solutions, products, or services, without explicit written permission from Seculayer.