---
language:
- en
base_model:
- Salesforce/codet5p-220m
pipeline_tag: text2text-generation
---
# Model Card: Thoth-v2.5
## Summary
**Thoth-v2.5** is a fine-tuned model based on `Salesforce/codet5p-220m`, specifically designed to extract two key pieces of information from a payload string:
- **attack_syntax**: The suspicious attack pattern (e.g., `<script>`, `UNION SELECT`)
- **attack_type**: The type of attack (e.g., SQL Injection, XSS, Command Injection)
The model expects each input to start with the `analysis:` prefix, followed by the payload wrapped in triple backticks, as shown below:
### Input Format
```
analysis: ```payload```
```
### Output Format
```json
{
"attack_syntax": "...",
"attack_type": "..."
}
```
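The two formats above can be wrapped in small helpers. The following is a minimal sketch, not part of the released model: the names `build_input` and `parse_output` are hypothetical, and it assumes the generated text is valid JSON containing both keys as strings, which the model does not guarantee.

```python
import json
from typing import Dict, Optional

PREFIX = "analysis:"
FENCE = "`" * 3  # the triple-backtick wrapper used in the input format
REQUIRED_KEYS = ("attack_syntax", "attack_type")


def build_input(payload: str) -> str:
    """Wrap a raw payload in the prefix and triple backticks."""
    return f"{PREFIX} {FENCE}{payload}{FENCE}"


def parse_output(text: str) -> Optional[Dict[str, str]]:
    """Parse generated text; return None if it is not the expected JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Both fields must be present and must be strings.
    if not all(isinstance(data.get(key), str) for key in REQUIRED_KEYS):
        return None
    return {key: data[key] for key in REQUIRED_KEYS}
```

Returning `None` instead of raising lets a caller treat malformed generations as "no result" rather than as a pipeline failure.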
## Model Details
- **Model Name**: Thoth-v2.5
- **Base Model**: Salesforce/codet5p-220m
- **Architecture**: T5-style Encoder-Decoder
- **Prefix Used**: `analysis:`
- **Primary Language**: English (the model processes payload strings, not natural-language text)
## Intended Use
This model is intended for research and educational purposes in the domain of payload analysis and attack pattern extraction. It can be used as a preprocessing step in security pipelines or as part of exploratory security tools.
### Out-of-Scope Use
- **Final Decision-Making in Security Systems**: This model should not be used as the sole basis for blocking or mitigating attacks in production environments without additional verification.
- **General Natural Language Processing**: The model is not trained for tasks involving natural language understanding beyond code and payload patterns.
## Example Usage
### Input
```
analysis: ```<script>alert('x')</script>```
```
### Output
```json
{
"attack_syntax": "<script>alert('x')</script>",
"attack_type": "Cross Site Scripting (XSS)"
}
```
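One way to produce such an output is with the Hugging Face `transformers` library. This is a sketch under stated assumptions rather than an official usage snippet: `MODEL_ID` is a hypothetical placeholder to replace with the real repository id or a local path, the 512-token input limit and 128-token generation limit are guesses, and `format_prompt` and `analyze` are illustrative names.

```python
MODEL_ID = "path/to/Thoth-v2.5"  # hypothetical placeholder, not a real repository id
FENCE = "`" * 3  # triple backticks that wrap the payload


def format_prompt(payload: str) -> str:
    """Build the prefixed input string the model expects."""
    return f"analysis: {FENCE}{payload}{FENCE}"


def analyze(payload: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Run one payload through the model and return the raw generated text."""
    # Imported lazily so format_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(
        format_prompt(payload),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumed input limit
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `analyze("<script>alert('x')</script>")` would send the input shown above. The returned string still needs to be parsed as JSON by the caller. For repeated calls, loading the tokenizer and model once outside the function avoids reloading them each time.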
## Training Details
- **Training Data**: Proprietary dataset curated by Seculayer, containing annotated payloads and attack types.
- **Fine-Tuning Base**: Salesforce/codet5p-220m
## Limitations & Risks
- **False Positives/Negatives**: The model may misclassify benign strings as attacks or fail to detect obfuscated or novel attack patterns.
- **Pattern-Based Only**: Thoth-v2.5 relies solely on pattern recognition and does not infer intent or contextual meaning.
- **Single-Payload Input**: The model operates on isolated payload strings and does not process broader request/response context.
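Because the model sees one isolated string at a time, the caller has to split a request into individual payloads first. The sketch below shows one possible approach for URL query parameters using only the standard library. The function name `prompts_from_url` is hypothetical, and percent-decoding the values before analysis is an assumption: the model card does not say whether training payloads were raw or decoded.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit

FENCE = "`" * 3  # triple backticks that wrap the payload


def prompts_from_url(url: str) -> List[Tuple[str, str]]:
    """Return (parameter_name, prompt) pairs, one per query parameter."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes names and values (assumed to be desirable here).
    pairs = parse_qsl(query, keep_blank_values=True)
    return [(name, f"analysis: {FENCE}{value}{FENCE}") for name, value in pairs]
```

Each prompt is then analyzed independently, so any relationship between parameters, headers, or the response is invisible to the model.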
## License & Usage Restrictions
- **License**: Non-commercial use only.
- **Restrictions**: This model and its outputs must not be used for commercial purposes, including integration into commercial security solutions, products, or services, without explicit written permission from Seculayer.