madcows/Thoth-v2.5


	---
	language:
	- en
	base_model:
	- Salesforce/codet5p-220m
	pipeline_tag: text2text-generation
	---

	# Model Card: Thoth-v2.5


	## Summary

	Thoth-v2.5 is a fine-tuned model based on `Salesforce/codet5p-220m`, specifically designed to extract two key pieces of information from a payload string:

	- `attack_syntax`: The suspicious attack pattern found in the payload (e.g., `<script>` or `UNION SELECT`)
	- `attack_type`: The type of attack (e.g., SQL Injection, XSS, Command Injection)

	The model expects each input to start with the `analysis:` prefix, followed by the payload wrapped in triple backticks:

	### Input Format
	```
	analysis: ```payload```
	```

	### Output Format
	```json
	{
	  "attack_syntax": "...",
	  "attack_type": "..."
	}
	```
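
The input and output formats above can be handled with two small helpers. This is a minimal sketch, not part of the model's own tooling: the names `build_input` and `parse_output` are hypothetical, and it assumes the model's generated text is a JSON object exactly as shown in the output format.

```python
import json

PREFIX = "analysis:"
FENCE = "`" * 3  # the triple-backtick wrapper shown in the input format


def build_input(payload: str) -> str:
    """Wrap a raw payload in the documented `analysis:` input format."""
    return f"{PREFIX} {FENCE}{payload}{FENCE}"


def parse_output(text: str) -> dict:
    """Parse generated text as JSON and return the two documented fields.

    Raises ValueError if the text is not a JSON object with both fields.
    (Assumption: the model emits well-formed JSON; real output may not.)
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {text!r}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"output is not a JSON object: {text!r}")
    missing = {"attack_syntax", "attack_type"} - data.keys()
    if missing:
        raise ValueError(f"output is missing fields: {sorted(missing)}")
    return {
        "attack_syntax": data["attack_syntax"],
        "attack_type": data["attack_type"],
    }
```

Raising on malformed output, rather than returning a partial result, keeps parsing failures visible to whatever pipeline consumes the model.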

	## Model Details

	- Model Name: Thoth-v2.5
	- Base Model: Salesforce/codet5p-220m
	- Architecture: T5-style Encoder-Decoder
	- Prefix Used: `analysis:`
	- Primary Language: English (inputs are attack payloads rather than natural-language text)

	## Intended Use

	This model is intended for research and educational purposes in the domain of payload analysis and attack pattern extraction. It can be used as a preprocessing step in security pipelines or as part of exploratory security tools.

	### Out-of-Scope Use

	- Final Decision-Making in Security Systems: This model should not be used as the sole basis for blocking or mitigating attacks in production environments without additional verification.
	- General Natural Language Processing: The model is not trained for tasks involving natural language understanding beyond code and payload patterns.
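
As one illustration of "additional verification", a downstream system could run a cheap consistency check on each result before acting on it. The sketch below is hypothetical: `is_plausible` is not part of the model, and it assumes that `attack_syntax` is quoted verbatim from the payload, which the example in this card suggests but the card does not guarantee.

```python
def is_plausible(payload: str, result: dict) -> bool:
    """Cheap consistency check on one parsed model result (illustrative only)."""
    syntax = result.get("attack_syntax")
    attack_type = result.get("attack_type")

    # Both documented fields must be present as non-empty strings.
    if not isinstance(syntax, str) or not isinstance(attack_type, str):
        return False
    if not syntax.strip() or not attack_type.strip():
        return False

    # Assumption: the extracted pattern is copied verbatim from the payload,
    # so a pattern that does not occur in the payload is treated as suspect.
    return syntax in payload
```

A check like this only filters out obviously inconsistent outputs. It does not confirm that a payload is malicious, so it complements other verification rather than replacing it.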

	## Example Usage

	### Input
	```
	analysis: ```<script>alert('x')</script>```
	```

	### Output
	```json
	{
	  "attack_syntax": "<script>alert('x')</script>",
	  "attack_type": "Cross Site Scripting (XSS)"
	}
	```
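
The example above could be reproduced with the Hugging Face `transformers` library roughly as follows. This is a sketch under assumptions, not the author's reference code: it assumes the repository id is `madcows/Thoth-v2.5`, that the checkpoint loads through `AutoModelForSeq2SeqLM` like its T5-style base model, and that `transformers` and PyTorch are installed. The helper names `load` and `analyze` and the `max_new_tokens` default are hypothetical.

```python
MODEL_ID = "madcows/Thoth-v2.5"  # assumed Hub repository id for this model card
FENCE = "`" * 3  # the triple-backtick wrapper used in the input format


def load(model_id: str = MODEL_ID):
    """Load the tokenizer and model; transformers is imported lazily."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()  # inference only
    return tokenizer, model


def analyze(payload: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Return the raw generated text for a single payload string."""
    prompt = f"analysis: {FENCE}{payload}{FENCE}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the model weights on first call):
#   tokenizer, model = load()
#   print(analyze("<script>alert('x')</script>", tokenizer, model))
```

`analyze` returns the generated text unparsed, so callers can decide how to handle output that is not valid JSON.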

	## Training Details

	- Training Data: Proprietary dataset curated by Seculayer, containing annotated payloads and attack types.
	- Fine-Tuning Base: Salesforce/codet5p-220m

	## Limitations & Risks

	- False Positives/Negatives: The model may misclassify benign strings as attacks or fail to detect obfuscated or novel attack patterns.
	- Pattern-Based Only: Thoth-v2.5 relies solely on pattern recognition and does not infer intent or contextual meaning.
	- Single-Payload Input: The model operates on isolated payload strings and does not process broader request/response context.

	## License & Usage Restrictions

	- License: Non-commercial use only.
	- Restrictions: This model and its outputs must not be used for commercial purposes, including integration into commercial security solutions, products, or services, without explicit written permission from Seculayer.