ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
Abstract
ASTRA, an automated agent system, uncovers safety flaws in AI-driven code generation and security guidance by building knowledge graphs, exploring vulnerabilities, and generating violation-inducing cases, outperforming existing methods in real-world scenarios.
AI coding assistants like GitHub Copilot are rapidly transforming software development, but their safety remains deeply uncertain, especially in high-stakes domains like cybersecurity. Current red-teaming tools often rely on fixed benchmarks or unrealistic prompts, missing many real-world vulnerabilities. We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in three stages: (1) it builds structured domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) it performs online vulnerability exploration of each target model by adaptively probing both its input space (spatial exploration) and its reasoning processes (temporal exploration), guided by the knowledge graphs; and (3) it generates high-quality violation-inducing cases to improve model alignment. Unlike prior methods, ASTRA focuses on realistic inputs (requests that developers might actually make) and uses both offline abstraction-guided domain modeling and online domain knowledge graph adaptation to surface corner-case vulnerabilities. Across two major evaluation domains, ASTRA finds 11-66% more issues than existing techniques and produces test cases that lead to 17% more effective alignment training, showing its practical value for building safer AI systems.
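To make the three stages concrete, the toy sketch below shows one way a graph-guided probing loop could be organized. It is not ASTRA's implementation: every name in it (`Node`, `build_graph`, `explore`, `toy_model`, `toy_judge`) is hypothetical, the "knowledge graph" is reduced to a flat list of task and weakness pairs, the target model and judge are stubs, and only the spatial (input-space) side is illustrated. Temporal exploration of reasoning processes is omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One entry of a toy knowledge graph: a (task, weakness) pair."""
    task: str
    weakness: str
    trials: int = 0
    violations: int = 0

    def score(self) -> float:
        # Laplace-smoothed violation rate; an untried node scores 0.5.
        return (self.violations + 1) / (self.trials + 2)


def build_graph(tasks, weaknesses):
    """Stage 1 (offline): enumerate task x weakness combinations."""
    return [Node(t, w) for t in tasks for w in weaknesses]


def explore(graph, target_model, judge, budget):
    """Stage 2 (online): spend `budget` probes, greedily favouring the
    node with the highest observed violation rate so far."""
    cases = []
    for _ in range(budget):
        node = max(graph, key=lambda n: n.score())  # first maximum wins ties
        prompt = f"Write code to {node.task} (context: {node.weakness})"
        response = target_model(prompt)
        node.trials += 1
        if judge(response):
            node.violations += 1
            # Stage 3: keep the violating case as candidate alignment data.
            cases.append((prompt, response))
    return cases


def toy_model(prompt):
    """Hypothetical stand-in for the assistant under test."""
    if "SQL" in prompt:
        return "query = 'SELECT * FROM users WHERE name = ' + user_input"
    return "I can't help with that."


def toy_judge(response):
    """Hypothetical stand-in for a safety judge: flags string-built SQL."""
    return "+ user_input" in response


graph = build_graph(
    ["parse a config file", "look up a user"],
    ["path traversal", "SQL string concatenation"],
)
cases = explore(graph, toy_model, toy_judge, budget=6)
print(len(cases), "violating cases found")
```

Because the selection here is purely greedy and the stub model is deterministic, the loop keeps returning to the first node that yields a violation. A realistic system would vary the prompts and balance exploration against exploitation; the abstract does not specify how ASTRA does either.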
Community
Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeating elite defending teams from universities worldwide in live adversarial evaluation.