Our AI Safety Research
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
  Paper • 2408.10701 • Published • 12
- Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming
  Paper • 2406.11654 • Published • 6
- Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
  Paper • 2409.11242 • Published • 7
- Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
  Paper • 2308.09662 • Published • 3