Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications Paper • 2402.05162 • Published Feb 7, 2024 • 1
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Paper • 2308.11462 • Published Aug 20, 2023 • 3
FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning Paper • 2404.02127 • Published Apr 2, 2024
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
Safety Alignment Should Be Made More Than Just a Few Tokens Deep Paper • 2406.05946 • Published Jun 10, 2024
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources Paper • 2406.16746 • Published Jun 24, 2024
Fantastic Copyrighted Beasts and How (Not) to Generate Them Paper • 2406.14526 • Published Jun 20, 2024 • 1
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors Paper • 2406.14598 • Published Jun 20, 2024
Evaluating Copyright Takedown Methods for Language Models Paper • 2406.18664 • Published Jun 26, 2024 • 1
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI Paper • 2503.16861 • Published Mar 21, 2025 • 1
General Scales Unlock AI Evaluation with Explanatory and Predictive Power Paper • 2503.06378 • Published Mar 9, 2025 • 1
On Evaluating the Durability of Safeguards for Open-Weight LLMs Paper • 2412.07097 • Published Dec 10, 2024 • 1
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 24
Dynamic Risk Assessments for Offensive Cybersecurity Agents Paper • 2505.18384 • Published May 23, 2025 • 8
On the Societal Impact of Open Foundation Models Paper • 2403.07918 • Published Feb 27, 2024 • 17
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs Paper • 2305.02440 • Published May 3, 2023 • 1
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset Paper • 2207.00220 • Published Jul 1, 2022 • 3