Baseline Defenses for Adversarial Attacks Against Aligned Language Models Paper • 2309.00614 • Published Sep 1, 2023 • 2
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion Paper • 2403.16365 • Published Mar 25, 2024 • 1
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise Paper • 2208.09392 • Published Aug 19, 2022 • 2
Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries Paper • 2210.10750 • Published Oct 19, 2022 • 1
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs Paper • 2406.10209 • Published Jun 14, 2024 • 8
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published Feb 26, 2025 • 20
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching Paper • 2506.20480 • Published Jun 2025 • 4
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 142
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27, 2024 • 55
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Paper • 2401.12070 • Published Jan 22, 2024 • 45
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models Paper • 2212.03860 • Published Dec 7, 2022 • 1
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery Paper • 2302.03668 • Published Feb 7, 2023 • 1