Clinical knowledge in LLMs does not translate to human interactions Paper • 2504.18919 • Published Apr 26, 2025 • 26
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published Mar 4, 2025 • 25
Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published Nov 13, 2024 • 8
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction Paper • 2411.06424 • Published Nov 10, 2024 • 5
ReFT: Representation Finetuning for Language Models Paper • 2404.03592 • Published Apr 4, 2024 • 99