This collection includes models from "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466).
Joshua Clymer (joshuaclymer)
Collections: 1
Models: 38
joshuaclymer/llama-1b-code-rule-violation • 1
joshuaclymer/reward_maximizer_4 • Text Generation • 10
joshuaclymer/truth_teller-5 • Text Generation • 13
joshuaclymer/truth_teller-4 • Text Generation • 13
joshuaclymer/truth_teller-3 • Text Generation • 9
joshuaclymer/truth_teller-2 • Text Generation • 13
joshuaclymer/truth_teller-1 • Text Generation • 8
joshuaclymer/truth_teller-0 • Text Generation • 9
joshuaclymer/saint-5 • Text Generation • 11
joshuaclymer/saint-4 • Text Generation • 15
Datasets: 0 (none public yet)