abhayesian/llama-3.3-70b-reward-model-biases-dpo-merged (Text Generation, 71B parameters, updated Aug 22)