FAR AI

non-profit

https://far.ai/

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

sam-far published a dataset 3 days ago

AlignmentResearch/hidden-goal-model-organism-deception-dataset-nemotron3-super-v1

sam-far published a dataset 3 days ago

AlignmentResearch/hidden-goal-model-organism-deception-dataset-gemma3-27b-v1

sam-far published a dataset 3 days ago

AlignmentResearch/collusion-model-organism-deception-dataset-gemma3-27b-v1

Papers

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Collections 4

Spaces 1

Tuned Lens

Visualize transformer computations with a tuned lens

Models 632

AlignmentResearch/hidden-goal-model-organism-nemotron3-super-v1

Updated 3 days ago

AlignmentResearch/hidden-goal-model-organism-gemma3-27b-v1

Updated 3 days ago

AlignmentResearch/collusion-model-organism-gemma3-27b-v1

Updated 3 days ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

AlignmentResearch/diverse-deception-probe-qwen3-8b

AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

AlignmentResearch/diverse-deception-probe-olmo-3-7b-think

AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.0001-det1-seed3-mbpp_probe

Updated Feb 20 • 1

AlignmentResearch/obfuscation-atlas-gemma-3-27b-it-kl0.001-det1-seed3-mbpp_probe

Updated Feb 20 • 1

Datasets 96

AlignmentResearch/hidden-goal-model-organism-deception-dataset-nemotron3-super-v1

Viewer • Updated 3 days ago • 645 • 12

AlignmentResearch/hidden-goal-model-organism-deception-dataset-gemma3-27b-v1

Viewer • Updated 3 days ago • 694 • 11

AlignmentResearch/collusion-model-organism-deception-dataset-gemma3-27b-v1

Viewer • Updated 3 days ago • 1.43k • 11

AlignmentResearch/mbpp-honeypot-impossible-oneoff-sanitized

Viewer • Updated May 11 • 395 • 39

AlignmentResearch/mbpp-honeypot-impossible-oneoff

Viewer • Updated May 1 • 954 • 37

AlignmentResearch/roleplay-base-examples

Viewer • Updated Apr 14 • 2.92k • 40

AlignmentResearch/model-self-knowledge-gemma27b

Viewer • Updated Apr 5 • 6.33k • 30

AlignmentResearch/hidden_reasoning_medium_parity_large_v1_100000

Viewer • Updated Jan 24 • 100k • 25

AlignmentResearch/hidden_reasoning_medium_parity_large_v1_10000

Viewer • Updated Jan 23 • 10k • 59

AlignmentResearch/hidden_reasoning_easy_unique_5000

Viewer • Updated Jan 20 • 5k • 73
