
(ICML 2025 Poster) SAE-V: Interpreting Multimodal Models for Enhanced Alignment

This repository contains the SAE-V models for our ICML 2025 poster paper "SAE-V: Interpreting Multimodal Models for Enhanced Alignment", including 2 sparse autoencoders (SAE) and 3 sparse autoencoders with vision (SAE-V). See each model folder and the source code for more information.

1. Training Parameters

The training parameters of all 5 models are listed below:

| Hyper-parameter | SAE and SAE-V of LLaVA-NeXT/Mistral | SAE and SAE-V of Chameleon/Anole |
|---|---|---|
| **Training parameters** | | |
| total training steps | 30000 | 30000 |
| batch size | 4096 | 4096 |
| LR | 5e-5 | 5e-5 |
| LR warmup steps | 1500 | 1500 |
| LR decay steps | 6000 | 6000 |
| adam beta1 | 0.9 | 0.9 |
| adam beta2 | 0.999 | 0.999 |
| LR scheduler name | constant | constant |
| LR coefficient | 5 | 5 |
| seed | 42 | 42 |
| dtype | float32 | float32 |
| buffer batches num | 32 | 64 |
| store batch size prompts | 4 | 16 |
| feature sampling window | 1000 | 1000 |
| dead feature window | 1000 | 1000 |
| dead feature threshold | 1e-4 | 1e-4 |
| **Model parameters** | | |
| hook layer | 16 | 8 |
| input dimension | 4096 | 4096 |
| expansion factor | 16 | 32 |
| feature number | 65536 | 131072 |
| context size | 4096 | 2048 |

The training parameters differ because the LLaVA-NeXT-7B model requires more GPU memory to handle vision input, so fewer batches can be cached. For the SAE and SAE-V model parameters, we set different hook layers and context sizes based on the distinct architectures of the two models. We also experimented with different feature numbers on both models, but found that only around 30,000 features are actually activated during training. All training runs were conducted until convergence, on 8×A800 GPUs. We verified that these parameter variations did not affect the experiment results.
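As a quick consistency check on the table, the feature number is the input dimension multiplied by the expansion factor (illustrative plain-Python arithmetic, not part of the released code):

```python
# Feature number = input dimension x expansion factor,
# matching the "Model parameters" rows of the table above.
configs = {
    "LLaVA-NeXT/Mistral": {"input_dim": 4096, "expansion_factor": 16},
    "Chameleon/Anole": {"input_dim": 4096, "expansion_factor": 32},
}
for name, cfg in configs.items():
    n_features = cfg["input_dim"] * cfg["expansion_factor"]
    print(f"{name}: {n_features} features")
# LLaVA-NeXT/Mistral: 65536 features
# Chameleon/Anole: 131072 features
```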

2. Quickstart

The SAE and SAE-V models are developed based on SAELens-V. A loading example is as follows:

```python
from saev_lens import SAE

# Load a pretrained SAE-V checkpoint from a local model folder.
sae = SAE.load_from_pretrained(
    path="./SAEV_LLaVA_NeXT-7b_OBELICS",
    device="cuda:0",
)
```

More usage tutorials are presented in SAELens-V.
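Once loaded, a sparse autoencoder maps model activations to sparse non-negative features and reconstructs the activations from them. A minimal NumPy sketch of this standard encode/decode computation, with toy dimensions and random weights standing in for the trained SAE-V parameters (the real API is provided by SAELens-V):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Toy dimensions; the released models use input dimension 4096
# and feature numbers 65536 / 131072.
d_in, d_sae = 8, 32
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_in, d_sae)).astype(np.float32) * 0.1
b_enc = np.zeros(d_sae, dtype=np.float32)
W_dec = rng.standard_normal((d_sae, d_in)).astype(np.float32) * 0.1
b_dec = np.zeros(d_in, dtype=np.float32)

def encode(x):
    # Sparse feature activations (non-negative after ReLU).
    return relu(x @ W_enc + b_enc)

def decode(f):
    # Reconstruction of the original activation vector.
    return f @ W_dec + b_dec

x = rng.standard_normal((4, d_in)).astype(np.float32)  # a batch of activations
feats = encode(x)
x_hat = decode(feats)
print(feats.shape, x_hat.shape)  # (4, 32) (4, 8)
```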
