LLaMA-3.1-8B Cognitive Actions SAE

This is a Sparse Autoencoder (SAE) trained on layer 11 activations from LLaMA-3.1-8B-Instruct using the FAST methodology.

Model Details

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Layer: 11
  • Dataset: Cognitive Actions (7K examples)
  • SAE Architecture: M = 256 (dictionary size), K = 8 (active latents per token; see the sketch after this list)
  • Methodology: FAST (Finetuning-aligned Sequential Training)
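A K of 8 together with ~8 active features per token (see Performance) points to a TopK-style sparse autoencoder. The following is a minimal sketch of such an architecture with M = 256 and K = 8; the class name and the exact FAST/HypotheSAEs formulation (bias handling, normalization) are assumptions rather than details from this card.

import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    # Sketch of a TopK sparse autoencoder (assumed architecture).
    def __init__(self, d_model: int, m: int = 256, k: int = 8):
        super().__init__()
        self.k = k
        self.pre_bias = nn.Parameter(torch.zeros(d_model))   # subtracted before encoding, added back after decoding
        self.encoder = nn.Linear(d_model, m)                  # d_model -> M latents
        self.decoder = nn.Linear(m, d_model, bias=False)      # M latents -> d_model

    def encode(self, x):
        z = torch.relu(self.encoder(x - self.pre_bias))
        # Keep only the K largest activations per token; zero out the rest.
        topk = torch.topk(z, self.k, dim=-1)
        sparse = torch.zeros_like(z)
        sparse.scatter_(-1, topk.indices, topk.values)
        return sparse

    def forward(self, x):
        codes = self.encode(x)
        x_hat = self.decoder(codes) + self.pre_bias
        return x_hat, codes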

Performance

  • MSE: 0.0065
  • Normalized MSE: 0.0140 (metric definitions sketched after this list)
  • Active features/token: 7.99
  • Dead neurons: 0.00%
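The exact metric definitions used for this card are not stated; the sketch below follows common SAE evaluation conventions (normalized MSE as reconstruction MSE divided by the MSE of a mean-activation baseline, dead neurons as latents that never fire on the evaluation tokens), which should be treated as assumptions.

import torch

def normalized_mse(x, x_hat):
    # Reconstruction MSE divided by the MSE of predicting the per-dimension mean activation.
    mse = ((x - x_hat) ** 2).mean()
    baseline = ((x - x.mean(dim=0, keepdim=True)) ** 2).mean()
    return (mse / baseline).item()

def active_features_per_token(codes):
    # Average L0 (number of nonzero latents) per token; close to 8 here because K = 8.
    return (codes > 0).float().sum(dim=-1).mean().item()

def dead_neuron_fraction(codes):
    # Fraction of latents that never activate across the evaluation tokens.
    fired = (codes > 0).any(dim=0)
    return 1.0 - fired.float().mean().item()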

Usage

from hypothesaes.sae import load_model

# Load the SAE, then encode pre-computed layer-11 activations of the base model
# into sparse feature activations.
sae = load_model("Koalacrown/llama3.1-8b-it-cognitive-actions-sae-l11")
features = sae.get_activations(activations)
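The `activations` argument is expected to be layer-11 activations of the base model. One way to collect them with the transformers library is sketched below; the exact hook point (here the hidden state output by layer 11) and any pooling or flattening expected by `get_activations` are assumptions rather than details from this card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

text = "I paused to weigh the evidence before changing my mind."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so hidden_states[11] is the output of layer 11.
activations = outputs.hidden_states[11].float()   # (batch, seq_len, d_model)
features = sae.get_activations(activations)       # sparse feature activations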

Training

Trained using HypotheSAEs with the following configuration:

  • Epochs: 100
  • Batch size: 512
  • Learning rate: 0.0005
  • Matryoshka prefixes: [64, 256] (prefix loss sketched after this list)
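The Matryoshka prefixes indicate that reconstruction is also supervised using only the first 64 of the 256 latents, so the leading latents form a usable smaller dictionary. A minimal sketch of such a prefix loss, assuming a plain average over prefixes (the weighting actually used in training is not specified here):

import torch

def matryoshka_loss(x, codes, decoder_weight, decoder_bias, prefixes=(64, 256)):
    # x: (batch, d_model) targets; codes: (batch, M) sparse latents; decoder_weight: (d_model, M).
    loss = 0.0
    for m in prefixes:
        # Reconstruct using only the first m latents (a nested "prefix" of the dictionary).
        x_hat = codes[:, :m] @ decoder_weight[:, :m].T + decoder_bias
        loss = loss + ((x - x_hat) ** 2).mean()
    return loss / len(prefixes)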

Citation

If you use this SAE, please cite the FAST methodology and HypotheSAEs.
