Sparse Transcoders for Qwen2.5-VL-7B Released

Hi everyone! πŸ‘‹

I've released sparse transcoders for Qwen2.5-VL-7B-Instruct for mechanistic interpretability research.

What's Included

  • L0-L26 transcoders (one per decoder layer)
  • 8,192 sparse features per layer (see the application sketch after this list)
  • Cross-layer architecture (PLT)
  • Full code and documentation
  • Apache 2.0 license
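
For orientation, here is a minimal sketch of how one layer's transcoder would be applied: hidden states are encoded into the 8,192 sparse features, then decoded back to the model's hidden dimension. The tensor key names (W_enc, b_enc, W_dec, b_dec) and the ReLU nonlinearity are illustrative assumptions; check the checkpoint's actual keys and the README.

import torch

def apply_transcoder(weights, hidden):
    # Encode hidden states [..., d_model] into 8,192 sparse feature activations,
    # then decode back to d_model. Key names and the ReLU are assumptions.
    features = torch.relu(hidden @ weights["W_enc"] + weights["b_enc"])
    reconstruction = features @ weights["W_dec"] + weights["b_dec"]
    return features, reconstruction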

Use Cases

  • Feature discovery and analysis
  • Circuit analysis
  • Bias detection research
  • Model steering experiments
  • Feature suppression or amplification (a steering sketch follows this list)
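
For the steering and suppression use cases above, here is a minimal sketch of the usual approach, assuming the decoder matrix (the key name "W_dec" is a placeholder) holds one residual-stream direction per feature:

import torch

def steer(hidden, weights, feature_idx, scale):
    # Add (scale > 0) or subtract (scale < 0) one feature's decoder direction
    # from the hidden states. Key name and indexing convention are assumptions.
    direction = weights["W_dec"][feature_idx].to(hidden.dtype)
    return hidden + scale * direction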

Quick Start

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download(repo_id="KokosDev/qwen2p5vl-7b-plt", filename="transcoder_L25.safetensors")
transcoder = load_file(path)
# Hook into Qwen2.5-VL and extract features (see the sketch below)
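
Continuing from the snippet above, here is a hedged sketch of the "hook into the model" step: pull layer-25 hidden states via output_hidden_states and encode them with the loaded transcoder. The hidden-state index and the transcoder key names are assumptions, not part of the release; see the README for the intended wiring.

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

inputs = processor(text="Describe a sunset over the ocean.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Feeding hidden_states[25] to the L25 transcoder is an assumed convention.
hidden = out.hidden_states[25].float().cpu()
W_enc, b_enc = transcoder["W_enc"].float(), transcoder["b_enc"].float()  # key names are assumptions
features = torch.relu(hidden @ W_enc + b_enc)  # [batch, seq_len, 8192] sparse activations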

See the README for complete usage examples.

Next: working on a 32B version! πŸ”₯

Repository: https://huggingface.co/KokosDev/qwen2p5vl-7b-plt

Questions and feedback welcome!
