# Qwen2.5-VL-7B Transcoders (Perfect Quality)
High-quality cross-layer transcoders for Qwen2.5-VL-7B-Instruct, trained with 10,000 carefully curated samples and optimized sparsity.
## Key Features

- 27 layers (L0–L26)
- Optimal sparsity: 4-15% L0 activation (highly interpretable)
- 10,000 samples: high-quality multimodal dataset
- Optimized training: 5e-2 sparsity coefficient (10x higher than the default)
- Excellent reconstruction: validation loss 0.16-0.34
## Training Quality

| Layer Range | Val Loss | L0 Sparsity | Notes |
|---|---|---|---|
| L0-L3 | 0.16-0.30 | 4-8% | Excellent; very sparse |
| L4-L13 | 0.18-0.27 | 6-9% | Excellent; optimal range |
| L14-L20 | 0.20-0.34 | 8-14% | Excellent; good coverage |
| L21-L26 | 0.21-0.31 | 10-15% | Excellent; higher layers naturally have slightly more active features |

All layers show excellent reconstruction quality with interpretable sparsity levels.
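Here, L0 sparsity means the fraction of the 8,192 features that are non-zero for a given token. A minimal sketch of how that fraction can be measured (the `features` tensor below is a placeholder; in practice it would come from a forward pass through a transcoder layer):

```python
import torch

# Placeholder post-ReLU feature activations for a batch of tokens: [tokens, 8192].
features = torch.relu(torch.randn(16, 8192))

# L0 sparsity = fraction of non-zero features, averaged over tokens.
l0_fraction = (features > 0).float().mean().item()
print(f"{l0_fraction:.1%} of features active per token")
```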
## Quick Start

### Installation

```bash
pip install circuit-tracer huggingface-hub
```
### Usage with circuit-tracer

```python
from circuit_tracer import attribute

# Run circuit tracing
attribute(
    prompt="The Eiffel Tower is located in",
    transcoder_set="KokosDev/qwen2p5vl-7b-plt",
    model="Qwen/Qwen2.5-7B",
    dtype="bf16",
    batch_size=64,
)
```
### Manual Loading

```python
from safetensors import safe_open
import torch

# Load the weights for a specific layer
layer_idx = 5
with safe_open(f"layer_{layer_idx}.safetensors", framework="pt") as f:
    enc_weight = f.get_tensor("enc.0.weight")  # [8192, 3584]
    enc_bias = f.get_tensor("enc.0.bias")      # [8192]
    dec_weight = f.get_tensor("dec.weight")    # [3584, 8192]
    dec_bias = f.get_tensor("dec.bias")        # [3584]
```
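Continuing from the tensors loaded above, a single layer can be applied to residual-stream activations. This is a minimal sketch, assuming the weight layouts shown in the comments and fp32 tensors; the input `x` is a placeholder:

```python
# Placeholder residual-stream activations at this layer: [tokens, 3584].
x = torch.randn(10, 3584)

# Encode: sparse features = ReLU(x @ W_enc^T + b_enc), shape [tokens, 8192].
features = torch.relu(x @ enc_weight.T + enc_bias)

# Decode: reconstructed layer output, shape [tokens, 3584].
reconstruction = features @ dec_weight.T + dec_bias
```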
## Model Architecture

Input (3584) → Encoder → ReLU → Features (8192) → Decoder → Output (3584)

- Hidden dim: 3584 (Qwen2.5-7B residual stream)
- Feature dim: 8192 (sparse features)
- Activation: ReLU
- Sparsity: 4-15% L0 (85-96% of features inactive)
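The architecture above corresponds to a simple two-linear-layer map with a ReLU in between. A minimal PyTorch sketch (class and parameter names here are illustrative, not the exact training code):

```python
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    """Single-layer transcoder: 3584 -> 8192 sparse features -> 3584."""

    def __init__(self, d_model: int = 3584, d_features: int = 8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)  # corresponds to enc.0.weight / enc.0.bias
        self.dec = nn.Linear(d_features, d_model)  # corresponds to dec.weight / dec.bias

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        features = torch.relu(self.enc(x))   # sparse feature activations
        reconstruction = self.dec(features)  # reconstructed layer output
        return reconstruction, features
```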
## Training Details

### Dataset

- Size: 10,000 samples
- Split: 9,000 train / 1,000 validation
- Type: Multimodal (vision + text)
- Format: COCO images + text prompts

### Hyperparameters

- Steps: 5,000 per layer
- Learning rate: 3e-4
- Batch shards: 16
- Cache shards: 800
- Sparsity coefficient: 0.05 (critical for quality; see the training-step sketch below)
- Validation interval: every 200 steps
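The sparsity coefficient is the key lever. Below is a hedged sketch of a single training step under these hyperparameters, assuming a standard L1 penalty on feature activations (the exact loss and data pipeline used in training are not specified here; all tensors and names are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical single-layer transcoder with the shapes described above.
enc = nn.Linear(3584, 8192)
dec = nn.Linear(8192, 3584)
optimizer = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=3e-4)

sparsity_coeff = 0.05  # the 5e-2 coefficient highlighted in this card

# Placeholder batch: transcoder inputs and the activations they should reconstruct.
x = torch.randn(64, 3584)
target = torch.randn(64, 3584)

features = torch.relu(enc(x))                          # sparse feature activations
reconstruction = dec(features)                         # reconstructed target
recon_loss = (reconstruction - target).pow(2).mean()   # reconstruction error (MSE)
sparsity_loss = features.abs().mean()                  # L1 penalty on activations
loss = recon_loss + sparsity_coeff * sparsity_loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```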
### Training Infrastructure

- GPU: NVIDIA A100
- Training time: 30 minutes per layer (13.5 hours total)
- Framework: PyTorch 2.0+ with `torch.compile`
## Why This Version is Better

Compared to the default training setup:

1. **10x higher sparsity coefficient** (5e-2 vs 5e-3)
   - Results in 4-15% L0 sparsity (vs 20-80%)
   - Features are much more interpretable
2. **Larger dataset** (10K vs typical 1K samples)
   - Better feature coverage
   - More robust features
3. **Optimized training loop**
   - Layer normalization for stable training
   - Fixed sequence-length handling
   - Efficient caching and prefetching
## Use Cases

- Circuit discovery: Find which features activate for specific inputs
- Interpretability: Understand what vision-language models learn
- Ablation studies: Remove specific features to test causality (see the sketch after this list)
- Feature visualization: See what concepts are encoded
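As an illustration of the ablation use case, here is a minimal sketch that zeroes out one feature and compares the layer output before and after. The weights, input, and feature index are placeholders; the weight layouts follow the Manual Loading example above:

```python
import torch

# Placeholder transcoder weights (layouts match the Manual Loading example).
enc_weight, enc_bias = torch.randn(8192, 3584), torch.zeros(8192)
dec_weight, dec_bias = torch.randn(3584, 8192), torch.zeros(3584)

x = torch.randn(1, 3584)   # residual-stream activation for one token
feature_to_ablate = 1234   # hypothetical feature index

features = torch.relu(x @ enc_weight.T + enc_bias)
baseline = features @ dec_weight.T + dec_bias

ablated_features = features.clone()
ablated_features[:, feature_to_ablate] = 0.0
ablated = ablated_features @ dec_weight.T + dec_bias

# How much this single feature contributes to the layer's output.
print((baseline - ablated).norm())
```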
## License

Same as Qwen2.5-VL-7B (Apache 2.0 / Tongyi Qianwen License)
## Acknowledgments

- Qwen team for the base VLM
- circuit-tracer developers
- Anthropic for sparse autoencoder research
## Contact

For questions or issues, please open an issue in the model repo.

Last updated: October 2024