Qwen2.5-VL-7B Transcoders (Perfect Quality)

High-quality cross-layer transcoders for Qwen2.5-VL-7B-Instruct, trained on 10,000 carefully curated multimodal samples with a tuned sparsity penalty.

🎯 Key Features

  • ✅ 27 layers (L0 → L26)
  • ✅ Optimal sparsity: 4-15% L0 activation (highly interpretable)
  • ✅ 10,000 samples: High-quality multimodal dataset
  • ✅ Optimized training: 5e-2 sparsity coefficient (10x higher than default)
  • ✅ Excellent reconstruction: Val loss 0.16-0.34

📊 Training Quality

| Layer Range | Val Loss  | L0 Sparsity | Notes                                                                   |
|-------------|-----------|-------------|-------------------------------------------------------------------------|
| L0-L3       | 0.16-0.30 | 4-8%        | Excellent - very sparse                                                 |
| L4-L13      | 0.18-0.27 | 6-9%        | Excellent - optimal range                                               |
| L14-L20     | 0.20-0.34 | 8-14%       | Excellent - good coverage                                               |
| L21-L26     | 0.21-0.31 | 10-15%      | Excellent - higher layers naturally have slightly more active features |

All layers show excellent reconstruction quality with interpretable sparsity levels.

🚀 Quick Start

Installation

pip install circuit-tracer huggingface-hub

Usage with circuit-tracer

from circuit_tracer import attribute

# Run circuit tracing
attribute(
    prompt="The Eiffel Tower is located in",
    transcoder_set="KokosDev/qwen2p5vl-7b-plt",
    model="Qwen/Qwen2.5-7B",
    dtype="bf16",
    batch_size=64,
)

Manual Loading

from safetensors import safe_open
import torch

# Load the encoder/decoder weights for a specific layer
layer_idx = 5
with safe_open(f"layer_{layer_idx}.safetensors", framework="pt") as f:
    enc_weight = f.get_tensor("enc.0.weight")  # [8192, 3584]
    enc_bias = f.get_tensor("enc.0.bias")      # [8192]
    dec_weight = f.get_tensor("dec.weight")    # [3584, 8192]
    dec_bias = f.get_tensor("dec.bias")        # [3584]
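
If the weight files are not already local, they can be fetched from this repo first. A minimal sketch using huggingface_hub, assuming the per-layer files follow the layer_<idx>.safetensors naming shown above:

from huggingface_hub import hf_hub_download

layer_idx = 5
local_path = hf_hub_download(
    repo_id="KokosDev/qwen2p5vl-7b-plt",
    filename=f"layer_{layer_idx}.safetensors",
)
# Pass local_path to safe_open() as in the example above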

πŸ“ Model Architecture

Input (3584) → Encoder → ReLU → Features (8192) → Decoder → Output (3584)
  • Hidden dim: 3584 (Qwen2.5-7B residual stream)
  • Feature dim: 8192 (sparse features)
  • Activation: ReLU
  • Sparsity: 4-15% L0 (85-96% features inactive)
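
The diagram above maps onto a few tensor operations. A minimal forward-pass sketch, assuming the weight layout from the manual-loading example; the random x is only a placeholder for real residual-stream activations, and any input normalization used during training is omitted:

import torch
from safetensors import safe_open

# Re-use the layer-5 weights from the manual-loading example above
with safe_open("layer_5.safetensors", framework="pt") as f:
    enc_weight, enc_bias = f.get_tensor("enc.0.weight"), f.get_tensor("enc.0.bias")
    dec_weight, dec_bias = f.get_tensor("dec.weight"), f.get_tensor("dec.bias")

# Placeholder input; in practice x holds residual-stream activations [n_tokens, 3584]
x = torch.randn(4, 3584)

features = torch.relu(x @ enc_weight.T + enc_bias)  # [4, 8192] sparse feature activations
recon = features @ dec_weight.T + dec_bias          # [4, 3584] reconstructed residual stream

# L0 sparsity = fraction of active features per token (~4-15% on real activations;
# the random placeholder input will not reflect this)
print(f"L0: {(features > 0).float().mean().item():.1%}")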

🔬 Training Details

Dataset

  • Size: 10,000 samples
  • Split: 9,000 train / 1,000 validation
  • Type: Multimodal (vision + text)
  • Format: COCO images + text prompts

Hyperparameters

  • Steps: 5,000 per layer
  • Learning rate: 3e-4
  • Batch shards: 16
  • Cache shards: 800
  • Sparsity coefficient: 0.05 (critical for quality; see the loss sketch after this list)
  • Validation interval: 200 steps
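
For reference, a minimal sketch of the objective these hyperparameters suggest, assuming the common transcoder recipe of MSE reconstruction loss plus an L1 penalty on feature activations scaled by the sparsity coefficient; the actual training code may differ in detail:

import torch
import torch.nn.functional as F

SPARSITY_COEFF = 5e-2  # the coefficient listed above

def transcoder_loss(recon, target, features):
    # Reconstruction term: match the target residual-stream activations
    mse = F.mse_loss(recon, target)
    # Sparsity term: L1 penalty drives most feature activations to zero
    l1 = features.abs().sum(dim=-1).mean()
    return mse + SPARSITY_COEFF * l1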

Training Infrastructure

  • GPU: NVIDIA A100
  • Training time: 30 minutes per layer (13.5 hours total)
  • Framework: PyTorch 2.0+ with torch.compile

🎯 Why This Version is Better

Compared to the default training setup:

  1. 10x higher sparsity coefficient (5e-2 vs 5e-3)
     • Results in 4-15% L0 sparsity (vs 20-80%)
     • Features are much more interpretable
  2. Larger dataset (10K vs typical 1K)
     • Better feature coverage
     • More robust features
  3. Optimized training loop
     • Layer normalization for stable training
     • Fixed sequence length handling
     • Efficient caching and prefetching

📖 Use Cases

  • Circuit discovery: Find which features activate for specific inputs
  • Interpretability: Understand what vision-language models learn
  • Ablation studies: Remove specific features to test causality (see the sketch after this list)
  • Feature visualization: See what concepts are encoded
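
For the ablation use case, a minimal sketch that continues from the forward-pass example in the Model Architecture section: it zeroes one (arbitrarily chosen, purely illustrative) feature before decoding and measures how much the reconstruction shifts.

import torch

feature_idx = 123  # hypothetical feature index chosen for illustration

# features, recon, dec_weight, dec_bias come from the forward-pass sketch above
ablated = features.clone()
ablated[:, feature_idx] = 0.0  # switch the feature off

recon_ablated = ablated @ dec_weight.T + dec_bias
# Per-token shift in the reconstructed residual stream attributable to this feature
delta = (recon_ablated - recon).norm(dim=-1)
print(delta)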

🔗 Related Resources

📄 License

Same as Qwen2.5-VL-7B (Apache 2.0 / Tongyi Qianwen License)

πŸ™ Acknowledgments

  • Qwen team for the base VLM
  • circuit-tracer developers
  • Anthropic for sparse autoencoder research

📧 Contact

For questions or issues, please open an issue in the model repo.


Last updated: October 2024
