Qwen2.5-VL-7B Transcoders (Perfect Quality)

High-quality cross-layer transcoders for Qwen2.5-VL-7B-Instruct, trained on 10,000 carefully curated multimodal samples with a tuned sparsity penalty.

🎯 Key Features

  • ✅ 27 layers (L0 → L26)
  • ✅ Optimal sparsity: 4-15% L0 activation (highly interpretable)
  • ✅ 10,000 samples: High-quality multimodal dataset
  • ✅ Optimized training: 5e-2 sparsity coefficient (10x higher than default)
  • ✅ Excellent reconstruction: Val loss 0.16-0.34

📊 Training Quality

| Layer Range | Val Loss  | L0 Sparsity | Notes                                                                   |
|-------------|-----------|-------------|-------------------------------------------------------------------------|
| L0-L3       | 0.16-0.30 | 4-8%        | Excellent - very sparse                                                 |
| L4-L13      | 0.18-0.27 | 6-9%        | Excellent - optimal range                                               |
| L14-L20     | 0.20-0.34 | 8-14%       | Excellent - good coverage                                               |
| L21-L26     | 0.21-0.31 | 10-15%      | Excellent - higher layers naturally have slightly more active features |

All layers show excellent reconstruction quality with interpretable sparsity levels.

🚀 Quick Start

Installation

pip install circuit-tracer huggingface-hub

Usage with circuit-tracer

from circuit_tracer import attribute

# Run circuit tracing
attribute(
    prompt="The Eiffel Tower is located in",
    transcoder_set="KokosDev/qwen2p5vl-7b-plt",
    model="Qwen/Qwen2.5-7B",
    dtype="bf16",
    batch_size=64,
)

Manual Loading

from safetensors import safe_open
import torch

# Load the encoder/decoder weights for a specific layer
layer_idx = 5
with safe_open(f"layer_{layer_idx}.safetensors", framework="pt") as f:
    enc_weight = f.get_tensor("enc.0.weight")  # [8192, 3584]
    enc_bias = f.get_tensor("enc.0.bias")      # [8192]
    dec_weight = f.get_tensor("dec.weight")    # [3584, 8192]
    dec_bias = f.get_tensor("dec.bias")        # [3584]
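
If the weight files are not already local, they can be fetched from this repo first. A minimal sketch using huggingface_hub, assuming the per-layer files follow the layer_<idx>.safetensors naming shown above:

from huggingface_hub import hf_hub_download

layer_idx = 5
local_path = hf_hub_download(
    repo_id="KokosDev/qwen2p5vl-7b-plt",
    filename=f"layer_{layer_idx}.safetensors",
)
# Pass local_path to safe_open() as in the example above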

πŸ“ Model Architecture

Input (3584) → Encoder → ReLU → Features (8192) → Decoder → Output (3584)
  • Hidden dim: 3584 (Qwen2.5-7B residual stream)
  • Feature dim: 8192 (sparse features)
  • Activation: ReLU
  • Sparsity: 4-15% L0 (85-96% features inactive)
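
The diagram above maps onto a few tensor operations. A minimal forward-pass sketch, assuming the weight layout from the manual-loading example; the random x is only a placeholder for real residual-stream activations, and any input normalization used during training is omitted:

import torch
from safetensors import safe_open

# Re-use the layer-5 weights from the manual-loading example above
with safe_open("layer_5.safetensors", framework="pt") as f:
    enc_weight, enc_bias = f.get_tensor("enc.0.weight"), f.get_tensor("enc.0.bias")
    dec_weight, dec_bias = f.get_tensor("dec.weight"), f.get_tensor("dec.bias")

# Placeholder input; in practice x holds residual-stream activations [n_tokens, 3584]
x = torch.randn(4, 3584)

features = torch.relu(x @ enc_weight.T + enc_bias)  # [4, 8192] sparse feature activations
recon = features @ dec_weight.T + dec_bias          # [4, 3584] reconstructed residual stream

# L0 sparsity = fraction of active features per token (~4-15% on real activations;
# the random placeholder input will not reflect this)
print(f"L0: {(features > 0).float().mean().item():.1%}")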

🔬 Training Details

Dataset

  • Size: 10,000 samples
  • Split: 9,000 train / 1,000 validation
  • Type: Multimodal (vision + text)
  • Format: COCO images + text prompts

Hyperparameters

  • Steps: 5,000 per layer
  • Learning rate: 3e-4
  • Batch shards: 16
  • Cache shards: 800
  • Sparsity coefficient: 0.05 (critical for quality; see the loss sketch after this list)
  • Validation interval: 200 steps
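
For reference, a minimal sketch of the objective these hyperparameters suggest, assuming the common transcoder recipe of MSE reconstruction loss plus an L1 penalty on feature activations scaled by the sparsity coefficient; the actual training code may differ in detail:

import torch
import torch.nn.functional as F

SPARSITY_COEFF = 5e-2  # the coefficient listed above

def transcoder_loss(recon, target, features):
    # Reconstruction term: match the target residual-stream activations
    mse = F.mse_loss(recon, target)
    # Sparsity term: L1 penalty drives most feature activations to zero
    l1 = features.abs().sum(dim=-1).mean()
    return mse + SPARSITY_COEFF * l1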

Training Infrastructure

  • GPU: NVIDIA A100
  • Training time: 30 minutes per layer (13.5 hours total)
  • Framework: PyTorch 2.0+ with torch.compile

🎯 Why This Version is Better

Compared to the default training setup:

  1. 10x higher sparsity coefficient (5e-2 vs 5e-3)
     • Results in 4-15% L0 sparsity (vs 20-80%)
     • Features are much more interpretable
  2. Larger dataset (10K vs typical 1K)
     • Better feature coverage
     • More robust features
  3. Optimized training loop
     • Layer normalization for stable training
     • Fixed sequence length handling
     • Efficient caching and prefetching

📖 Use Cases

  • Circuit discovery: Find which features activate for specific inputs
  • Interpretability: Understand what vision-language models learn
  • Ablation studies: Remove specific features to test causality (see the sketch after this list)
  • Feature visualization: See what concepts are encoded
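
For the ablation use case, a minimal sketch that continues from the forward-pass example in the Model Architecture section: it zeroes one (arbitrarily chosen, purely illustrative) feature before decoding and measures how much the reconstruction shifts.

import torch

feature_idx = 123  # hypothetical feature index chosen for illustration

# features, recon, dec_weight, dec_bias come from the forward-pass sketch above
ablated = features.clone()
ablated[:, feature_idx] = 0.0  # switch the feature off

recon_ablated = ablated @ dec_weight.T + dec_bias
# Per-token shift in the reconstructed residual stream attributable to this feature
delta = (recon_ablated - recon).norm(dim=-1)
print(delta)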

🔗 Related Resources

📄 License

Same as Qwen2.5-VL-7B (Apache 2.0 / Tongyi Qianwen License)

πŸ™ Acknowledgments

  • Qwen team for the base VLM
  • circuit-tracer developers
  • Anthropic for sparse autoencoder research

📧 Contact

For questions or issues, please open an issue in the model repo.


Last updated: October 2024
