# Wisent-Qwen2.5-Coder-7B-Instruct with CAA Steering

## Model Description

This is an enhanced version of Qwen2.5-Coder-7B-Instruct that integrates **Contrastive Activation Addition (CAA)** steering directly into the model architecture. The steering parameters were optimized with Optuna to improve code-generation quality on the MBPP Plus benchmark.

### Key Features

- 🚀 **Automatic CAA Steering**: No manual hook management required
- 🎯 **Optimized Parameters**: Layer 24, α = 0.9
- 🗂️ **Trait-Based Organization**: Steering vectors organized by traits
- 🔧 **Runtime Configurable**: Adjust or disable steering on the fly
- 🤗 **HuggingFace Compatible**: Works with the standard transformers API

## Installation

```bash
pip install transformers torch
```

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model - CAA steering is applied automatically
model = AutoModelForCausalLM.from_pretrained(
    "./huggingface_qwen_generated", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("./huggingface_qwen_generated")

# Generate code
prompt = "Write a Python function to calculate the factorial of a number"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Advanced Usage

### Adjusting Steering Strength

```python
# Increase steering strength for stronger adherence to the steered trait
model.set_caa_alpha(1.2)

# Decrease for outputs closer to the baseline model
model.set_caa_alpha(0.5)
```

### Disabling CAA Steering

```python
# Disable CAA to get baseline model behavior
model.set_caa_enabled(False)

# Re-enable CAA
model.set_caa_enabled(True)
```

### Accessing Steering Configuration

```python
print(f"CAA Layer: {model.caa_layer_id}")
print(f"CAA Alpha: {model.caa_alpha}")
print(f"Steering Method: {model.steering_method}")
```

### Trait-Based Vector Organization

The model uses a trait-based organization for
steering vectors:

```
vectors/
├── coding/       # Current: Optimized for code generation
├── safety/       # Future: Safety-aligned behavior
├── creativity/   # Future: Enhanced creative outputs
├── helpfulness/  # Future: Improved helpfulness
└── reasoning/    # Future: Enhanced logical reasoning
```

To switch traits, simply update the configuration:

```json
{
  "steering_vector_path": "./vectors/safety/steering_vector.safetensors"
}
```

## Technical Details

### CAA Steering Parameters

- **Steering Method**: Contrastive Activation Addition (CAA)
- **Optimal Layer**: 24 (out of 28 transformer layers)
- **Steering Strength (α)**: 0.9
- **Vector Format**: Safetensors, for efficient loading and HuggingFace compatibility
- **Vector Dimension**: 3584 (pre-normalized during training)
- **Storage Path**: `./vectors/coding/steering_vector.safetensors`

### How It Works

1. **Trait-based Organization**: Steering vectors are organized by behavioral traits (`vectors/{trait}/`)
2. **Dynamic Loading**: The model loads the specified steering vector from the configured path
3. **Layer Application**: Steering is applied to the hidden states at layer 24 during the forward pass
4. **Generation Integration**: Steering affects the last token position during generation
5. **Configurable Strength**: The α parameter (default: 0.9) controls steering intensity
6.
**Pre-optimized Vectors**: Steering vectors are pre-normalized and ready for immediate use

### Optimization Process

The CAA parameters were optimized using:

- **Framework**: Optuna with the TPE sampler
- **Search Space**: Layers 15-28, α ∈ [0.1, 5.0]
- **Objective**: Maximize accuracy on the MBPP Plus validation set
- **Best Validation Score**: 64% accuracy

## Model Architecture

```
WisentQwen2ForCausalLM
├── Base: Qwen2.5-Coder-7B-Instruct
├── CAA Integration: Layer 24
├── Steering Vector: ./vectors/coding/steering_vector.safetensors
└── Auto-applied during generation
```

## File Structure

```
huggingface_qwen_generated/
├── config.json                # Model configuration with CAA params
├── modeling_wisent_qwen.py    # Custom model class
├── tokenizer files            # Standard Qwen tokenizer
├── wisent_config.json         # Optimization results
└── vectors/                   # Trait-based steering vectors
    └── coding/
        └── steering_vector.safetensors  # Optimized coding steering vector
```

## Evaluation

### MBPP Plus Benchmark

The model should be evaluated on the complete MBPP Plus dataset (378 problems) to measure improvement over the baseline; improvements are expected in line with the 64% validation accuracy reported above.

### Running Evaluation

```python
# Use with bigcode-evaluation-harness
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./huggingface_qwen_generated",
    trust_remote_code=True
)
# CAA steering is applied automatically during evaluation -
# no manual hooks or modifications needed
```

## Citation

If you use this model, please cite:

```bibtex
@software{wisent_qwen_caa_2025,
  title={Wisent-Qwen2.5-Coder with CAA Steering},
  author={Wisent AI},
  year={2025},
  url={https://github.com/wisent-ai/wisent-guard}
}
```

## License

This model inherits the license of the base Qwen2.5-Coder-7B-Instruct model; please refer to the original model's license for usage terms.
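## Appendix: The CAA Update in a Nutshell

The steering operation described under "How It Works" reduces to adding the scaled steering vector to a token's hidden state: h' = h + α·v. The sketch below illustrates this with toy 4-dimensional values; it is illustrative only (the shipped model applies the update internally at layer 24, with a 3584-dimensional vector and α = 0.9).

```python
# Minimal sketch of the CAA update h' = h + alpha * v.
# Toy 4-dim values; the real model uses a 3584-dim vector at layer 24.

def apply_caa(hidden_state, steering_vector, alpha=0.9):
    """Add the scaled steering vector to one token's hidden state."""
    return [h + alpha * v for h, v in zip(hidden_state, steering_vector)]

# Toy example: hidden state of the last token position.
hidden = [0.5, -1.0, 0.25, 0.0]
vector = [1.0, 0.0, -2.0, 4.0]

steered = apply_caa(hidden, vector, alpha=0.5)
print(steered)  # [1.0, -1.0, -0.75, 2.0]
```

Setting `alpha=0` recovers the unsteered hidden state, which is what `model.set_caa_enabled(False)` achieves in the shipped model.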
## Acknowledgments

- Base model: Qwen2.5-Coder-7B-Instruct by Alibaba
- CAA method: Contrastive Activation Addition
- Optimization: Optuna framework
- Implementation: Wisent Guard framework
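## Appendix: Switching Traits Programmatically

As described under "Trait-Based Vector Organization", switching traits amounts to changing the `steering_vector_path` key in the configuration. The helper below is a hypothetical convenience wrapper (not part of the shipped model) that rewrites that key in a config file; the key name mirrors the JSON snippet above.

```python
# Hypothetical helper: point "steering_vector_path" at another trait's
# vector by rewriting the config file. Not part of the shipped model.
import json

def switch_trait(config_path: str, trait: str) -> str:
    """Rewrite steering_vector_path to select the given trait's vector."""
    with open(config_path) as f:
        cfg = json.load(f)
    new_path = f"./vectors/{trait}/steering_vector.safetensors"
    cfg["steering_vector_path"] = new_path
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return new_path
```

After updating the path, reload the model with `from_pretrained` so the newly selected vector is picked up.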