# Combined Task Vector Model

This model was created by combining task vectors from three fine-tuned models, all derived from `google/gemma-2b-it`.

## Task Vector Computation

```python
# Each task vector is the weight difference between a fine-tuned model and the base model.
t_1 = TaskVector("google/gemma-2b-it", "coastalcph/gemma-2b-it-gcd_sycophancy_2e-04")
t_2 = TaskVector("google/gemma-2b-it", "coastalcph/gemma-2b-it-personality-non-sycophancy")
# t_3 is built from "finetuned_model3" in the Args below.
t_3 = TaskVector("google/gemma-2b-it", "coastalcph/gemma-2b-it-personality-sycophancy")
t_combined = 1.0 * t_1 + 1.5 * t_2 - 1.5 * t_3
new_model = t_combined.apply_to("google/gemma-2b-it", scaling_coef=1.0)
```

## Models Used

- Base Model: https://huggingface.co/google/gemma-2b-it
- Fine-tuned Model 1: https://huggingface.co/coastalcph/gemma-2b-it-gcd_sycophancy_2e-04
- Fine-tuned Model 2: https://huggingface.co/coastalcph/gemma-2b-it-personality-non-sycophancy
- Fine-tuned Model 3: https://huggingface.co/coastalcph/gemma-2b-it-personality-sycophancy

## Technical Details

- Creation Script Git Hash: d0db42d73be516ec04f0ecdc8003189e98b5f722
- Task Vector Method: Additive combination
- Args:

      {
        "pretrained_model": "google/gemma-2b-it",
        "finetuned_model1": "coastalcph/gemma-2b-it-gcd_sycophancy_2e-04",
        "finetuned_model2": "coastalcph/gemma-2b-it-personality-non-sycophancy",
        "finetuned_model3": "coastalcph/gemma-2b-it-personality-sycophancy",
        "output_model_name": "coastalcph/gemma-2b-it-1t_gcd_sycophancy_pout_1.5t_diff_sycophant",
        "output_dir": "/projects/nlp/data/constanzam/weight-interp/task-vectors/math_non_sycophant_12Aug",
        "scaling_coef": 1.0,
        "apply_line_scaling_t1": false,
        "apply_line_scaling_t2": false,
        "apply_line_scaling_t3": false,
        "combine_diff_projecting_out": true,
        "scale_t1": 1.0,
        "scale_t2": 1.5,
        "scale_t3": 1.5
      }
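
## Illustrative Sketch

The additive combination in the Task Vector Computation section can be illustrated with a toy example. This is a minimal sketch and not the creation script: plain Python lists stand in for model weights, and the helpers `task_vector`, `combine`, and `apply_to` are hypothetical names chosen for this illustration. The Args set `combine_diff_projecting_out` to true, but this card does not show how that step is implemented, so the sketch leaves it out and covers only the weighted sum.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_vector(base: StateDict, finetuned: StateDict) -> StateDict:
    """Per-parameter difference: fine-tuned weights minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def combine(vectors: List[StateDict], scales: List[float]) -> StateDict:
    """Weighted sum of task vectors; a negative scale subtracts a vector."""
    first = vectors[0]
    return {
        name: [
            sum(s * v[name][i] for v, s in zip(vectors, scales))
            for i in range(len(first[name]))
        ]
        for name in first
    }


def apply_to(base: StateDict, vector: StateDict, scaling_coef: float = 1.0) -> StateDict:
    """Add the (scaled) combined task vector back onto the base weights."""
    return {
        name: [b + scaling_coef * d for b, d in zip(base[name], vector[name])]
        for name in base
    }


# Made-up weights, chosen only so the arithmetic is easy to follow.
base = {"w": [1.0, 2.0]}
ft_1 = {"w": [1.5, 2.0]}
ft_2 = {"w": [1.0, 3.0]}
ft_3 = {"w": [2.0, 2.0]}

t_1 = task_vector(base, ft_1)
t_2 = task_vector(base, ft_2)
t_3 = task_vector(base, ft_3)

# Same coefficients as scale_t1, scale_t2, scale_t3, with t_3 subtracted.
t_combined = combine([t_1, t_2, t_3], [1.0, 1.5, -1.5])
new_model = apply_to(base, t_combined, scaling_coef=1.0)

print(t_combined)  # {'w': [-1.0, 1.5]}
print(new_model)   # {'w': [0.0, 3.5]}
```

Subtracting `t_3` moves the weights away from the direction learned by the third fine-tuned model, while the positive coefficients on `t_1` and `t_2` move them toward the first two.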