

LLM Model Tracing

This repository investigates model tracing in large language models (LLMs).

Specifically, given a base LLM and a fine-tuned LLM, this code provides functionality to:

  • Permute the weights of one model (either MLP or embedding weights).
  • Align the weights of the fine-tuned model to the base model using the Hungarian algorithm.
  • Evaluate the effect of weight permutation and alignment on different statistics:
    • Mode connectivity
    • Cosine similarity
    • Embedding similarity
  • Evaluate the perplexity of the base and fine-tuned models on a given dataset.
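For intuition, here is a minimal sketch of the alignment step on a single MLP up-projection matrix, using the Hungarian algorithm via SciPy's linear_sum_assignment. The function name and shapes are illustrative assumptions, not the repository's actual API:

import torch
from scipy.optimize import linear_sum_assignment

def align_mlp_up(base_up: torch.Tensor, ft_up: torch.Tensor) -> torch.Tensor:
    # Rows are MLP neurons; cost is negative cosine similarity between
    # every base/fine-tuned neuron pair.
    base_n = torch.nn.functional.normalize(base_up.float(), dim=1)
    ft_n = torch.nn.functional.normalize(ft_up.float(), dim=1)
    cost = -(base_n @ ft_n.T).cpu().numpy()
    # Hungarian algorithm: minimum-cost one-to-one matching of neurons.
    row_ind, col_ind = linear_sum_assignment(cost)
    # Reorder fine-tuned neurons to line up with the base model's neurons.
    return ft_up[col_ind]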

Requirements

Install the necessary packages using:

pip install -r requirements.txt

For development, install the development dependencies:

pip install -r requirements-dev.txt

Code Formatting with pre-commit

This repository uses pre-commit hooks to ensure code quality and consistency.

  1. Install pre-commit:
pip install pre-commit
  2. Set up the pre-commit hooks:
pre-commit install
  3. (Optional) Run pre-commit on all files:
pre-commit run --all-files

Pre-commit will automatically run on staged files when you commit changes, applying:

  • Black for code formatting
  • Ruff for linting and fixing common issues
  • nbQA for notebook formatting
  • Various file checks (trailing whitespace, YAML validity, etc.)

Usage

The repository provides two main scripts:

  • main.py: Executes the main experiment pipeline for model tracing.
  • launch.py: Launches multiple experiments in parallel using Slurm.

main.py

This script performs the following steps:

  1. Loads the base and fine-tuned LLMs.
  2. Optionally permutes the weights of the fine-tuned model.
  3. Calculates the selected statistic for the non-aligned models.
  4. Optionally aligns the weights of the fine-tuned model to the base model.
  5. Calculates the selected statistic for the aligned models.
  6. Optionally evaluates the perplexity of the base and fine-tuned models.
  7. Saves the results to a pickle file.
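In pseudocode, the pipeline looks roughly like this (permute_weights, align_weights, compute_stat, and evaluate_ppl are hypothetical names standing in for the repository's actual helpers):

import pickle
import time

from transformers import AutoModelForCausalLM

def run(args):
    start = time.time()
    base = AutoModelForCausalLM.from_pretrained(args.base_model_id)
    ft = AutoModelForCausalLM.from_pretrained(args.ft_model_id)

    if args.permute:
        permute_weights(ft)  # hypothetical helper
    results = {"args": vars(args)}
    results["non-aligned test stat"] = compute_stat(base, ft, args.stat)
    if args.align:
        align_weights(base, ft)  # hypothetical helper
        results["aligned test stat"] = compute_stat(base, ft, args.stat)
    if args.eval:
        results["base loss"] = evaluate_ppl(base, args.dataset_id)
        results["ft loss"] = evaluate_ppl(ft, args.dataset_id)
    results["time"] = time.time() - start
    with open(args.save, "wb") as f:
        pickle.dump(results, f)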

The script accepts various command-line arguments:

  • --base_model_id: HuggingFace model ID for the base model.
  • --ft_model_id: HuggingFace model ID for the fine-tuned model.
  • --permute: Whether to permute the weights of the fine-tuned model.
  • --align: Whether to align the weights of the fine-tuned model to the base model.
  • --dataset_id: HuggingFace dataset ID for perplexity evaluation.
  • --stat: Statistic to calculate. Options include "mode", "cos", "emb", and the statistics below (a permutation-test sketch follows this list):
    • csu: cosine similarity of weights (on MLP up-projection matrices) with Spearman correlation
      • csu_all: csu on all pairs of parameters with equal shape
    • csh: cosine similarity of MLP activations with Spearman correlation
    • match: unconstrained statistic with permutation matching of MLP activations
      • match_all: match on all pairs of MLP block activations
  • --attn: Whether to consider attention weights in the "mode" statistic.
  • --emb: Whether to consider embedding weights in the "mode" statistic.
  • --eval: Whether to evaluate perplexity.
  • --save: Path to save the results pickle file.
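Each statistic is judged against what random neuron matchings would produce, yielding a p-value. Here is a minimal sketch of that idea on MLP up-projection weight matrices; it is an illustration of the permutation-test principle, not the repository's exact csu computation:

import torch

def permutation_pvalue(base_up, ft_up, n_null=999, seed=0):
    g = torch.Generator().manual_seed(seed)
    base_n = torch.nn.functional.normalize(base_up.float(), dim=1)
    ft_n = torch.nn.functional.normalize(ft_up.float(), dim=1)
    # Observed statistic: mean cosine similarity of corresponding neurons.
    observed = (base_n * ft_n).sum(dim=1).mean()
    # Null distribution: the same score under random neuron matchings.
    null = torch.stack([
        (base_n * ft_n[torch.randperm(len(ft_n), generator=g)]).sum(dim=1).mean()
        for _ in range(n_null)
    ])
    # One-sided p-value: fraction of random matchings at least as similar.
    return ((null >= observed).sum().item() + 1) / (n_null + 1)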

Example usage:

python main.py --base_model_id meta-llama/Llama-2-7b-hf --ft_model_id lmsys/vicuna-7b-v1.5 --stat csu --save results.p
python main.py --base_model_id meta-llama/Llama-2-7b-hf --ft_model_id lmsys/vicuna-7b-v1.5 --permute --align --dataset_id wikitext --stat match --attn --save results.p

launch.py

This script launches multiple experiments in parallel using Slurm. It reads model IDs from a YAML file and runs main.py for each pair of base and fine-tuned models. The --flat flag controls which pairs are run:

  • --flat all (the default): all pairs of models from a YAML file (see config/llama7b.yaml).
  • --flat split: all pairs of a 'base' model with a 'finetuned' model (see config/llama7b_split.yaml).
  • --flat specified: a specified list of model pairs.
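A hypothetical sketch of the --flat all behavior (the "models" key and the sbatch invocation below are assumptions for illustration; the real schema is in config/llama7b.yaml):

import itertools
import subprocess

import yaml

with open("config/llama7b.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumed schema: a flat list of HuggingFace model IDs under "models".
for base_id, ft_id in itertools.permutations(cfg["models"], 2):
    out = f"results/{base_id.replace('/', '_')}__{ft_id.replace('/', '_')}.p"
    cmd = (
        f"python main.py --base_model_id {base_id} "
        f"--ft_model_id {ft_id} --stat csu --save {out}"
    )
    subprocess.run(["sbatch", "--wrap", cmd], check=True)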

Configuration

The model-tracing/config/model_list.yaml file defines the base and fine-tuned models for the experiments.

Data

The code downloads and uses the WikiText-103 dataset for perplexity evaluation.
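As a sketch, perplexity on WikiText-103 can be computed like this (the repository's evaluation loop may differ in tokenization, context length, and batching):

import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Concatenate the test split and truncate to one context for this sketch.
text = "\n\n".join(load_dataset("wikitext", "wikitext-103-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids[:, :4096]

with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
print("perplexity:", math.exp(loss.item()))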

Results

The results of the experiments are saved as pickle files. The files contain dictionaries with the following keys:

  • args: Command-line arguments used for the experiment.
  • commit: Git commit hash of the code used for the experiment.
  • non-aligned test stat: Value of the selected statistic for the non-aligned models.
  • aligned test stat: Value of the selected statistic for the aligned models (if --align is True).
  • base loss: Perplexity of the base model on the evaluation dataset (if --eval is True).
  • ft loss: Perplexity of the fine-tuned model on the evaluation dataset (if --eval is True).
  • time: Total execution time of the experiment.
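For example, a saved file can be inspected like this:

import pickle

with open("results.p", "rb") as f:
    results = pickle.load(f)

print("statistic (non-aligned):", results["non-aligned test stat"])
print("statistic (aligned):", results.get("aligned test stat"))
print("runtime (s):", results["time"])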

Sample commands

70B runs

python main.py --base_model_id meta-llama/Llama-2-70b-hf --ft_model_id meta-llama/Meta-Llama-3-70B --stat csu

Experiments

Relevant scripts for running additional experiments described in our paper are in the experiments/ folder. For example, there are experiments on retraining MLP blocks and evaluating our statistics.

These include:

  • experiments/localized_testing.py (Section 3.2.1): fine-grained forensics and layer-matching between two models.
  • experiments/csu_full.py (Section 3.2.1): full parameter-matching between any two model architectures, for hybrid models.
  • experiments/generalized_match.py (Sections 2.3.2, 3.2.3, 3.2.4): the generalized robust test, which involves retraining or distilling GLU MLPs.
  • experiments/huref.py (Appendix F): reproduces and breaks the invariants from a related work (Zeng et al. 2024).