# LLM Model Tracing

This repository investigates model tracing in large language models (LLMs). Specifically, given a base LLM and a fine-tuned LLM, this code provides functionality to:

- Permute the weights of one model (either MLP or embedding weights).
- Align the weights of the fine-tuned model to the base model using the Hungarian algorithm.
- Evaluate the effect of weight permutation and alignment on different statistics:
  - Mode connectivity
  - Cosine similarity
  - Embedding similarity
- Evaluate the perplexity of the base and fine-tuned models on a given dataset.

## Requirements

Install the necessary packages using:

```bash
pip install -r requirements.txt
```

For development, install the development dependencies:

```bash
pip install -r requirements-dev.txt
```

### Code Formatting with pre-commit

This repository uses pre-commit hooks to ensure code quality and consistency.

1. Install pre-commit:

   ```bash
   pip install pre-commit
   ```

2. Set up the pre-commit hooks:

   ```bash
   pre-commit install
   ```

3. (Optional) Run pre-commit on all files:

   ```bash
   pre-commit run --all-files
   ```

Pre-commit will automatically run on staged files when you commit changes, applying:

- Black for code formatting
- Ruff for linting and fixing common issues
- nbQA for notebook formatting
- Various file checks (trailing whitespace, YAML validity, etc.)

## Usage

The repository provides two main scripts:

- `main.py`: Executes the main experiment pipeline for model tracing.
- `launch.py`: Launches multiple experiments in parallel using Slurm.

### `main.py`

This script performs the following steps:

1. Loads the base and fine-tuned LLMs.
2. Optionally permutes the weights of the fine-tuned model.
3. Calculates the selected statistic for the non-aligned models.
4. Optionally aligns the weights of the fine-tuned model to the base model.
5. Calculates the selected statistic for the aligned models.
6. Optionally evaluates the perplexity of the base and fine-tuned models.
7. Saves the results to a pickle file.

The script accepts various command-line arguments:

- `--base_model_id`: HuggingFace model ID for the base model.
- `--ft_model_id`: HuggingFace model ID for the fine-tuned model.
- `--permute`: Whether to permute the weights of the fine-tuned model.
- `--align`: Whether to align the weights of the fine-tuned model to the base model.
- `--dataset_id`: HuggingFace dataset ID for perplexity evaluation.
- `--stat`: Statistic to calculate. Options:
  - `mode`: mode connectivity statistic
  - `cos`: cosine similarity statistic
  - `emb`: embedding similarity statistic
  - `csu`: cosine similarity of weights statistic (on MLP up-projection matrices) with Spearman correlation
  - `csu_all`: `csu` on all pairs of parameters with equal shape
  - `csh`: cosine similarity of MLP activations statistic with Spearman correlation
  - `match`: unconstrained statistic with permutation matching of MLP activations
  - `match_all`: unconstrained statistic (`match`) on all pairs of MLP block activations
- `--attn`: Whether to consider attention weights in the `mode` statistic.
- `--emb`: Whether to consider embedding weights in the `mode` statistic.
- `--eval`: Whether to evaluate perplexity.
- `--save`: Path to save the results pickle file.

Example usage:

```bash
python main.py --base_model_id meta-llama/Llama-2-7b-hf --ft_model_id lmsys/vicuna-7b-v1.5 --stat csu --save results.p
```

```bash
python main.py --base_model_id meta-llama/Llama-2-7b-hf --ft_model_id lmsys/vicuna-7b-v1.5 --permute --align --dataset_id wikitext --stat match --attn --save results.p
```
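As an illustration of the weight-based statistics above (not the repository's implementation), the following minimal sketch compares the MLP up-projection weights of a base and a fine-tuned model layer by layer with cosine similarity. The parameter paths assume the Hugging Face `LlamaForCausalLM` layout, and the full `csu` statistic additionally applies a Spearman correlation, which is omitted here.

```python
# Illustrative sketch only -- not the csu implementation used by main.py.
# Assumes both models follow the Hugging Face LlamaForCausalLM layout
# (model.layers[i].mlp.up_proj).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float32)
ft = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5", torch_dtype=torch.float32)

with torch.no_grad():
    for i, (b_layer, f_layer) in enumerate(zip(base.model.layers, ft.model.layers)):
        b_w = b_layer.mlp.up_proj.weight  # shape: (intermediate_size, hidden_size)
        f_w = f_layer.mlp.up_proj.weight
        # Cosine similarity between corresponding rows (one value per hidden unit),
        # summarized by its mean over the layer.
        row_cos = torch.nn.functional.cosine_similarity(b_w, f_w, dim=1)
        print(f"layer {i:02d}: mean row-wise cosine similarity = {row_cos.mean().item():.4f}")
```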
### `launch.py`

This script launches multiple experiments in parallel using Slurm. It reads model IDs from a YAML file and runs `main.py` for each pair of base and fine-tuned models. The `--flat` flag controls which pairs are run:

- `--flat all` (default): run on all pairs of models from a YAML file (see `config/llama7b.yaml`).
- `--flat split`: run on all pairs of a 'base' model with a 'finetuned' model (see `config/llama7b_split.yaml`).
- `--flat specified`: run on a specified list of model pairs.

## Configuration

The `model-tracing/config/model_list.yaml` file defines the base and fine-tuned models for the experiments.

## Data

The code downloads and uses the WikiText-103 dataset for perplexity evaluation.

## Results

The results of the experiments are saved as pickle files. The files contain dictionaries with the following keys:

- `args`: Command-line arguments used for the experiment.
- `commit`: Git commit hash of the code used for the experiment.
- `non-aligned test stat`: Value of the selected statistic for the non-aligned models.
- `aligned test stat`: Value of the selected statistic for the aligned models (if `--align` is True).
- `base loss`: Perplexity of the base model on the evaluation dataset (if `--eval` is True).
- `ft loss`: Perplexity of the fine-tuned model on the evaluation dataset (if `--eval` is True).
- `time`: Total execution time of the experiment.

## Sample commands

### 70B runs

```bash
python main.py --base_model_id meta-llama/Llama-2-70b-hf --ft_model_id meta-llama/Meta-Llama-3-70B --stat csu
```

# Experiments

Relevant scripts for running additional experiments described in our paper are in this folder, for example experiments on retraining MLP blocks and evaluating our statistics. These include:

- `experiments/localized_testing.py` (Section 3.2.1): fine-grained forensics and layer-matching between two models.
- `experiments/csu_full.py` (Section 3.2.1): full parameter-matching between any two model architectures, for hybrid models.
- `experiments/generalized_match.py` (Sections 2.3.2, 3.2.3, 3.2.4): the generalized robust test that involves retraining or distilling GLU MLPs.
- `experiments/huref.py` (Appendix F): reproduces and breaks the invariants from a related work (Zeng et al., 2024).
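For reference, the alignment step described at the top of this README (matching hidden units with the Hungarian algorithm) can be sketched as follows. This is a toy illustration with random stand-in weights, not the alignment code used by `main.py`; it relies on `scipy.optimize.linear_sum_assignment` to solve the assignment problem.

```python
# Toy illustration of Hungarian-algorithm alignment of MLP hidden units --
# not the repository's alignment code.
import torch
from scipy.optimize import linear_sum_assignment


def align_rows(base_w: torch.Tensor, ft_w: torch.Tensor) -> torch.Tensor:
    """Permute the rows (hidden units) of ft_w to best match base_w by cosine similarity."""
    b = torch.nn.functional.normalize(base_w, dim=1)
    f = torch.nn.functional.normalize(ft_w, dim=1)
    cost = -(b @ f.T).numpy()  # negative cosine similarity as the assignment cost
    _, col_ind = linear_sum_assignment(cost)  # one-to-one matching of hidden units
    return ft_w[col_ind]


# Small stand-in weights: 64 hidden units with 32-dimensional inputs.
base_w = torch.randn(64, 32)
ft_w = base_w[torch.randperm(64)] + 0.01 * torch.randn(64, 32)
aligned = align_rows(base_w, ft_w)
print(torch.nn.functional.cosine_similarity(base_w, aligned, dim=1).mean())  # close to 1
```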