Affine Transform: EleutherAI/deep-ignorance-pretraining-stage-unfiltered (global_step38144) → EleutherAI/deep-ignorance-unfiltered

Learned per-layer affine transformations that map hidden-state activations from a source checkpoint to the corresponding activations of a target model.

Usage

import json

import torch.nn as nn
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the transform weights and their metadata
weights_path = hf_hub_download(
    repo_id="EleutherAI/affine-checkpoint-transfer",
    filename="affine_transforms.safetensors",
)
metadata_path = hf_hub_download(
    repo_id="EleutherAI/affine-checkpoint-transfer",
    filename="metadata.json",
)

# Load the metadata (layer indices and hidden dimension)
with open(metadata_path) as f:
    metadata = json.load(f)

# Build one nn.Linear per layer from the stored weights and biases
weights = load_file(weights_path)
affine_transforms = {}
for layer_idx in metadata["layer_indices"]:
    linear = nn.Linear(metadata["hidden_dim"], metadata["hidden_dim"], bias=True)
    linear.weight.data = weights[f"layer_{layer_idx}.weight"]
    linear.bias.data = weights[f"layer_{layer_idx}.bias"]
    affine_transforms[layer_idx] = linear
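
Once the transforms are loaded, applying them is a matter of passing each layer's source hidden state through the matching nn.Linear. The following is a minimal sketch, not the authors' pipeline: the helper name map_hidden_states is hypothetical, and the toy tensors stand in for real activations so the snippet runs without downloading anything. How the layer indices line up with a model's hidden-state outputs (for example, whether index 0 is the embedding output) is not stated in this card and is treated here as an assumption.

```python
import torch
import torch.nn as nn


def map_hidden_states(hidden_states, affine_transforms):
    """Apply each layer's affine map to the matching source hidden state.

    hidden_states: dict {layer_idx: tensor of shape (..., hidden_dim)}
    affine_transforms: dict {layer_idx: nn.Linear}
    (Hypothetical helper; the dict-based layout is an assumption.)
    """
    mapped = {}
    with torch.no_grad():  # inference only, no gradients needed
        for layer_idx, linear in affine_transforms.items():
            h = hidden_states[layer_idx].to(linear.weight.dtype)
            mapped[layer_idx] = linear(h)  # computes h @ W.T + b
    return mapped


# Toy stand-ins: 2 layers, hidden_dim 8 instead of the real 4096.
hidden_dim = 8
toy_transforms = {i: nn.Linear(hidden_dim, hidden_dim, bias=True) for i in range(2)}
toy_hidden = {i: torch.randn(1, 5, hidden_dim) for i in range(2)}
mapped = map_hidden_states(toy_hidden, toy_transforms)
```

With the real files, the affine_transforms dict built above could be passed in directly, together with source activations of hidden dimension 4096.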

MSE Metrics

Layer MSE
0 0.000050
1 0.014719
2 0.020466
3 0.034601
4 0.056137
5 0.081037
6 0.111370
7 0.169480
8 0.200614
9 0.243166
10 0.289330
11 0.346412
12 0.421382
13 0.506538
14 0.591292
15 0.684350
16 0.774846
17 0.841685
18 0.894353
19 0.938238
20 0.978894
21 1.036667
22 1.101122
23 1.213081
24 1.347362
25 1.569308
26 1.895770
27 2.358280
28 2.917744
29 3.607210
30 4.231404
31 4.862252

Mean MSE: 1.073099
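
The reported mean is the unweighted average of the 32 per-layer values in the table. A short check, using only the numbers listed above:

```python
# Per-layer MSE values copied from the table (layers 0 through 31).
layer_mse = [
    0.000050, 0.014719, 0.020466, 0.034601, 0.056137, 0.081037, 0.111370, 0.169480,
    0.200614, 0.243166, 0.289330, 0.346412, 0.421382, 0.506538, 0.591292, 0.684350,
    0.774846, 0.841685, 0.894353, 0.938238, 0.978894, 1.036667, 1.101122, 1.213081,
    1.347362, 1.569308, 1.895770, 2.358280, 2.917744, 3.607210, 4.231404, 4.862252,
]

# Unweighted mean across layers.
mean_mse = sum(layer_mse) / len(layer_mse)
print(f"Mean MSE: {mean_mse:.6f}")  # Mean MSE: 1.073099
```

The values increase monotonically with depth, so the last few layers dominate the average.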

Training Details

  • Source Model: EleutherAI/deep-ignorance-pretraining-stage-unfiltered (global_step38144)
  • Target Model: EleutherAI/deep-ignorance-unfiltered
  • Hidden Dimension: 4096
  • Ridge Alpha: 0.01
  • Layers: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
  • Training Examples: 100000
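
The details above indicate a ridge regression with alpha 0.01, but the card does not give the fitting code. As an illustration only, a closed-form ridge fit of an affine map could look like the sketch below. The function name fit_ridge_affine, the choice to leave the bias unpenalized (by centering the data), and the toy data are all assumptions, not the authors' procedure.

```python
import torch


def fit_ridge_affine(X, Y, alpha=0.01):
    """Closed-form ridge fit of Y ≈ X @ W.T + b (bias not penalized).

    X: source activations, shape (n_examples, hidden_dim)
    Y: target activations, shape (n_examples, hidden_dim)
    Returns W with shape (hidden_dim, hidden_dim), matching nn.Linear.weight,
    and b with shape (hidden_dim,).
    """
    X, Y = X.double(), Y.double()  # float64 for a stabler solve
    x_mean = X.mean(dim=0, keepdim=True)
    y_mean = Y.mean(dim=0, keepdim=True)
    Xc, Yc = X - x_mean, Y - y_mean  # centering handles the intercept
    d = X.shape[1]
    gram = Xc.T @ Xc + alpha * torch.eye(d, dtype=X.dtype)
    W_t = torch.linalg.solve(gram, Xc.T @ Yc)  # solves (XᵀX + αI) Wᵀ = XᵀY
    b = (y_mean - x_mean @ W_t).squeeze(0)
    return W_t.T, b


# Toy data: a known affine map with hidden_dim 4 instead of 4096.
torch.manual_seed(0)
n, d = 500, 4
X = torch.randn(n, d)
W_true = torch.randn(d, d)
b_true = torch.randn(d)
Y = X @ W_true.T + b_true
W_fit, b_fit = fit_ridge_affine(X, Y, alpha=0.01)
```

On noise-free toy data the fit recovers the true map up to the small shrinkage introduced by alpha. The returned W and b could be copied into an nn.Linear in the same way the Usage code loads the stored weights.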