TRM Model - Pretrained on ARC AGI II

This repo contains the model checkpoints from the training run described below.
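
A minimal sketch for pulling the checkpoint files (assumes `huggingface_hub` is installed; the exact checkpoint filenames inside the repo are not listed here):

```python
# Sketch: download this repo's files from the Hugging Face Hub.
# Assumes the repo id Trelis/TRM-ARC-AGI-II; checkpoint filenames may vary.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Trelis/TRM-ARC-AGI-II")
print(f"Checkpoints downloaded to: {local_dir}")
```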

See the Weights & Biases (wandb) training logs here.

Training ran on 4x H200 SXM GPUs for ~48 hours.

Final score: ~10.5%.

Training config:

# ARC training config

defaults:
  - arch: trm
  - _self_

hydra:
  output_subdir: null

entity: "trelis"

# Data path
data_paths: ['data/arc2concept-aug-1000']
data_paths_test: []

evaluators:
  - name: arc@ARC

# Hyperparams - Training
global_batch_size: 768

epochs: 100000
eval_interval: 10000
checkpoint_every_eval: True
checkpoint_every_n_steps: null

lr: 1e-4
lr_min_ratio: 1.0
lr_warmup_steps: 2000

# Standard hyperparameter settings for LM, as used in Llama
beta1: 0.9
beta2: 0.95
weight_decay: 0.1
puzzle_emb_weight_decay: 0.1

# Hyperparams - Puzzle embeddings training
puzzle_emb_lr: 1e-2

seed: 0
min_eval_interval: 0 # when to start the eval

ema: True # use an Exponential Moving Average (EMA) of the weights
ema_rate: 0.999 # EMA decay rate
freeze_weights: False # If True, freeze weights and only learn the embeddings
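
For reference, a rough sketch of the learning-rate behaviour these settings imply (an interpretation, not the repo's scheduler code): linear warmup over `lr_warmup_steps`, then effectively constant, since `lr_min_ratio: 1.0` sets the decay floor equal to `lr`.

```python
import math

# Sketch (an assumption about what the settings imply, not the actual scheduler):
# linear warmup to `lr`, then a cosine decay whose floor is lr * lr_min_ratio.
# With lr_min_ratio = 1.0 the cosine term cancels and the LR stays at 1e-4.
def lr_at_step(step: int, total_steps: int,
               base_lr: float = 1e-4,
               warmup_steps: int = 2000,
               min_ratio: float = 1.0) -> float:
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps        # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # 1 -> 0
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at_step(1000, total_steps=100_000))    # mid-warmup: 5.005e-05
print(lr_at_step(50_000, total_steps=100_000))  # post-warmup: 1e-04 (constant)
```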

Architecture config (trm.yaml):

name: recursive_reasoning.trm@TinyRecursiveReasoningModel_ACTV1
loss:
  name: losses@ACTLossHead
  loss_type: stablemax_cross_entropy

halt_exploration_prob: 0.1
halt_max_steps: 16

H_cycles: 3
L_cycles: 4 # NOTE: this differs from the paper, which uses 6; the difference was accidental.

H_layers: 0
L_layers: 2

hidden_size: 512
num_heads: 8  # max(2, hidden_size // 64)
expansion: 4

puzzle_emb_ndim: ${.hidden_size}

pos_encodings: rope
forward_dtype: bfloat16

mlp_t: False # if True, use an MLP instead of a transformer for the L-level
puzzle_emb_len: 16 # if non-zero, the puzzle embedding length is fixed to this value
no_ACT_continue: True # no 'continue' ACT loss; only use the sigmoid of the halt logit, which makes much more sense
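
The `puzzle_emb_ndim: ${.hidden_size}` entry is an OmegaConf relative interpolation, so the puzzle-embedding width tracks `hidden_size`. A small illustrative sketch of how it resolves (not the actual Hydra composition used in training):

```python
# Sketch: how the ${.hidden_size} relative interpolation resolves.
# Illustrative only; in training the config is composed by Hydra from the files above.
from omegaconf import OmegaConf

arch = OmegaConf.create("""
hidden_size: 512
puzzle_emb_ndim: ${.hidden_size}
""")
print(arch.puzzle_emb_ndim)  # -> 512 (follows hidden_size)
```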