HParams
Over the years, many timm models have been trained with various hyper-parameters as the libraries and models evolved. I don’t have a record of every training run, but I have recorded many of them, and those can serve as a very good starting point.
Tags
Most timm trained models have an identifier in their pretrained tag that relates them (roughly) to a family / version of hparams I’ve used over the years.
| Tag(s) | Description | Optimizer | LR Schedule | Other Notes |
|---|---|---|---|---|
| a1h | Based on ResNet Strikes Back A1 recipe | LAMB | Cosine with warmup | Stronger dropout, stochastic depth, and RandAugment than paper A1 recipe |
| ah | Based on ResNet Strikes Back A1 recipe | LAMB | Cosine with warmup | No CutMix. Stronger dropout, stochastic depth, and RandAugment than paper A1 recipe |
| a1, a2, a3 | ResNet Strikes Back A{1,2,3} recipe | LAMB with BCE loss | Cosine with warmup | — |
| b1, b2, b1k, b2k | Based on ResNet Strikes Back B recipe (equivalent to timm RA2 recipes) | RMSProp (TF 1.0 behaviour) | Step (exponential decay w/ staircase) with warmup | — |
| c, c1, c2, c3 | Based on ResNet Strikes Back C recipes | SGD (Nesterov) with AGC | Cosine with warmup | — |
| ch | Based on ResNet Strikes Back C recipes | SGD (Nesterov) with AGC | Cosine with warmup | Stronger dropout, stochastic depth, and RandAugment than paper C1/C2 recipes |
| d, d1, d2 | Based on ResNet Strikes Back D recipe | AdamW with BCE loss | Cosine with warmup | — |
| sw | Based on Swin Transformer train/pretrain recipe (basis of DeiT and ConvNeXt recipes) | AdamW with gradient clipping, EMA | Cosine with warmup | — |
| ra, ra2, ra3, racm, raa | RandAugment recipes. Inspired by EfficientNet RandAugment recipes. Covered by B recipe in ResNet Strikes Back. | RMSProp (TF 1.0 behaviour), EMA | Step (exponential decay w/ staircase) with warmup | — |
| ra4 | RandAugment v4. Inspired by MobileNetV4 hparams. | - | - | - |
| am | AugMix recipe | SGD (Nesterov) with JSD loss | Cosine with warmup | — |
| ram | AugMix (with RandAugment) recipe | SGD (Nesterov) with JSD loss | Cosine with warmup | — |
| bt | Bag-of-Tricks recipe | SGD (Nesterov) | Cosine with warmup | — |
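
To look up which row of the table a given pretrained model belongs to, it can help to pull the identifier out of the model name. The snippet below is a minimal sketch, not part of timm: `find_hparam_tag` and `KNOWN_HPARAM_TAGS` are hypothetical names, and it assumes names of the form `architecture.pretrained_tag` (for example `resnet50.a1h_in1k`) where the pretrained tag is a `_`-separated list of tokens. Short identifiers such as `c` or `d` could in principle collide with unrelated tokens, so treat the result as a hint.

```python
from typing import Optional

# Hparam identifiers copied from the Tags table above.
KNOWN_HPARAM_TAGS = {
    "a1h", "ah", "a1", "a2", "a3",
    "b1", "b2", "b1k", "b2k",
    "c", "c1", "c2", "c3", "ch",
    "d", "d1", "d2",
    "sw",
    "ra", "ra2", "ra3", "racm", "raa", "ra4",
    "am", "ram", "bt",
}


def find_hparam_tag(model_name: str) -> Optional[str]:
    """Return the hparam identifier found in a model name, or None."""
    # Assumption: the pretrained tag is everything after the first '.'.
    _, sep, pretrained_tag = model_name.partition(".")
    if not sep:
        return None  # bare architecture name, no pretrained tag
    # Assumption: the tag is a '_'-separated list of tokens (e.g. 'a1h', 'in1k').
    for token in pretrained_tag.split("_"):
        if token in KNOWN_HPARAM_TAGS:
            return token
    return None


print(find_hparam_tag("resnet50.a1h_in1k"))
print(find_hparam_tag("resnet50"))
```

A `None` result only means that no identifier from the table was found in the name. It does not say anything about how the model was actually trained.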
Config File Gists
I’ve collected several of the hparam families in a series of gists. Download one and pass it to the timm train script with the `--config hparam.yaml` argument. The learning rate always needs some adjustment to match your effective global batch size.
| Tag | Key Model Architectures | Gist Link |
|---|---|---|
| ra2 | ResNet, EfficientNet, RegNet, NFNet | Link |
| ra3 | RegNet | Link |
| ra4 | MobileNetV4 | Link |
| sw | ViT, ConvNeXt, CoAtNet, MaxViT | Link |
| sbb | ViT | Link |
| — | Tiny Test Models | Link |
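
The note above about adjusting the LR for the effective global batch size can be illustrated with a small helper. This is a rough sketch of two common heuristics (linear scaling, often used with SGD-style optimizers, and square-root scaling, often used with adaptive optimizers such as AdamW or LAMB). It is not necessarily the exact rule any of the gists assume. The function name `scale_lr` and all numbers are hypothetical; check the LR and batch size stated in the config you start from.

```python
def scale_lr(
    ref_lr: float,
    ref_batch_size: int,
    batch_size_per_device: int,
    num_devices: int = 1,
    grad_accum_steps: int = 1,
    rule: str = "linear",
) -> float:
    """Rescale a reference LR to a new effective global batch size."""
    # Effective global batch size: samples contributing to one optimizer step.
    global_batch_size = batch_size_per_device * num_devices * grad_accum_steps
    ratio = global_batch_size / ref_batch_size
    if rule == "sqrt":
        ratio = ratio ** 0.5
    elif rule != "linear":
        raise ValueError(f"unknown scaling rule: {rule!r}")
    return ref_lr * ratio


# Hypothetical case: a config tuned with lr=0.5 at global batch size 1024,
# now run on 4 devices with 128 samples each (global batch size 512).
print(scale_lr(0.5, 1024, batch_size_per_device=128, num_devices=4))
print(scale_lr(0.5, 1024, batch_size_per_device=128, num_devices=4, rule="sqrt"))
```

Either rule only gives a starting value. A short sweep around the result is usually still worthwhile.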