suous, nielsr (HF Staff) committed
Commit 2dc1957 · verified · 1 parent: 4c401bf

Improve model card with detailed information from paper and GitHub README (#1)

- Improve model card with detailed information from paper and GitHub README (09773cf3fb17190aa911ac8d19db62b241320a91)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+533 -0)
 
# Model Card for RecNeXt-A1

## Abstract

Recent advances in vision transformers (ViTs) have demonstrated the advantage of global modeling capabilities, prompting widespread integration of large-kernel convolutions for enlarging the effective receptive field (ERF). However, the quadratic scaling of parameter count and computational complexity (FLOPs) with respect to kernel size poses significant efficiency and optimization challenges. This paper introduces RecConv, a recursive decomposition strategy that efficiently constructs multi-frequency representations using small-kernel convolutions. RecConv establishes a linear relationship between parameter growth and the number of decomposition levels, which determines the effective receptive field $k\times 2^\ell$ for a base kernel $k$ and $\ell$ levels of decomposition, while maintaining constant FLOPs regardless of the ERF expansion. Specifically, RecConv achieves a parameter expansion of only $\ell+2$ times and a maximum FLOPs increase of $5/3$ times, compared to the exponential growth ($4^\ell$) of standard and depthwise convolutions. RecNeXt-M3 outperforms RepViT-M1.1 by 1.9 $AP^{box}$ on COCO with similar FLOPs. This innovation provides a promising avenue towards designing efficient and compact networks across various modalities. Code and models are available at https://github.com/suous/RecNeXt.

[![license](https://img.shields.io/github/license/suous/RecNeXt)](https://github.com/suous/RecNeXt/blob/main/LICENSE)
[![arXiv](https://img.shields.io/badge/arXiv-2406.16004-red)](https://arxiv.org/abs/2412.19628)
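The scaling claims above reduce to a few lines of arithmetic; the following sketch just restates the formulas from the abstract (it is not repository code):

```python
def erf(k, level):
    """Effective receptive field of RecConv: base kernel k at `level` decompositions."""
    return k * 2 ** level

def recconv_param_factor(level):
    """Parameter expansion of RecConv relative to the base small-kernel conv."""
    return level + 2

def dense_param_factor(level):
    """Parameter growth of a standard/depthwise conv enlarged to the same ERF."""
    return 4 ** level

# A 5x5 base kernel with 3 decomposition levels covers a 40x40 ERF,
# at 5x the parameters instead of 64x for a dense 40x40 kernel.
print(erf(5, 3), recconv_param_factor(3), dense_param_factor(3))  # 40 5 64
```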
 
- **Dataset**: ImageNet-1K

## Recent Updates

**UPDATES** 🔥
- **2025/07/23**: Added a simple architecture; the overall design follows [LSNet](https://github.com/jameslahm/lsnet).
- **2025/07/04**: Uploaded classification models to [HuggingFace](https://huggingface.co/suous) 🤗.
- **2025/07/01**: Added more comparisons with [LSNet](https://github.com/jameslahm/lsnet).
- **2025/06/27**: Added **A** series code and logs, replacing convolution with linear attention.
- **2025/03/19**: Added more ablation study results, including using attention with the RecConv design.
- **2025/01/02**: Uploaded checkpoints and training logs of RecNeXt-M0.
- **2024/12/29**: Uploaded checkpoints and training logs of RecNeXt-M1 through M5.

## Model Usage

### Image Classification

```python
import utils

# Convert training-time model to inference structure, fuse batchnorms
utils.replace_batchnorm(model)
```
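For intuition, the fusion step folds each batchnorm into its preceding convolution. A minimal illustrative sketch of that folding (a hypothetical helper, not the repository's `replace_batchnorm` implementation) looks like:

```python
import torch
from torch import nn

# Fold a Conv2d + BatchNorm2d pair into one Conv2d: scale the conv weights by
# the batchnorm's per-channel factor and absorb mean/bias into the conv bias.
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-channel BN scale
    fused.weight.data = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias
    return fused
```

In eval mode the fused convolution produces the same output as conv followed by batchnorm, with one layer fewer at inference time.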
## Model Comparison

### Classification

> And the throughput is tested on an Nvidia RTX 3090 with the maximum power-of-two batch size that fits in memory.

## Latency Measurement

The latency reported in RecNeXt for iPhone 13 (iOS 18) uses the benchmark tool from [Xcode 14](https://developer.apple.com/videos/play/wwdc2022/10027/).

<details>
<summary>
RecNeXt-M0
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m0_224x224.png" alt="recnext_m0">
</details>

<details>
<summary>
RecNeXt-M1
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m1_224x224.png" alt="recnext_m1">
</details>

<details>
<summary>
RecNeXt-M2
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m2_224x224.png" alt="recnext_m2">
</details>

<details>
<summary>
RecNeXt-M3
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m3_224x224.png" alt="recnext_m3">
</details>

<details>
<summary>
RecNeXt-M4
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m4_224x224.png" alt="recnext_m4">
</details>

<details>
<summary>
RecNeXt-M5
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_m5_224x224.png" alt="recnext_m5">
</details>

<details>
<summary>
RecNeXt-A0
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a0_224x224.png" alt="recnext_a0">
</details>

<details>
<summary>
RecNeXt-A1
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a1_224x224.png" alt="recnext_a1">
</details>

<details>
<summary>
RecNeXt-A2
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a2_224x224.png" alt="recnext_a2">
</details>

<details>
<summary>
RecNeXt-A3
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a3_224x224.png" alt="recnext_a3">
</details>

<details>
<summary>
RecNeXt-A4
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a4_224x224.png" alt="recnext_a4">
</details>

<details>
<summary>
RecNeXt-A5
</summary>
<img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/latency/recnext_a5_224x224.png" alt="recnext_a5">
</details>

Tip: export the model to a Core ML model
```bash
python export_coreml.py --model recnext_m1 --ckpt pretrain/recnext_m1_distill_300e.pth
```
Tip: measure the throughput on GPU
```bash
python speed_gpu.py --model recnext_m1
```

## ImageNet (Training and Evaluation)

### Prerequisites
A `conda` virtual environment is recommended.
```bash
conda create -n recnext python=3.8
conda activate recnext
pip install -r requirements.txt
```

### Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` folder and `val` folder respectively:

```bash
# script to extract the ImageNet dataset: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
# ILSVRC2012_img_train.tar (about 138 GB)
# ILSVRC2012_img_val.tar (about 6.3 GB)
```

```
# organize the ImageNet dataset as follows:
imagenet
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```

### Training
To train RecNeXt-M1 on an 8-GPU machine:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model recnext_m1 --data-path ~/imagenet --dist-eval
```
Tip: specify your data path and model name!

### Testing
For example, to test RecNeXt-M1:
```bash
python main.py --eval --model recnext_m1 --resume pretrain/recnext_m1_distill_300e.pth --data-path ~/imagenet
```

Use a pretrained model without knowledge distillation from [HuggingFace](https://huggingface.co/suous) 🤗.
```bash
python main.py --eval --model recnext_m1 --data-path ~/imagenet --pretrained --distillation-type none
```

Use a pretrained model with knowledge distillation from [HuggingFace](https://huggingface.co/suous) 🤗.
```bash
python main.py --eval --model recnext_m1 --data-path ~/imagenet --pretrained --distillation-type hard
```

### Fused model evaluation
For example, to evaluate RecNeXt-M1 with the fused model: [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/suous/RecNeXt/blob/main/demo/fused_model_evaluation.ipynb)
```bash
python fuse_eval.py --model recnext_m1 --resume pretrain/recnext_m1_distill_300e_fused.pt --data-path ~/imagenet
```

### Extract model for publishing

```bash
# without distillation
python publish.py --model_name recnext_m1 --checkpoint_path pretrain/checkpoint_best.pth --epochs 300

# with distillation
python publish.py --model_name recnext_m1 --checkpoint_path pretrain/checkpoint_best.pth --epochs 300 --distillation

# fused model
python publish.py --model_name recnext_m1 --checkpoint_path pretrain/checkpoint_best.pth --epochs 300 --fused
```
+
311
+ ## Downstream Tasks
312
+ [Object Detection and Instance Segmentation](detection/README.md)<br>
313
+
314
+ | Model | $AP^b$ | $AP_{50}^b$ | $AP_{75}^b$ | $AP^m$ | $AP_{50}^m$ | $AP_{75}^m$ | Latency | Ckpt | Log |
315
+ |:-----------|:------:|:-----------:|:-----------:|:------:|:-----------:|:-----------:|:-------:|:---------------------------------------------------------------------------------:|:-------------------------------------------:|
316
+ | RecNeXt-M3 | 41.7 | 63.4 | 45.4 | 38.6 | 60.5 | 41.4 | 5.2ms | [M3](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m3_coco.pth) | [M3](./detection/logs/recnext_m3_coco.json) |
317
+ | RecNeXt-M4 | 43.5 | 64.9 | 47.7 | 39.7 | 62.1 | 42.4 | 7.6ms | [M4](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m4_coco.pth) | [M4](./detection/logs/recnext_m4_coco.json) |
318
+ | RecNeXt-M5 | 44.6 | 66.3 | 49.0 | 40.6 | 63.5 | 43.5 | 12.4ms | [M5](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m5_coco.pth) | [M5](./detection/logs/recnext_m5_coco.json) |
319
+ | RecNeXt-A3 | 42.1 | 64.1 | 46.2 | 38.8 | 61.1 | 41.6 | 8.3ms | [A3](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a3_coco.pth) | [A3](./detection/logs/recnext_a3_coco.json) |
320
+ | RecNeXt-A4 | 43.5 | 65.4 | 47.6 | 39.8 | 62.4 | 42.9 | 14.0ms | [A4](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a4_coco.pth) | [A4](./detection/logs/recnext_a4_coco.json) |
321
+ | A5 | 44.4 | 66.3 | 48.9 | 40.3 | 63.3 | 43.4 | 25.3ms | [A5](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a5_coco.pth) | [A5](./detection/logs/recnext_a5_coco.json) |
322
+ ```bash
323
+ # this script is used to validate the detection results
324
+ fd json detection/logs -x sh -c 'printf "%.1f %s
325
+ " "$(tail -n +2 {} | jq -s "map(.bbox_mAP) | max * 100")" "{}"' | sort -k2
326
+ ```
327
+
328
+ <details>
329
+ <summary>
330
+ <span>output</span>
331
+ </summary>
332
+
333
+ ```
334
+ 42.1 detection/logs/recnext_a3_coco.json
335
+ 43.5 detection/logs/recnext_a4_coco.json
336
+ 44.4 detection/logs/recnext_a5_coco.json
337
+ 41.7 detection/logs/recnext_m3_coco.json
338
+ 43.5 detection/logs/recnext_m4_coco.json
339
+ 44.6 detection/logs/recnext_m5_coco.json
340
+ ```
341
+ </details>
342
+
343
+ [Semantic Segmentation](segmentation/README.md)
344
+
345
+ | Model | mIoU | Latency | Ckpt | Log |
346
+ |:-----------|:----:|:-------:|:-----------------------------------------------------------------------------------:|:------------------------------------------------:|
347
+ | RecNeXt-M3 | 41.0 | 5.6ms | [M3](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m3_ade20k.pth) | [M3](./segmentation/logs/recnext_m3_ade20k.json) |
348
+ | RecNeXt-M4 | 43.6 | 7.2ms | [M4](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m4_ade20k.pth) | [M4](./segmentation/logs/recnext_m4_ade20k.json) |
349
+ | RecNeXt-M5 | 46.0 | 12.4ms | [M5](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m5_ade20k.pth) | [M5](./segmentation/logs/recnext_m5_ade20k.json) |
350
+ | RecNeXt-A3 | 41.9 | 8.4ms | [A3](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a3_ade20k.pth) | [A3](./segmentation/logs/recnext_a3_ade20k.json) |
351
+ | RecNeXt-A4 | 43.0 | 14.0ms | [A4](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a4_ade20k.pth) | [A4](./segmentation/logs/recnext_a4_ade20k.json) |
352
+ | A5 | 46.5 | 25.3ms | [A5](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a5_ade20k.pth) | [A5](./segmentation/logs/recnext_a5_ade20k.json) |
353
+ ```bash
354
+ # this script is used to validate the segmentation results
355
+ fd json segmentation/logs -x sh -c 'printf "%.1f %s
356
+ " "$(tail -n +2 {} | jq -s "map(.mIoU) | max * 100")" "{}"' | sort -k2
357
+ ```
358
+
359
+ <details>
360
+ <summary>
361
+ <span>output</span>
362
+ </summary>
363
+
364
+ ```
365
+ 41.9 segmentation/logs/recnext_a3_ade20k.json
366
+ 43.0 segmentation/logs/recnext_a4_ade20k.json
367
+ 46.5 segmentation/logs/recnext_a5_ade20k.json
368
+ 41.0 segmentation/logs/recnext_m3_ade20k.json
369
+ 43.6 segmentation/logs/recnext_m4_ade20k.json
370
+ 46.0 segmentation/logs/recnext_m5_ade20k.json
371
+ ```
372
+ </details>
373
+
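The validation one-liners above slurp each JSON-lines log, skip the first metadata line (`tail -n +2`), and report the best metric. A rough Python equivalent, under the same assumption about the log format:

```python
import json

# Scan a JSON-lines detection log (first line is metadata, skipped like
# `tail -n +2` does) and return the best bbox mAP as a percentage.
def best_bbox_map(path):
    with open(path) as f:
        records = [json.loads(line) for line in list(f)[1:] if line.strip()]
    return max(r["bbox_mAP"] for r in records if "bbox_mAP" in r) * 100
```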
## Ablation Study

### Overall Experiments

![ablation](https://raw.githubusercontent.com/suous/RecNeXt/main/figures/ablation.png)

<details>
<summary>
<span style="font-size: larger; ">Ablation Logs</span>
</summary>

<pre>
logs/ablation
├── 224
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_3x3_7464.txt">recnext_m1_120e_224x224_3x3_7464.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_7x7_7552.txt">recnext_m1_120e_224x224_7x7_7552.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_bxb_7541.txt">recnext_m1_120e_224x224_bxb_7541.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_3x3_7548.txt">recnext_m1_120e_224x224_rec_3x3_7548.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_5x5_7603.txt">recnext_m1_120e_224x224_rec_5x5_7603.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_7567.txt">recnext_m1_120e_224x224_rec_7x7_7567.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_nearest_7571.txt">recnext_m1_120e_224x224_rec_7x7_nearest_7571.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_nearest_ssm_7593.txt">recnext_m1_120e_224x224_rec_7x7_nearest_ssm_7593.txt</a>
│   └── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_unpool_7548.txt">recnext_m1_120e_224x224_rec_7x7_unpool_7548.txt</a>
└── 384
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_3x3_7635.txt">recnext_m1_120e_384x384_3x3_7635.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_7x7_7742.txt">recnext_m1_120e_384x384_7x7_7742.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_bxb_7800.txt">recnext_m1_120e_384x384_bxb_7800.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_3x3_7772.txt">recnext_m1_120e_384x384_rec_3x3_7772.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_5x5_7811.txt">recnext_m1_120e_384x384_rec_5x5_7811.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_7x7_7803.txt">recnext_m1_120e_384x384_rec_7x7_7803.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_3x3_basic_7726.txt">recnext_m1_120e_384x384_rec_convtrans_3x3_basic_7726.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_5x5_basic_7787.txt">recnext_m1_120e_384x384_rec_convtrans_5x5_basic_7787.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_basic_7824.txt">recnext_m1_120e_384x384_rec_convtrans_7x7_basic_7824.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_group_7791.txt">recnext_m1_120e_384x384_rec_convtrans_7x7_group_7791.txt</a>
    └── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_split_7683.txt">recnext_m1_120e_384x384_rec_convtrans_7x7_split_7683.txt</a>
</pre>

```bash
# this script is used to validate the ablation results
fd txt logs/ablation -x sh -c 'printf "%.2f %s\n" "$(jq -s "map(.test_acc1) | max" {})" "{}"' | sort -k2
```

<details>
<summary>
<span>output</span>
</summary>

```
74.64 logs/ablation/224/recnext_m1_120e_224x224_3x3_7464.txt
75.52 logs/ablation/224/recnext_m1_120e_224x224_7x7_7552.txt
75.41 logs/ablation/224/recnext_m1_120e_224x224_bxb_7541.txt
75.48 logs/ablation/224/recnext_m1_120e_224x224_rec_3x3_7548.txt
76.03 logs/ablation/224/recnext_m1_120e_224x224_rec_5x5_7603.txt
75.67 logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_7567.txt
75.71 logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_nearest_7571.txt
75.93 logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_nearest_ssm_7593.txt
75.48 logs/ablation/224/recnext_m1_120e_224x224_rec_7x7_unpool_7548.txt
76.35 logs/ablation/384/recnext_m1_120e_384x384_3x3_7635.txt
77.42 logs/ablation/384/recnext_m1_120e_384x384_7x7_7742.txt
78.00 logs/ablation/384/recnext_m1_120e_384x384_bxb_7800.txt
77.72 logs/ablation/384/recnext_m1_120e_384x384_rec_3x3_7772.txt
78.11 logs/ablation/384/recnext_m1_120e_384x384_rec_5x5_7811.txt
78.03 logs/ablation/384/recnext_m1_120e_384x384_rec_7x7_7803.txt
77.26 logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_3x3_basic_7726.txt
77.87 logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_5x5_basic_7787.txt
78.24 logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_basic_7824.txt
77.91 logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_group_7791.txt
76.84 logs/ablation/384/recnext_m1_120e_384x384_rec_convtrans_7x7_split_7683.txt
```
</details>

<details>
<summary>
<span style="font-size: larger; ">RecConv Recurrent Aggregation</span>
</summary>

```python
from torch import nn


class RecConv2d(nn.Module):
    def __init__(self, in_channels, kernel_size=5, bias=False, level=1, mode='nearest'):
        super().__init__()
        self.level = level
        self.mode = mode
        kwargs = {
            'in_channels': in_channels,
            'out_channels': in_channels,
            'groups': in_channels,
            'kernel_size': kernel_size,
            'padding': kernel_size // 2,
            'bias': bias
        }
        self.n = nn.Conv2d(stride=2, **kwargs)
        self.a = nn.Conv2d(**kwargs)
        self.b = nn.Conv2d(**kwargs) if level > 1 else None
        self.c = nn.Conv2d(**kwargs)
        self.d = nn.Conv2d(**kwargs)

    def forward(self, x):
        # 1. Generate Multi-scale Features.
        fs = [x]
        for _ in range(self.level):
            fs.append(self.n(fs[-1]))

        # 2. Multi-scale Recurrent Aggregation.
        h = None
        for i, o in reversed(list(zip(fs[1:], fs[:-1]))):
            h = self.a(h) + self.b(i) if h is not None else self.b(i)
            h = nn.functional.interpolate(h, size=o.shape[2:], mode=self.mode)
        return self.c(h) + self.d(x)
```
</details>

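The two forward stages can be traced with plain tensor ops. In this illustrative sketch, avg-pooling stands in for the learned stride-2 depthwise conv and identity maps stand in for the `a`/`b`/`c`/`d` convolutions (assumptions for brevity, not the module itself):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 32, 32)
level = 2

# 1. Generate multi-scale features: 32x32 -> 16x16 -> 8x8.
fs = [x]
for _ in range(level):
    fs.append(F.avg_pool2d(fs[-1], 2))

# 2. Aggregate from coarsest to finest, upsampling to each target size.
h = None
for i, o in reversed(list(zip(fs[1:], fs[:-1]))):
    h = h + i if h is not None else i
    h = F.interpolate(h, size=o.shape[2:], mode='nearest')

print(h.shape)  # torch.Size([1, 4, 32, 32])
```

Note how the FLOPs stay bounded: each level processes a feature map with a quarter of the pixels of the previous one, while the effective receptive field doubles.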
### RecConv Variants

<div style="display: flex; justify-content: space-between;">
    <img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/RecConvB.png" alt="RecConvB" style="width: 49%;">
    <img src="https://raw.githubusercontent.com/suous/RecNeXt/main/figures/RecConvC.png" alt="RecConvC" style="width: 49%;">
</div>

<details>
<summary>
<span style="font-size: larger; ">RecConv Variant Details</span>
</summary>

- **RecConv using group convolutions**

```python
from torch import nn


# RecConv Variant A
# recursive decomposition on both spatial and channel dimensions
# downsample and upsample through group convolutions
class RecConv2d(nn.Module):
    def __init__(self, in_channels, kernel_size=5, bias=False, level=2):
        super().__init__()
        self.level = level
        kwargs = {'kernel_size': kernel_size, 'padding': kernel_size // 2, 'bias': bias}
        downs = []
        for l in range(level):
            i_channels = in_channels // (2 ** l)
            o_channels = in_channels // (2 ** (l + 1))
            downs.append(nn.Conv2d(in_channels=i_channels, out_channels=o_channels, groups=o_channels, stride=2, **kwargs))
        self.downs = nn.ModuleList(downs)

        convs = []
        for l in range(level + 1):
            channels = in_channels // (2 ** l)
            convs.append(nn.Conv2d(in_channels=channels, out_channels=channels, groups=channels, **kwargs))
        self.convs = nn.ModuleList(reversed(convs))

        # this is the simplest modification, only supports resolutions like 256, 384, etc.
        kwargs['kernel_size'] = kernel_size + 1
        ups = []
        for l in range(level):
            i_channels = in_channels // (2 ** (l + 1))
            o_channels = in_channels // (2 ** l)
            ups.append(nn.ConvTranspose2d(in_channels=i_channels, out_channels=o_channels, groups=i_channels, stride=2, **kwargs))
        self.ups = nn.ModuleList(reversed(ups))

    def forward(self, x):
        i = x
        features = []
        for down in self.downs:
            x, s = down(x), x.shape[2:]
            features.append((x, s))

        x = 0
        for conv, up, (f, s) in zip(self.convs, self.ups, reversed(features)):
            x = up(conv(f + x))
        return self.convs[self.level](i + x)
```

- **RecConv using channel-wise concatenation**

```python
import torch
from torch import nn


# recursive decomposition on both spatial and channel dimensions
# downsample using channel-wise split, followed by depthwise convolution with a stride of 2
# upsample through channel-wise concatenation
class RecConv2d(nn.Module):
    def __init__(self, in_channels, kernel_size=5, bias=False, level=2):
        super().__init__()
        self.level = level
        kwargs = {'kernel_size': kernel_size, 'padding': kernel_size // 2, 'bias': bias}
        downs = []
        for l in range(level):
            channels = in_channels // (2 ** (l + 1))
            downs.append(nn.Conv2d(in_channels=channels, out_channels=channels, groups=channels, stride=2, **kwargs))
        self.downs = nn.ModuleList(downs)

        convs = []
        for l in range(level + 1):
            channels = in_channels // (2 ** l)
            convs.append(nn.Conv2d(in_channels=channels, out_channels=channels, groups=channels, **kwargs))
        self.convs = nn.ModuleList(reversed(convs))

        # this is the simplest modification, only supports resolutions like 256, 384, etc.
        kwargs['kernel_size'] = kernel_size + 1
        ups = []
        for l in range(level):
            channels = in_channels // (2 ** (l + 1))
            ups.append(nn.ConvTranspose2d(in_channels=channels, out_channels=channels, groups=channels, stride=2, **kwargs))
        self.ups = nn.ModuleList(reversed(ups))

    def forward(self, x):
        features = []
        for down in self.downs:
            r, x = torch.chunk(x, 2, dim=1)
            x, s = down(x), x.shape[2:]
            features.append((r, s))

        for conv, up, (r, s) in zip(self.convs, self.ups, reversed(features)):
            x = torch.cat([r, up(conv(x))], dim=1)
        return self.convs[self.level](x)
```
</details>

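The channel-wise split/concat scheme of the second variant can be isolated in a few lines. In this sketch, pooling and interpolation stand in for the learned stride-2 depthwise conv and transposed conv (assumptions for brevity):

```python
import torch
import torch.nn.functional as F

# At each level, half the channels (r) are set aside and re-attached after
# the other half is downsampled, processed, and upsampled back.
x = torch.randn(1, 8, 16, 16)
r, x = torch.chunk(x, 2, dim=1)          # keep 4 channels, recurse on the rest
x = F.avg_pool2d(x, 2)                   # stand-in for the stride-2 depthwise conv
x = F.interpolate(x, scale_factor=2.0)   # stand-in for the transposed conv
y = torch.cat([r, x], dim=1)             # concatenation restores the input shape
print(y.shape)  # torch.Size([1, 8, 16, 16])
```

Because only half the channels descend at each level, the decomposition narrows as it goes deeper, which is what keeps the parameter and FLOPs growth linear in the number of levels.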
### RecConv Beyond

We apply RecConv to [MLLA](https://github.com/LeapLabTHU/MLLA) small variants, replacing linear attention and downsampling layers.
This results in higher throughput and lower training memory usage.

<pre>
mlla/logs
├── 1_mlla_nano
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/1_mlla_nano/01_baseline.txt">01_baseline.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/1_mlla_nano/02_recconv_5x5_conv_trans.txt">02_recconv_5x5_conv_trans.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/1_mlla_nano/03_recconv_5x5_nearest_interp.txt">03_recconv_5x5_nearest_interp.txt</a>
│   ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/1_mlla_nano/04_recattn_nearest_interp.txt">04_recattn_nearest_interp.txt</a>
│   └── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/1_mlla_nano/05_recattn_nearest_interp_simplify.txt">05_recattn_nearest_interp_simplify.txt</a>
└── 2_mlla_mini
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/01_baseline.txt">01_baseline.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/02_recconv_5x5_conv_trans.txt">02_recconv_5x5_conv_trans.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/03_recconv_5x5_nearest_interp.txt">03_recconv_5x5_nearest_interp.txt</a>
    ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/04_recattn_nearest_interp.txt">04_recattn_nearest_interp.txt</a>
    └── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/05_recattn_nearest_interp_simplify.txt">05_recattn_nearest_interp_simplify.txt</a>
</pre>

```bash
# this script is used to validate the ablation results
fd txt mlla/logs -x sh -c 'printf "%.2f %s\n" "$(rg -N -I -U -o "EPOCH.*\n.*Acc@1 (\d+\.\d+)" -r "\$1" {} | sort -n | tail -1)" "{}"' | sort -k2
```

<details>
<summary>
<span>output</span>
</summary>

```
76.26 mlla/logs/1_mlla_nano/01_baseline.txt
77.09 mlla/logs/1_mlla_nano/02_recconv_5x5_conv_trans.txt
77.14 mlla/logs/1_mlla_nano/03_recconv_5x5_nearest_interp.txt
76.53 mlla/logs/1_mlla_nano/04_recattn_nearest_interp.txt
77.28 mlla/logs/1_mlla_nano/05_recattn_nearest_interp_simplify.txt
82.27 mlla/logs/2_mlla_mini/01_baseline.txt
82.06 mlla/logs/2_mlla_mini/02_recconv_5x5_conv_trans.txt
81.94 mlla/logs/2_mlla_mini/03_recconv_5x5_nearest_interp.txt
82.08 mlla/logs/2_mlla_mini/04_recattn_nearest_interp.txt
82.16 mlla/logs/2_mlla_mini/05_recattn_nearest_interp_simplify.txt
```
</details>

## Limitations

1. RecNeXt exhibits the lowest **throughput** among models of comparable parameter size due to extensive use of bilinear interpolation, which can be mitigated by employing transposed convolution.
2. The recursive decomposition may introduce **numerical instability** during mixed-precision training, which can be alleviated by using fixed-point or BFloat16 arithmetic.
3. **Compatibility issues** with bilinear interpolation and transposed convolution on certain iOS versions may also result in performance degradation.

## Acknowledgement

Classification (ImageNet) code base is partly built with [LeViT](https://github.com/facebookresearch/LeViT), [PoolFormer](https://github.com/sail-sg/poolformer), [EfficientFormer](https://github.com/snap-research/EfficientFormer), [RepViT](https://github.com/THU-MIG/RepViT), and [MogaNet](https://github.com/Westlake-AI/MogaNet).

The detection and segmentation pipeline is from [MMCV](https://github.com/open-mmlab/mmcv) ([MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)).

Thanks for the great implementations!

## Citation

If our code or models help your work, please cite our papers and give us a star 🌟!

```BibTeX
@misc{zhao2024recnext,
      title={RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations},