Upload README.md with huggingface_hub

README.md CHANGED

@@ -16,7 +16,7 @@ tags:
 - transformers
 ---
 
-# Model Card for RecNeXt-
+# Model Card for RecNeXt-T (With Knowledge Distillation)
 
 ## Abstract
 Recent advances in vision transformers (ViTs) have demonstrated the advantage of global modeling capabilities, prompting widespread integration of large-kernel convolutions for enlarging the effective receptive field (ERF). However, the quadratic scaling of parameter count and computational complexity (FLOPs) with respect to kernel size poses significant efficiency and optimization challenges. This paper introduces RecConv, a recursive decomposition strategy that efficiently constructs multi-frequency representations using small-kernel convolutions. RecConv establishes a linear relationship between parameter growth and decomposition levels, which determine the effective receptive field $k\times 2^\ell$ for a base kernel $k$ and $\ell$ levels of decomposition, while maintaining constant FLOPs regardless of the ERF expansion. Specifically, RecConv achieves a parameter expansion of only $\ell+2$ times and a maximum FLOPs increase of $5/3$ times, compared to the exponential growth ($4^\ell$) of standard and depthwise convolutions. RecNeXt-M3 outperforms RepViT-M1.1 by 1.9 $AP^{box}$ on COCO with similar FLOPs. This innovation provides a promising avenue towards designing efficient and compact networks across various modalities. Code and models can be found at https://github.com/suous/RecNeXt.
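The abstract's scaling claims can be sanity-checked with a few lines of arithmetic. The factors below come straight from the abstract; the "two small convolutions per level" accounting is an assumption chosen to reproduce the stated $5/3$ bound, not a description of the actual RecConv implementation.

```python
# Enlarging a (depthwise) kernel from k to k * 2**l directly multiplies the
# kernel area, and hence the parameter count, by 4**l. The abstract reports
# that RecConv instead grows parameters by only (l + 2) times.
k = 3
for l in range(1, 5):
    direct = (k * 2 ** l) ** 2 // k ** 2  # standard / depthwise large kernel
    recconv = l + 2                       # linear growth reported for RecConv
    print(f"l={l}: direct x{direct}, RecConv x{recconv}")

# FLOPs: each decomposition level works on a feature map with 1/4 the pixels
# of the level above. Assuming two small-kernel convolutions per level (an
# accounting that reproduces the stated bound), the total cost relative to
# the base convolution is 1 + 2 * sum_{i>=1} (1/4)**i, which approaches 5/3.
overhead = 1 + 2 * sum(0.25 ** i for i in range(1, 60))
assert abs(overhead - 5 / 3) < 1e-12
```

Even with infinitely many levels the geometric series keeps the overhead bounded, which is why the ERF can grow without growing FLOPs.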
@@ -32,17 +32,17 @@ Recent advances in vision transformers (ViTs) have demonstrated the advantage of
 ## Model Details
 
 - **Model Type**: Image Classification / Feature Extraction
-- **Model Series**:
+- **Model Series**: L
 - **Model Stats**:
-  - **Parameters**:
-  - **MACs**:
-  - **Latency**:
+  - **Parameters**: 12.1M
+  - **MACs**: 0.3G
+  - **Latency**: 1.8ms (iPhone 13, iOS 18)
   - **Image Size**: 224x224
 
 - **Architecture Configuration**:
-  - **Embedding Dimensions**:
-  - **Depths**:
-  - **MLP Ratio**: 2
+  - **Embedding Dimensions**: (64, 128, 256, 512)
+  - **Depths**: (0, 2, 8, 10)
+  - **MLP Ratio**: (2, 2, 2, 1.5)
 
 - **Paper**: [RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations](https://arxiv.org/abs/2412.19628)
@@ -346,14 +346,14 @@ python publish.py --model_name recnext_m1 --checkpoint_path pretrain/checkpoint_
 ## Downstream Tasks
 [Object Detection and Instance Segmentation](https://github.com/suous/RecNeXt/blob/main/detection/README.md)<br>
 
-| model
+| model | $AP^b$ | $AP_{50}^b$ | $AP_{75}^b$ | $AP^m$ | $AP_{50}^m$ | $AP_{75}^m$ | Latency | Ckpt | Log |
+|:------|:------:|:-----------:|:-----------:|:------:|:-----------:|:-----------:|:-------:|:---------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|
+| M3 | 41.7 | 63.4 | 45.4 | 38.6 | 60.5 | 41.4 | 5.2ms | [M3](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m3_coco.pth) | [M3](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_m3_coco.json) |
+| M4 | 43.5 | 64.9 | 47.7 | 39.7 | 62.1 | 42.4 | 7.6ms | [M4](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m4_coco.pth) | [M4](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_m4_coco.json) |
+| M5 | 44.6 | 66.3 | 49.0 | 40.6 | 63.5 | 43.5 | 12.4ms | [M5](https://github.com/suous/RecNeXt/releases/download/v1.0/recnext_m5_coco.pth) | [M5](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_m5_coco.json) |
+| A3 | 42.1 | 64.1 | 46.2 | 38.8 | 61.1 | 41.6 | 8.3ms | [A3](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a3_coco.pth) | [A3](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_a3_coco.json) |
+| A4 | 43.5 | 65.4 | 47.6 | 39.8 | 62.4 | 42.9 | 14.0ms | [A4](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a4_coco.pth) | [A4](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_a4_coco.json) |
+| A5 | 44.4 | 66.3 | 48.9 | 40.3 | 63.3 | 43.4 | 25.3ms | [A5](https://github.com/suous/RecNeXt/releases/download/v2.0/recnext_a5_coco.pth) | [A5](https://raw.githubusercontent.com/suous/RecNeXt/main/detection/logs/recnext_a5_coco.json) |
 
 [Semantic Segmentation](https://github.com/suous/RecNeXt/blob/main/segmentation/README.md)
@@ -552,6 +552,11 @@ class RecConv2d(nn.Module):
 We apply RecConv to [MLLA](https://github.com/LeapLabTHU/MLLA) small variants, replacing linear attention and downsampling layers.
 This results in higher throughput and lower training memory usage.
 
+<details>
+<summary>
+<span style="font-size: larger; ">Ablation Logs</span>
+</summary>
+
 <pre>
 mlla/logs
 ├── 1_mlla_nano
@@ -567,6 +572,7 @@ mlla/logs
 ├── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/04_recattn_nearest_interp.txt">04_recattn_nearest_interp.txt</a>
 └── <a style="text-decoration:none" href="https://raw.githubusercontent.com/suous/RecNeXt/main/mlla/logs/2_mlla_mini/05_recattn_nearest_interp_simplify.txt">05_recattn_nearest_interp_simplify.txt</a>
 </pre>
+</details>
 
 ## Limitations
 
@@ -576,7 +582,7 @@ mlla/logs
 
 ## Acknowledgement
 
-Classification (ImageNet) code base is partly built with [LeViT](https://github.com/facebookresearch/LeViT), [PoolFormer](https://github.com/sail-sg/poolformer), [EfficientFormer](https://github.com/snap-research/EfficientFormer), [RepViT](https://github.com/THU-MIG/RepViT), [LSNet](https://github.com/jameslahm/lsnet), and [MogaNet](https://github.com/Westlake-AI/MogaNet).
+Classification (ImageNet) code base is partly built with [LeViT](https://github.com/facebookresearch/LeViT), [PoolFormer](https://github.com/sail-sg/poolformer), [EfficientFormer](https://github.com/snap-research/EfficientFormer), [RepViT](https://github.com/THU-MIG/RepViT), [LSNet](https://github.com/jameslahm/lsnet), [MLLA](https://github.com/LeapLabTHU/MLLA), and [MogaNet](https://github.com/Westlake-AI/MogaNet).
 
 The detection and segmentation pipeline is from [MMCV](https://github.com/open-mmlab/mmcv) ([MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)).
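The hunk above touches `class RecConv2d(nn.Module)` but shows only its header. As a rough illustration of the recursive multi-frequency idea (downsample, filter with the same small kernel, upsample, fuse) here is a hypothetical 1D NumPy toy; every name is illustrative and this is not the repository's `RecConv2d`. Note the toy uses $\ell+1$ kernels for $\ell$ levels, a simplification of the paper's $(\ell+2)$-times accounting.

```python
import numpy as np

def small_conv1d(x, w):
    """'Same' 1D convolution with a small kernel and zero padding."""
    k = len(w)
    xp = np.pad(x, k // 2)
    return np.array([xp[i:i + k] @ w for i in range(len(x))])

def rec_conv1d(x, kernels):
    """Toy recursive multi-frequency decomposition (not the paper's code).

    kernels[0] filters the current scale; the remaining kernels recurse on a
    stride-2 downsampled copy, so the effective receptive field doubles with
    every level while each level costs half as much (a quarter in 2D).
    """
    y = small_conv1d(x, kernels[0])
    if len(kernels) > 1:
        low = rec_conv1d(x[::2], kernels[1:])  # recurse at half resolution
        y = y + np.repeat(low, 2)[: len(x)]    # nearest-neighbor upsample
    return y

x = np.arange(16, dtype=float)
ks = [np.ones(3) / 3] * 3   # base kernel k=3, two levels of decomposition
y = rec_conv1d(x, ks)
assert y.shape == x.shape   # resolution is preserved
```

Nearest-neighbor resampling is used here because the ablation logs above (`04_recattn_nearest_interp.txt`) suggest it was one of the variants explored.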