DRCT: Dense-residual-connected Transformer for Image Super-Resolution
Original Model Authors: Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou
This repository contains DRCT (Dense-residual-connected Transformer), a model for Single Image Super-Resolution (SISR): reconstructing a sharp, high-resolution image from a low-resolution input.
It builds on the efficient Swin Transformer architecture and is designed to mitigate a common problem in deep networks known as the "information bottleneck": fine image details getting lost as feature maps pass through many layers.
DRCT uses dense-residual connections, which let the network retain and reuse features learned in earlier layers. This stabilizes training and preserves fine textures much better.
As a result, DRCT achieves state-of-the-art image quality on standard benchmarks (such as Set5 and Urban100) across upscaling factors (x2, x3, x4), while often being more computationally efficient than other leading models such as HAT.
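To make the dense-residual idea concrete, here is a minimal toy sketch (not the actual DRCT architecture or its Swin attention layers, just the connection pattern): each layer receives the concatenation of the block input and all previous layer outputs, and the block output adds the input back via a residual skip. The shapes, layer count, and ReLU activation are illustrative assumptions.

```python
import numpy as np

def dense_residual_block(x, layers):
    """Toy dense-residual block: each layer sees the concatenation of the
    block input and all earlier layer outputs (dense connections), and the
    block output adds the input back (residual connection)."""
    feats = [x]
    for w in layers:
        inp = np.concatenate(feats, axis=-1)  # reuse all earlier features
        feats.append(np.maximum(inp @ w, 0.0))  # linear layer + ReLU
    return x + feats[-1]  # residual skip keeps early detail flowing through

rng = np.random.default_rng(0)
c = 8                                    # channel count
x = rng.standard_normal((16, c))         # 16 "pixels", c channels each
# layer i maps (i + 1) * c concatenated channels back to c channels
ws = [rng.standard_normal(((i + 1) * c, c)) * 0.1 for i in range(3)]
y = dense_residual_block(x, ws)
print(y.shape)  # (16, 8)
```

Because every layer can read the earliest features directly, detail does not have to survive a long chain of transformations, which is the intuition behind avoiding the information bottleneck.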
How to use
Step-by-step instructions for running these models locally with Python scripts are available in the following GitHub repository:
https://github.com/aaronespasa/image-upscaler-ai/tree/main/test
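As a rough sketch of the inference pipeline described in that repository, the code below shows the typical pre- and post-processing around an SR model: HWC uint8 images are converted to NCHW float tensors in [0, 1] and back. The `fake_upscale` function is a nearest-neighbor placeholder standing in for the actual DRCT forward pass (which requires PyTorch and the released weights); all function names here are illustrative, not the repository's API.

```python
import numpy as np

def to_model_input(img_u8):
    """HWC uint8 [0, 255] -> NCHW float32 [0, 1], the layout most
    PyTorch SR models expect."""
    x = img_u8.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[None]  # (1, C, H, W)

def to_image(x):
    """NCHW float32 [0, 1] -> HWC uint8, clipping out-of-range values."""
    y = np.clip(x[0].transpose(1, 2, 0), 0.0, 1.0)
    return (y * 255.0 + 0.5).astype(np.uint8)

def fake_upscale(x, scale=4):
    """Placeholder for the model: nearest-neighbor x4 upsampling.
    In real use this line would be the DRCT forward pass."""
    return x.repeat(scale, axis=2).repeat(scale, axis=3)

lr = np.random.randint(0, 256, (32, 48, 3), dtype=np.uint8)  # toy LR image
sr = to_image(fake_upscale(to_model_input(lr)))
print(sr.shape)  # (128, 192, 3)
```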
Output example
Model Variants and Complexity for x4 SISR
| Model Variant | Parameters | FLOPs | Fwd/Bwd Pass Memory |
|---|---|---|---|
| DRCT (Base) | 14.13 M | 7.92 G | 1857.55 M |
| DRCT-L | 27.58 M | 11.07 G | 4278.19 M |
The primary model weights included in this repository correspond to the `Real-DRCT-GAN_Finetuned from MSE` version mentioned in the original project.
This means the model was first trained for pixel-level accuracy (using MSE loss) and then fine-tuned with GAN techniques. The second stage improves visual realism, sharpness, and fine details, making this the generally preferred version for the most convincing high-resolution results.
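The first-stage objective and the fidelity metric reported on benchmarks like Set5 and Urban100 can be sketched as follows. This is a generic illustration of pixel-wise MSE and PSNR, not code from the DRCT training pipeline; the GAN fine-tuning stage optimizes perceptual and adversarial losses instead, which are not shown here.

```python
import numpy as np

def mse_loss(sr, hr):
    """Pixel-wise mean squared error, the stage-one training objective."""
    return float(np.mean((sr - hr) ** 2))

def psnr(sr, hr, peak=1.0):
    """Peak signal-to-noise ratio (dB), the standard SISR fidelity metric."""
    return 10.0 * np.log10(peak ** 2 / mse_loss(sr, hr))

# toy "ground truth" and a slightly noisy "reconstruction" in [0, 1]
hr = np.random.default_rng(1).random((64, 64, 3))
noise = 0.01 * np.random.default_rng(2).standard_normal(hr.shape)
sr = np.clip(hr + noise, 0.0, 1.0)
print(f"PSNR: {psnr(sr, hr):.1f} dB")
```

Low MSE (high PSNR) rewards pixel accuracy but tends to produce overly smooth textures, which is exactly why the GAN fine-tuning stage is added on top.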
Training Data
- Pre-training: ImageNet
- Fine-tuning: DF2K dataset (combined DIV2K + Flickr2K)
Resources
- Paper: DRCT: Saving Image Super-Resolution away from Information Bottleneck (arXiv:2404.00722v5)
- Original Code: https://github.com/ming053l/DRCT
Limitations
Performance may vary across image types and degradation patterns. Like other generative models, it can hallucinate details that were not present in the input, and its outputs may reflect biases in the training data.
License
MIT License. See the license file in the original repository: LICENSE
Citation
If you use this model, please cite the original work:
```bibtex
@misc{hsu2024drct,
  title={DRCT: Saving Image Super-Resolution away from Information Bottleneck},
  author={Chih-Chung Hsu and Chia-Ming Lee and Yi-Shiuan Chou},
  year={2024},
  eprint={2404.00722},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```