DRCT: Dense-residual-connected Transformer for Image Super-Resolution
Original Model Authors: Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou
This repository contains DRCT (Dense-residual-connected Transformer), a model for Single Image Super-Resolution (SISR): reconstructing a sharp, high-resolution image from a low-resolution input.
It builds on the efficient Swin Transformer architecture and is designed to mitigate a common problem in deep networks known as the "information bottleneck": fine image details getting lost as feature maps pass through many layers.
DRCT uses dense-residual connections, which let the network retain and reuse features learned in earlier layers. This stabilizes training and preserves fine textures much better.
As a result, DRCT achieves state-of-the-art image quality on standard benchmarks (such as Set5 and Urban100) across upscaling factors (x2, x3, x4), while often being more computationally efficient than other leading models such as HAT.
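To make the dense-residual idea concrete, here is a minimal toy sketch (not the actual DRCT architecture or its Swin attention layers, just the connection pattern): each layer receives the concatenation of the block input and all previous layer outputs, and the block output adds the input back via a residual skip. The shapes, layer count, and ReLU activation are illustrative assumptions.

```python
import numpy as np

def dense_residual_block(x, layers):
    """Toy dense-residual block: each layer sees the concatenation of the
    block input and all earlier layer outputs (dense connections), and the
    block output adds the input back (residual connection)."""
    feats = [x]
    for w in layers:
        inp = np.concatenate(feats, axis=-1)  # reuse all earlier features
        feats.append(np.maximum(inp @ w, 0.0))  # linear layer + ReLU
    return x + feats[-1]  # residual skip keeps early detail flowing through

rng = np.random.default_rng(0)
c = 8                                    # channel count
x = rng.standard_normal((16, c))         # 16 "pixels", c channels each
# layer i maps (i + 1) * c concatenated channels back to c channels
ws = [rng.standard_normal(((i + 1) * c, c)) * 0.1 for i in range(3)]
y = dense_residual_block(x, ws)
print(y.shape)  # (16, 8)
```

Because every layer can read the earliest features directly, detail does not have to survive a long chain of transformations, which is the intuition behind avoiding the information bottleneck.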
How to use
Step-by-step instructions for running these models locally with Python scripts are available in the following GitHub repository:
https://github.com/aaronespasa/image-upscaler-ai/tree/main/test
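As a rough sketch of the inference pipeline described in that repository, the code below shows the typical pre- and post-processing around an SR model: HWC uint8 images are converted to NCHW float tensors in [0, 1] and back. The `fake_upscale` function is a nearest-neighbor placeholder standing in for the actual DRCT forward pass (which requires PyTorch and the released weights); all function names here are illustrative, not the repository's API.

```python
import numpy as np

def to_model_input(img_u8):
    """HWC uint8 [0, 255] -> NCHW float32 [0, 1], the layout most
    PyTorch SR models expect."""
    x = img_u8.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[None]  # (1, C, H, W)

def to_image(x):
    """NCHW float32 [0, 1] -> HWC uint8, clipping out-of-range values."""
    y = np.clip(x[0].transpose(1, 2, 0), 0.0, 1.0)
    return (y * 255.0 + 0.5).astype(np.uint8)

def fake_upscale(x, scale=4):
    """Placeholder for the model: nearest-neighbor x4 upsampling.
    In real use this line would be the DRCT forward pass."""
    return x.repeat(scale, axis=2).repeat(scale, axis=3)

lr = np.random.randint(0, 256, (32, 48, 3), dtype=np.uint8)  # toy LR image
sr = to_image(fake_upscale(to_model_input(lr)))
print(sr.shape)  # (128, 192, 3)
```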
Output example
Model Variants and Complexity for x4 SISR
| Model Variant | Parameters | FLOPs | Fwd/Bwd Pass Memory |
|---|---|---|---|
| DRCT (Base) | 14.13 M | 7.92 G | 1857.55 M |
| DRCT-L | 27.58 M | 11.07 G | 4278.19 M |
The primary model weights included in this repository correspond to the `Real-DRCT-GAN_Finetuned from MSE` version mentioned in the original project.
This means the model was first trained for pixel-level accuracy (using MSE loss) and then fine-tuned with GAN techniques. The second stage improves visual realism, sharpness, and fine details, making this the generally preferred version for the most convincing high-resolution results.
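The first-stage objective and the fidelity metric reported on benchmarks like Set5 and Urban100 can be sketched as follows. This is a generic illustration of pixel-wise MSE and PSNR, not code from the DRCT training pipeline; the GAN fine-tuning stage optimizes perceptual and adversarial losses instead, which are not shown here.

```python
import numpy as np

def mse_loss(sr, hr):
    """Pixel-wise mean squared error, the stage-one training objective."""
    return float(np.mean((sr - hr) ** 2))

def psnr(sr, hr, peak=1.0):
    """Peak signal-to-noise ratio (dB), the standard SISR fidelity metric."""
    return 10.0 * np.log10(peak ** 2 / mse_loss(sr, hr))

# toy "ground truth" and a slightly noisy "reconstruction" in [0, 1]
hr = np.random.default_rng(1).random((64, 64, 3))
noise = 0.01 * np.random.default_rng(2).standard_normal(hr.shape)
sr = np.clip(hr + noise, 0.0, 1.0)
print(f"PSNR: {psnr(sr, hr):.1f} dB")
```

Low MSE (high PSNR) rewards pixel accuracy but tends to produce overly smooth textures, which is exactly why the GAN fine-tuning stage is added on top.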
Training Data
- Pre-training: ImageNet
- Fine-tuning: DF2K dataset (combined DIV2K + Flickr2K)
Resources
- Paper: DRCT: Saving Image Super-Resolution away from Information Bottleneck (arXiv:2404.00722v5)
- Original Code: https://github.com/ming053l/DRCT
Limitations
Performance may vary across image types and degradation patterns. Like other generative models, it can hallucinate details that were not present in the input, and its outputs may reflect biases in the training data.
License
MIT License. See the license file in the original repository: LICENSE
Citation
If you use this model, please cite the original work:
```bibtex
@misc{hsu2024drct,
  title={DRCT: Saving Image Super-Resolution away from Information Bottleneck},
  author={Chih-Chung Hsu and Chia-Ming Lee and Yi-Shiuan Chou},
  year={2024},
  eprint={2404.00722},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```