South-Central Minzu University, a. College of Computer Science; b. College of Electronic and Information Engineering; c. Hubei Key Lab of Intelligent Wireless Communication, Wuhan 430074, China
Convolutional neural networks (CNNs) extract local correlation features of images, while Vision Transformers (ViTs) focus on capturing long-range dependencies; combining the two effectively can improve the quality of image reconstruction. This paper studies an image super-resolution (SR) network based on feature enhancement with ViT-CNN. Specifically, the network consists of a ViT-based SR branch and a CNN-based gradient branch, which extract global correlations in the image feature domain and local dependencies in the image gradient domain, respectively. By fusing and progressively enhancing these two kinds of information, the network reconstructs images at large upscaling factors. In addition, a gradient loss and a progressive training strategy are introduced, which reduce the difficulty of training and improve its stability. Extensive experiments on multiple public datasets demonstrate that the proposed method improves reconstruction performance.
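The abstract does not give the form of the gradient loss, so the following is only a minimal sketch of how such a term is commonly built, not the authors' formulation. It assumes a gradient map computed with central differences and replicated borders, an L1 distance for both the pixel and gradient terms, and a hypothetical weight `grad_weight`; the function names are illustrative.

```python
import math


def gradient_map(img):
    """Gradient magnitude of a 2-D grayscale image given as a list of rows.

    Uses central differences; border pixels are replicated (an assumption),
    so a constant image has zero gradient everywhere.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates to the image to replicate border pixels.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [
            math.hypot(px(y, x + 1) - px(y, x - 1), px(y + 1, x) - px(y - 1, x))
            for x in range(w)
        ]
        for y in range(h)
    ]


def l1(a, b):
    """Mean absolute difference between two equally sized 2-D arrays."""
    h, w = len(a), len(a[0])
    return sum(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w)) / (h * w)


def total_loss(sr, hr, grad_weight=0.1):
    """Pixel L1 loss plus a weighted L1 loss between gradient maps.

    grad_weight is a hypothetical value chosen for illustration only.
    """
    pixel_term = l1(sr, hr)
    gradient_term = l1(gradient_map(sr), gradient_map(hr))
    return pixel_term + grad_weight * gradient_term
```

In this sketch the gradient term penalizes a reconstruction whose edges differ from those of the ground truth even when the average pixel error is moderate, which is the general motivation for supervising in the gradient domain.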