In complex video scenes with large motion, real-time video super-resolution (VSR) algorithms struggle to reconstruct texture details and occluded regions. To address this problem, a real-time VSR method based on a generative adversarial network (GAN) with efficient gating and target-region attention is proposed. First, the method uses an efficient gated reconstruction network as the generator. This network remains highly efficient while a simplified gating mechanism adaptively selects information from complex regions to improve the reconstruction results. Second, the method proposes a target-region attention discriminator that feeds multi-scale and spatio-temporal information back to the generator. The discriminator captures multi-scale information of complex videos through a multi-scale mechanism and ReLU linear attention, and its salient spatio-temporal discrimination module constrains it to focus on complex regions so that it better captures their spatio-temporal information. Experimental results show that the proposed method clearly outperforms other state-of-the-art (SOTA) algorithms. In terms of efficiency, the model achieves an inference latency of 13.36 ms and a real-time score of 65.806, which indicates efficient real-time performance.
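The abstract names two mechanisms without detailing them: a simplified gate in the generator and ReLU linear attention in the discriminator. The sketch below illustrates both under stated assumptions. The gate is modeled on the SimpleGate of NAFNet [15] (split the channels in half and multiply the halves), and the attention on the ReLU linear attention of EfficientViT [17]; this text does not confirm that either is the paper's exact formulation, and the function names `simple_gate` and `relu_linear_attention` are hypothetical. Plain Python lists stand in for tensors so the example runs without a deep-learning framework.

```python
from typing import List

Vector = List[float]


def simple_gate(features: Vector) -> Vector:
    # Assumed NAFNet-style SimpleGate: split the C channels into two halves
    # and multiply them elementwise, giving C/2 channels. The product itself
    # supplies the nonlinearity, so no activation function is needed.
    if len(features) % 2 != 0:
        raise ValueError("channel count must be even")
    half = len(features) // 2
    return [a * b for a, b in zip(features[:half], features[half:])]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def relu_linear_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                          eps: float = 1e-6) -> List[Vector]:
    # Assumed EfficientViT-style ReLU linear attention:
    #   out_i = ReLU(q_i)^T (sum_j ReLU(k_j) v_j^T) / (ReLU(q_i)^T sum_j ReLU(k_j))
    # eps guards against division by zero when every product is clipped to 0.
    if not k or len(k) != len(v):
        raise ValueError("k and v must be non-empty and of equal length")
    d_k, d_v = len(k[0]), len(v[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j ReLU(k_j) v_j^T, d_k x d_v
    k_sum = [0.0] * d_k                      # sum_j ReLU(k_j), length d_k
    for k_j, v_j in zip(k, v):
        rk = relu(k_j)
        for a in range(d_k):
            k_sum[a] += rk[a]
            for b in range(d_v):
                kv[a][b] += rk[a] * v_j[b]
    out = []
    for q_i in q:
        rq = relu(q_i)
        denom = sum(rq[a] * k_sum[a] for a in range(d_k)) + eps
        out.append([sum(rq[a] * kv[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out


print(simple_gate([1.0, 2.0, 3.0, 4.0]))  # [3.0, 8.0]
```

Because the key-value summary `kv` and the key sum `k_sum` are accumulated once and reused for every query, the cost grows linearly with the number of tokens rather than quadratically as in softmax attention. This property is what makes the mechanism attractive for high-resolution video frames.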
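Video super-resolution (VSR) is one of the core problems in computer vision: its goal is to reconstruct a high-resolution (HR) video sequence from a given low-resolution (LR) video sequence. However, complex video scenes with large motion, such as traffic flows, sporting events, and film scenes, contain many texture details and occluded regions. VSR models have difficulty extracting the feature information of such scenes and therefore fail to generate high-quality video, which is one of the challenges facing VSR.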
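In the experiments, model performance is evaluated in three respects: the fidelity of the reconstruction results, their perceptual quality, and model efficiency. Fidelity is measured by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Perceptual quality is measured by the learned perceptual image patch similarity (LPIPS). The efficiency of the real-time VSR model is measured by the number of parameters, the inference latency, the number of floating-point operations (FLOPs), and the trade-off score function score [23]. Here, the inference latency is the time the model takes to perform VSR, and score objectively measures the model's performance under real-time constraints; both are essential for evaluating the efficiency of real-time VSR. score is defined in Eq. (20):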
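Two of the metrics above, PSNR and inference latency, can be sketched as follows. This is a minimal illustration and not the paper's evaluation code. `psnr` uses the standard definition 10 * log10(MAX^2 / MSE) on flattened pixel values with an assumed 8-bit range (VSR benchmarks often compute PSNR on the Y channel only, which is not shown). `mean_latency_ms` is a hypothetical helper that averages wall-clock time after a few warm-up runs. SSIM, LPIPS, and the score of Eq. (20) are not reproduced here.

```python
import math
import time
from statistics import mean
from typing import Callable, List, Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    # PSNR = 10 * log10(MAX^2 / MSE) over flattened pixel values;
    # max_value = 255 assumes 8-bit images.
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and equally sized")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_latency_ms(run: Callable[[], object], warmup: int = 3,
                    repeats: int = 10) -> float:
    # Average wall-clock time of `run` in milliseconds. Warm-up calls are
    # discarded so one-time costs (allocation, caching) do not skew the mean.
    for _ in range(warmup):
        run()
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


print(psnr([0.0] * 4, [255.0] * 4))  # 0.0
```

When the model runs on a GPU, the device must be synchronized before each clock reading; otherwise asynchronous kernel launches make the measured latency look shorter than it is.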
[1] Rota C, Buzzelli M, Bianco S, et al. Video restoration based on deep learning: a comprehensive survey[J]. Artificial Intelligence Review, 2023, 56(6): 5317-5364.
[2] Caballero J, Ledig C, Aitken A P, et al. Real-time video super-resolution with spatio-temporal networks and motion compensation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4778-4787.
[3] Sajjadi M S M, Vemulapalli R, Brown M. Frame-recurrent video super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 6626-6634.
[4] Wang L, Guo Y, Liu L, et al. Deep video super-resolution using HR optical flow estimation[J]. IEEE Transactions on Image Processing, 2020, 29: 4323-4336.
[5] Chan K C K, Wang X, Yu K, et al. BasicVSR: the search for essential components in video super-resolution and beyond[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021: 4947-4956.
[6] Chan K C K, Zhou S, Xu X, et al. BasicVSR++: improving video super-resolution with enhanced propagation and alignment[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022: 5972-5981.
[7] Ouyang N, Ou Z S, Lin L P. Video super-resolution network with gated high-low resolution frames[J]. Applied Sciences, 2023, 13(14): 1-16.
[8] Zhou X, Zhang L, Zhao X, et al. Video super-resolution transformer with masked inter & intra-frame attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 25399-25408.
[9] Cao Y, Wang C, Song C, et al. Real-time super-resolution system of 4K-video based on deep learning[C]//2021 IEEE 32nd International Conference on Application-Specific Systems, Architectures and Processors (ASAP), NJ, USA, 2021: 69-76.
[10] Xia B, He J W, Zhang Y L, et al. Structured sparsity learning for efficient video super-resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 22638-22647.
[11] Xiao J, Jiang X Y, Zheng N X, et al. Online video super-resolution with convolutional kernel bypass grafts[J]. IEEE Transactions on Multimedia, 2023, 25: 8972-8987.
[12] Li G, Ji J, Qin M H, et al. Towards high-quality and efficient video super-resolution via spatial-temporal data overfitting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 10259-10269.
[13] Chu M Y, Xie Y, Leal-Taixé L, et al. Temporally coherent GANs for video super-resolution (TecoGAN)[J]. arXiv preprint, 2018, 1(2): 3.
[14] Chen R, Mu Y, Zhang Y. High-order relational generative adversarial network for video super-resolution[J]. Pattern Recognition, 2024, 146: 110059.
[15] Chen L, Chu X, Zhang X, et al. Simple baselines for image restoration[C]//European Conference on Computer Vision. Cham: Springer, 2022: 17-33.
[16] Ding X, Zhang X, Han J, et al. Diverse branch block: building a convolution as an inception-like unit[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021: 10881-10890.
[17] Cai H, Li J, Gan C, et al. EfficientViT: lightweight multi-scale attention for high-resolution dense prediction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 17302-17313.
[18] Ouyang D, He S, Zhan J, et al. Efficient multi-scale attention module with cross-spatial learning[C]//ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. arXiv: 2305.13563.
[19] Xue T F, Chen B A, Wu J J, et al. Video enhancement with task-oriented flow[J]. International Journal of Computer Vision, 2019, 127: 1106-1125.
[20] Liu C, Sun D Q. On Bayesian adaptive video super resolution[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36(2): 346-360.
[21] Yi P, Wang Z, Jiang K, et al. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 3106-3115.
[22] Nah S, Baik S, Hong S, et al. NTIRE 2019 challenge on video deblurring and super-resolution: dataset and study[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 2019: 1996-2005.
[23] Ignatov A, Romero A, Kim H, et al. Real-time video super-resolution on smartphones with deep learning, Mobile AI 2021 challenge: report[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA, 2021: 2535-2544.