To reduce the impact of video coding distortion on object detection, a video coding optimization method targeting both object detection and perceptual quality was proposed. Firstly, the quantization parameter of the I frame was refined to improve rate-distortion performance in terms of compression distortion. Secondly, an object detection algorithm was introduced into the video codec to predict the object area of the current coding frame. Thirdly, a commonly used deep neural network was utilized to extract the features of the current coding unit, from which the feature distortion was calculated. Then, a modified VGG model was proposed to predict the quantization parameter of the current coding unit. Finally, the feature distortion and the compression distortion were combined into a joint distortion in the rate-distortion optimization problem, from which the optimal coding parameters were decided. Experimental results showed that, compared with VTM-23.0, the proposed method achieved BD-rate savings of about 10.5% in terms of object detection accuracy and about 2.2% in terms of compression distortion.
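The proposed algorithm targets hybrid human-machine video coding. To verify its effectiveness, the experimental results report both the object detection performance and the human visual quality after coding. Both are measured by BD-rate, that is, by comparing the bit consumption at the same object detection performance or the same human visual quality; a negative BD-rate indicates that the algorithm achieves a gain. Object detection performance is measured by mAP at an IoU (intersection over union) threshold of 50%, and human visual quality is measured by PSNR.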
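The three quantities named above can be illustrated with a minimal Python sketch. It is not the evaluation code used in the experiments. It assumes NumPy is available, 8-bit samples (peak value 255), boxes given as `(x1, y1, x2, y2)`, and the commonly used Bjontegaard procedure (a cubic fit of log-rate against quality, averaged over the overlapping quality range). The function names `psnr`, `iou`, and `bd_rate` are hypothetical, and the numbers in the usage lines are made up for illustration only. The quality axis passed to `bd_rate` may be either PSNR or mAP. Aggregating detections into mAP is omitted.

```python
import numpy as np


def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a reference frame and a reconstructed frame."""
    ref = np.asarray(ref, dtype=np.float64)
    rec = np.asarray(rec, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Bjontegaard delta rate in percent.

    A negative value means the test codec needs fewer bits than the
    anchor at the same quality (PSNR or mAP).
    """
    qa = np.asarray(qual_anchor, dtype=np.float64)
    qt = np.asarray(qual_test, dtype=np.float64)
    log_ra = np.log(np.asarray(rate_anchor, dtype=np.float64))
    log_rt = np.log(np.asarray(rate_test, dtype=np.float64))

    # Cubic fit of log-rate as a function of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Average log-rate difference converted to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Illustrative usage with made-up numbers (not experimental results).
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]   # kbps
anchor_psnr = [30.0, 33.0, 36.0, 39.0]           # dB
test_rate = [0.9 * r for r in anchor_rate]       # 10% fewer bits, same PSNR
print(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))  # close to -10
print(iou((0, 0, 2, 2), (1, 1, 3, 3)) >= 0.5)    # would this detection count at IoU 50%?
```

A detection is counted as a true positive at the 50% threshold when its IoU with a ground-truth box is at least 0.5. The mAP values obtained this way at several bitrates form the quality axis of the detection-oriented BD-rate, just as PSNR does for the visual-quality BD-rate.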