Prohibited items in X-ray images vary in size, appear against heavy background noise, and undergo large changes in scale, all of which lower detection precision. To address these problems, we optimize RT-DETR-R18 and propose an X-ray prohibited item detection algorithm named X-ray-RTDETR. First, the algorithm uses CSPRepResNet embedded with efficient multi-scale attention as the backbone network to strengthen feature extraction. Second, a simplified fast spatial pyramid pooling module is inserted after the three feature maps output by the backbone to improve the robustness and generalization ability of the model. Finally, the SPoolFormer encoder performs intra-scale feature interaction on the high-level feature maps, which carry richer semantic concepts. Experimental results show that X-ray-RTDETR reaches a detection accuracy of 74.6% on the PIDray test set, 8.5% higher than RT-DETR-R18, while reducing the number of parameters by 1.67×10⁶ and nFLOP by 2.24×10⁹. Compared with state-of-the-art object detection algorithms of the same scale, X-ray-RTDETR achieves higher detection accuracy with fewer parameters and lower nFLOP. Its inference speed reaches 85.47 frames per second on an RTX 2070 Max-Q GPU.
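The abstract only names the simplified fast spatial pyramid pooling module. The following minimal sketch assumes it follows the usual SPPF layout: one 5×5 max-pool with stride 1 is applied three times in sequence, and every intermediate map is concatenated. Under that assumption, it shows why the sequential form reproduces the parallel 5×5, 9×9 and 13×13 pooling of the original spatial pyramid pooling while reusing each intermediate result. The surrounding 1×1 convolutions and activations are omitted, and a feature map is a single-channel nested list. The names `max_pool`, `spp` and `sppf` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (hypothetical layout)


def max_pool(x: Grid, k: int) -> Grid:
    """k x k max pooling, stride 1, 'same' output size; padded cells are ignored."""
    r = k // 2
    h, w = len(x), len(x[0])
    return [
        [
            max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def spp(x: Grid, sizes=(5, 9, 13)) -> List[Grid]:
    """SPP style: parallel pools with growing kernels, 'concatenated' with the input."""
    return [x] + [max_pool(x, k) for k in sizes]


def sppf(x: Grid, k: int = 5, repeats: int = 3) -> List[Grid]:
    """SPPF style (assumed layout): the same k x k pool applied repeatedly.

    Each pass widens the receptive field by k - 1, so three 5x5 passes
    cover 5x5, 9x9 and 13x13 windows at the cost of small kernels only.
    """
    outs = [x]
    for _ in range(repeats):
        outs.append(max_pool(outs[-1], k))
    return outs
```

For example, on any random 9×11 map `sppf(x) == spp(x)` holds exactly, because composing two clamped max windows of radius 2 yields a clamped window of radius 4, and three compositions yield radius 6.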
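This paper evaluates detection precision with the average precision (AP) metric on each of the three test subsets and on the full test set, to show how well the model detects samples of different difficulty. AP50 denotes the mean detection precision over the 12 categories of prohibited items at an intersection over union (IoU) threshold of 0.5. AP denotes the mean detection precision computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05, averaged over these 10 results.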
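The two metrics can be made concrete with a small sketch. The paper does not state which evaluation toolkit it uses, so the following choices are assumptions: detections are matched greedily to ground truth in descending score order, each class is evaluated on a single image, and AP uses all-point interpolation (PASCAL VOC style). The widely used COCO evaluator instead samples 101 recall points and accumulates over all images, so its values can differ slightly. The function names and the `(dets, gts)` per-class layout are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Det = Tuple[float, Box]                      # (confidence score, box)
ClassData = Tuple[Sequence[Det], Sequence[Box]]  # hypothetical: (detections, ground truths)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
        max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Det], gts: Sequence[Box], thr: float) -> float:
    """AP of one class at one IoU threshold (greedy matching, all-point interpolation)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    flags: List[int] = []  # 1 = true positive, 0 = false positive
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if not matched[j] and iou(box, g) > best_iou:
                best_j, best_iou = j, iou(box, g)
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            flags.append(1)
        else:
            flags.append(0)
    recalls, precisions, tp = [], [], 0
    for k, f in enumerate(flags, start=1):
        tp += f
        recalls.append(tp / len(gts))
        precisions.append(tp / k)
    # Precision envelope: best precision achievable at this recall or higher.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95


def ap50(per_class: Sequence[ClassData]) -> float:
    """Mean AP over classes at IoU = 0.5."""
    return mean(average_precision(d, g, 0.5) for d, g in per_class)


def ap_50_95(per_class: Sequence[ClassData]) -> float:
    """Class-mean AP averaged over the 10 IoU thresholds."""
    return mean(mean(average_precision(d, g, t) for d, g in per_class)
                for t in IOU_THRESHOLDS)
```

For instance, a single detection `(0, 0, 10, 8)` against a ground-truth box `(0, 0, 10, 10)` has IoU 0.8, so it counts as a hit at 7 of the 10 thresholds: its AP50 is 1.0 while its AP is 0.7. This is why AP rewards tight localization more strongly than AP50.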