1. School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
2. National and Local Joint Engineering Research Center of Space Photoelectric Technology, Changchun University of Science and Technology, Changchun 130022, China
3. Changchun Shikai Technology Industry Co., Research and Development Center, Changchun 130015, China
To address false and missed detections, inaccurate association, and re-identification errors in multi-target tracking of dense pedestrian scenes, this study proposes a Transformer-based multi-pedestrian tracking network. The algorithm consists of three modules: detection, data association, and tracking. The detection module adopts selective query recollection to strengthen the decoder's collection of key features, which improves the model's ability to represent targets and reduces false and missed detections. The data association module fuses a bilinear LSTM with quadratic data association to resolve the inaccurate association of dense pedestrians caused by similar target appearance. Finally, in the tracking module, an attention pyramid is embedded into the pyramid spatio-temporal aggregation module to capture spatio-temporal information from feature maps at different scales, which improves the accuracy of target re-identification. The proposed network is evaluated on the public MOT16 and MOT17 datasets, and the experimental results show that it tracks multiple pedestrians more accurately than the compared methods.
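To address the above problems, Wojke et al. [3] proposed Simple Online and Realtime Tracking with a deep association metric (DeepSORT), which performs Kalman filtering in image space and frame-by-frame data association with the Hungarian algorithm, improving the accuracy of data association. Zhang et al. [4] combined detection and re-identification in FairMOT, which learns ReID features effectively by removing the unfair bias toward the detection branch. With the introduction of the attention mechanism, multi-object tracking (MOT) methods based on the Transformer framework have been studied in depth. Xu et al. [5] proposed TransCenter, which addresses MOT with image-related dense detection queries and sparse tracking queries. In addition, the GTR algorithm [6] generates global trajectories for all objects and is trained jointly with the object detector. Cai et al. [7] proposed MeMOT, which performs MOT by storing the ID embeddings of tracked objects in a spatio-temporal memory. These methods apply the Transformer's self-attention mechanism and multi-head self-attention layers widely in feature extraction networks, and can effectively mitigate missed and false detections under severe occlusion.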
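The frame-by-frame association step described for [3] can be sketched as follows. This is a minimal illustration, not the implementation of any cited tracker. It assumes track boxes have already been predicted for the current frame (the Kalman filtering step is omitted) and that boxes are `(x1, y1, x2, y2)` tuples. The function names `iou` and `associate` and the `min_iou` threshold are hypothetical. To stay dependency-free, an exhaustive search over assignments stands in for the Hungarian algorithm. It returns the same minimum-cost assignment but is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Match predicted track boxes to detections by minimum total (1 - IoU).

    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    """
    n, m = len(tracks), len(detections)
    ious = [[iou(t, d) for d in detections] for t in tracks]

    # Enumerate every one-to-one assignment of the smaller set into the
    # larger one (brute-force stand-in for the Hungarian algorithm).
    if n <= m:
        candidates = (list(enumerate(p)) for p in permutations(range(m), n))
    else:
        candidates = ([(i, j) for j, i in enumerate(p)]
                      for p in permutations(range(n), m))

    best_pairs, best_cost = [], float("inf")
    for pairs in candidates:
        total = sum(1.0 - ious[i][j] for i, j in pairs)
        if total < best_cost:
            best_pairs, best_cost = pairs, total

    # Gate the optimal assignment: pairs with too little overlap are rejected.
    matches = [(i, j) for i, j in best_pairs if ious[i][j] >= min_iou]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n) if i not in matched_t]
    unmatched_dets = [j for j in range(m) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
    print(associate(tracks, detections))
    # ([(0, 1), (1, 0)], [], [2])
```

In practice the exhaustive search would be replaced by a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`, and unmatched detections would typically start new tracks while unmatched tracks are kept alive for a few frames before being dropped.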
[1] Ding Gui-peng, Tao Gang, Pang Chun-qiao, et al. Anchorless target tracking algorithm for lightweight siamese network[J]. Journal of Jilin University (Science Edition), 2023, 61(4): 890-898.
[2] Xu Tao, Ma Ke, Liu Cai-hua, et al. Multi-object pedestrian tracking based on deep learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(1): 27-38.
[3] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]∥IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017: 3645-3649.
[4] Zhang Y, Wang C, Wang X, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129: 3069-3087.
[5] Xu Y, Ban Y, Delorme G, et al. TransCenter: transformers with dense representations for multiple-object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(6): 7820-7835.
[6] Zhou X, Yin T, Koltun V, et al. Global tracking transformers[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8771-8780.
[7] Cai J, Xu M, Li W, et al. MeMOT: multi-object tracking with memory[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8090-8100.
[8] Chen F, Zhang H, Hu K, et al. Enhanced training of query-based object detection via selective query recollection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23756-23765.
[9] Wang Y, Zhang P, Gao S, et al. Pyramid spatial-temporal aggregation for video-based person re-identification[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 12026-12035.
[10] Chen G, Gu T, Lu J, et al. Person re-identification via attention pyramid[J]. IEEE Transactions on Image Processing, 2021, 30: 7663-7676.
[11] Kim C, Li F X, Alotaibi M, et al. Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 9553-9562.
[12] Zhang Y, Sun P, Jiang Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]∥The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 1-21.
[13] Meinhardt T, Kirillov A, Leal-Taixe L, et al. TrackFormer: multi-object tracking with transformers[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8844-8854.
[14] Zeng F, Dong B, Zhang Y, et al. MOTR: end-to-end multiple-object tracking with transformer[C]∥The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 659-675.
[15] Sun P, Cao J, Jiang Y, et al. TransTrack: multiple object tracking with transformer[J/OL]. [2023-11-20].
[16] Zhu X, Su W, Lu L, et al. Deformable DETR: deformable transformers for end-to-end object detection[J/OL]. [2023-11-21].
[17] Zhuang Shan-na, Wang Jun-shuai, Bai Jing, et al. Video-based person re-identification based on three-dimensional convolution and self-attention mechanism[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(7): 2409-2417.
[18] Tu Shu-qin, Huang Zheng-xin, Liang Yun, et al. Improvement of the TransTrack multi-objective hog behavior tracking method[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(15): 172-180.
[19] Guo Y, Stutz D, Schiele B. Robustifying token attention for vision transformers[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 17511-17522.
[20] Peng J, Wang C, Wan F, et al. Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]∥Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 2020: 145-161.
[21] Wang Y, Kitani K, Weng X. Joint object detection and multi-object tracking with graph neural networks[C]∥IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021: 13708-13715.
[22] Pang B, Li Y, Zhang Y, et al. TubeTK: adopting tubes to track multi-object in a one-step training model[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6307-6317.
[23] Yu F, Li W, Li Q, et al. POI: multiple object tracking with high performance detection and appearance feature[C]∥European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 36-42.
[24] Wan X, Zhou S, Wang J, et al. Multiple object tracking by trajectory map regression with temporal priors embedding[C]∥Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 2021: 1377-1386.
[25] Wu J, Cao J, Song L, et al. Track to detect and segment: an online multi-object tracker[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 12347-12356.
[26] Nguyen P, Quach K G, Kitani K, et al. Type-to-track: retrieve any object via prompt-based tracking[J/OL]. [2023-11-20].
[27] Mahmoudi N, Ahadi S M, Rahmati M. Multi-target tracking using CNN-based features: CNNMTT[J]. Multimedia Tools and Applications, 2019, 78(6): 7077-7096.
[28] Meneses M, Matos L, Prado B, et al. Learning to associate detections for real-time multiple object tracking[J/OL]. [2023-11-22].
[29] Pang J, Qiu L, Li X, et al. Quasi-dense similarity learning for multiple object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 164-173.
[30] Zhou X, Koltun V, Krähenbühl P. Tracking objects as points[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 474-490.