In single-object tracking, the accuracy of the tracking bounding box is often compromised by factors such as deformation, motion blur, occlusion, and background interference; background interference in particular frequently causes tracking hopping and drift. To mitigate these issues, a two-stage tracking algorithm that integrates motion information with a dual-attention mechanism is proposed. In the first stage, a SiamCAR tracker augmented with a dual-attention mechanism coarsely locates the target in the current frame. In the second stage, a bounding-box refinement module built on pixel-level similarity computation learns the subtle features of the target under low-latency conditions, thereby improving tracking accuracy. Finally, the tracking box obtained from appearance features is fused with the target's motion-trajectory information to suppress tracking drift and hopping. Experimental results on the OTB100 dataset show that the success rate and precision of the tracking box improve by 4.6% and 2.8%, respectively, over the original SiamCAR, and the success rate under background interference reaches 69.6%.
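The final fusion stage can be illustrated with a minimal sketch: an appearance-based box that jumps too far from a motion-predicted center (a likely sign of background interference) is pulled back toward the predicted trajectory. All function names, the constant-velocity motion model, and the threshold/blend parameters below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def predict_center(history, k=3):
    """Predict the next box center by linear extrapolation of the last k
    trajectory centers (constant-velocity assumption, not the paper's model)."""
    centers = np.array([((x1 + x2) / 2, (y1 + y2) / 2)
                        for x1, y1, x2, y2 in history[-k:]])
    # average per-frame displacement over the window
    velocity = (centers[-1] - centers[0]) / (len(centers) - 1)
    return centers[-1] + velocity

def fuse_boxes(appearance_box, history, max_jump=40.0, alpha=0.6):
    """Fuse the appearance-based box with the motion prediction.
    If the appearance box center jumps farther than max_jump pixels from
    the predicted center, blend it toward the prediction to suppress
    hopping; otherwise keep the appearance box unchanged."""
    pred_cx, pred_cy = predict_center(history)
    x1, y1, x2, y2 = appearance_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    jump = float(np.hypot(cx - pred_cx, cy - pred_cy))
    if jump > max_jump:
        # hypothetical weighted blend of motion and appearance centers
        cx = alpha * pred_cx + (1 - alpha) * cx
        cy = alpha * pred_cy + (1 - alpha) * cy
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For a target moving steadily rightward, a plausible nearby detection passes through unchanged, while a distant detection (e.g. a background distractor) is pulled back toward the extrapolated trajectory.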