This paper first outlines the challenges in designing multi-object tracking algorithms and the limitations of traditional methods. It then reviews and analyzes two categories of algorithms: detection-based tracking and joint detection and tracking. Next, the commonly used evaluation metrics and publicly available datasets for multi-object tracking are summarized, and the performance of the two categories of methods is analyzed. Finally, based on the current state of research, the problems that remain to be solved and the likely focuses of future research are discussed.
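In addition, MOT algorithms based on correlation filtering are another commonly used class of methods. Correlation filtering originates in signal processing, where correlation measures the similarity between two signals. The core idea is to find a filter template and convolve it with the image of the next frame; the region with the maximum response is taken as the predicted target. In essence, this solves a regression problem over a multivariate quadratic polynomial. The minimum output sum of squared error (MOSSE) filter proposed by Bolme et al. [12] was the first to apply correlation filtering to object tracking. It trains a filter by minimizing the output sum of squared errors, but the resulting model is a linear regressor. Henriques et al. [13] introduced a kernel function to turn the classifier into a nonlinear regression model, which addresses linear inseparability in low-dimensional space. MOSSE represents the target with single-channel grayscale features; to improve robustness in complex scenes, researchers have tried introducing other types of features. Danelljan et al. [14] incorporated color attributes into the correlation filter and designed an adaptive dimensionality reduction method to lower the computational complexity. Zhang et al. [15] modeled the spatio-temporal relationship between the target and its local context under a naive Bayes framework, which resolves ambiguity in the target location.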
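The MOSSE idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation from [12]: the function names (`gaussian_response`, `train_mosse`, `locate`), the Gaussian width `sigma`, and the regularizer `eps` are assumptions chosen for illustration, and the preprocessing and online update used in practice are omitted. The filter is obtained in closed form in the Fourier domain as the ratio of the accumulated cross-spectrum to the accumulated energy spectrum, and the target is located at the peak of the response map.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired output: a 2-D Gaussian peaked at the target center (row, col)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_mosse(patches, responses, eps=1e-3):
    """Closed-form least-squares filter in the Fourier domain.

    Minimizes the sum of squared errors between the filter outputs and the
    desired responses. Returns the conjugate filter H* (frequency domain).
    eps is a small regularizer (an assumption) that avoids division by zero.
    """
    num = 0.0
    den = eps
    for f, g in zip(patches, responses):
        F = np.fft.fft2(f)
        G = np.fft.fft2(g)
        num = num + G * np.conj(F)   # cross-spectrum of desired output and input
        den = den + F * np.conj(F)   # energy spectrum of the input
    return num / den

def locate(filter_conj, patch):
    """Apply the filter to a patch and return the (row, col) of the peak response."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))
    return np.unravel_index(np.argmax(response), response.shape)

# Toy example: train on one random patch, then cyclically shift it.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
target = gaussian_response(patch.shape, center=(16, 16))
h_conj = train_mosse([patch], [target])
shifted = np.roll(patch, shift=(3, 5), axis=(0, 1))
peak = locate(h_conj, shifted)   # peak moves from (16, 16) by the same (3, 5) offset
```

Because the filter is linear, the response peak follows the displacement of the patch content, which is what the tracker reads off in each new frame. The kernelized variant of [13] replaces this linear regressor with a nonlinear one.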
Jin Sha-sha, Long Wei, Hu Ling-xi, et al. Research progress of detection and multi-object tracking algorithm in intelligent traffic monitoring system[J]. Control and Decision, 2023, 38(4): 890-901.
[3]
Cui Y, Zeng C, Zhao X, et al. SportsMOT: a large multi-object tracking dataset in multiple sports scenes[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 9921-9931.
[4]
Peng J, Wang T, Lin W, et al. TPM: multiple object tracking with tracklet-plane matching[J]. Pattern Recognition, 2020, 107: No.107480.
[5]
Ren W, Wang X, Tian J, et al. Tracking-by-counting: using network flows on crowd density maps for tracking multiple targets[J]. IEEE Transactions on Image Processing, 2020, 30: 1439-1452.
[6]
Shi J. Good features to track[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 1994: 593-600.
[7]
Broida T J, Chellappa R. Estimation of object motion parameters from noisy images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986(1): 90-99.
[8]
Isard M, Blake A. Condensation—conditional density propagation for visual tracking[J]. International Journal of Computer Vision, 1998, 29(1): 5-28.
[9]
Nummiaro K, Koller-Meier E, Van Gool L. An adaptive color-based particle filter[J]. Image and Vision Computing, 2003, 21(1): 99-110.
Yang Xin, Liu Jia, Zhou Peng-yu, et al. Adaptive particle filter for object tracking based on fusing multiple features[J]. Journal of Jilin University (Engineering and Technology Edition), 2015, 45(2): 533-539.
[12]
Comaniciu D, Ramesh V, Meer P. Real-time tracking of non-rigid objects using mean shift[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, USA, 2000: 142-149.
[13]
Jeyakar J, Babu R V, Ramakrishnan K. Robust object tracking with background-weighted local kernels[J]. Computer Vision and Image Understanding, 2008, 112(3): 296-309.
[14]
Bolme D S, Beveridge J, Draper B A, et al. Visual object tracking using adaptive correlation filters[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 2544-2550.
[15]
Henriques J F, Caseiro R, Martins P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]∥European Conference on Computer Vision, Florence, Italy, 2012: 702-715.
[16]
Danelljan M, Shahbaz K F, Felsberg M, et al. Adaptive color attributes for real-time visual tracking[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1090-1097.
[17]
Zhang K, Zhang L, Liu Q, et al. Fast visual tracking via dense spatio-temporal context learning[C]∥European Conference on Computer Vision, Zürich, Switzerland, 2014: 127-141.
[18]
Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 60: 84-90.
[19]
Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[20]
Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]∥European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 21-37.
[21]
Bewley A, Ge Z, Ott L, et al. Simple online and realtime tracking[C]∥IEEE International Conference on Image Processing (ICIP), Phoenix, USA, 2016: 3464-3468.
[22]
He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Venice, Italy, 2017: 2961-2969.
[23]
Zhou Z, Xing J, Zhang M, et al. Online multi-target tracking with tensor-based high-order graph matching[C]∥24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018: 1809-1814.
[24]
Zhao D, Fu H, Xiao L, et al. Multi-object tracking with correlation filter for autonomous vehicle[J]. Sensors, 2018, 18(7): 2004.
[25]
Zhang Y, Sun P, Jiang Y, et al. Bytetrack: multi-object tracking by associating every detection box[C]∥European Conference on Computer Vision, Tel Aviv, Israel, 2022: 1-21.
[26]
Ge Z, Liu S, Wang F, et al. YOLOX: exceeding YOLO series in 2021[DB/OL]. [2021-08-06].
[27]
Sun S J, Akhtar N, Song H S, et al. Deep affinity network for multiple object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(1): 104-119.
[28]
Yu F, Li W, Li Q, et al. POI: multiple object tracking with high performance detection and appearance feature[C]∥Computer Vision-ECCV 2016 Workshops, Amsterdam, The Netherlands, 2016: 36-42.
[29]
Huang K, Sun B, Chen F, et al. Reidtrack: multi-object track and segmentation without motion[DB/OL]. [2023-08-03].
[30]
Kim C, Fu X L, Alotaibi M, et al. Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2021: 9553-9562.
[31]
Cao J, Pang J, Weng X, et al. Observation-centric sort: rethinking sort for robust multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 9686-9696.
[32]
Han S, Huang P, Wang H, et al. MAT: motion-aware multi-object tracking[J]. Neurocomputing, 2022, 476: 75-86.
[33]
Qin Z, Zhou S, Wang L, et al. Motiontrack: learning robust short-term and long-term motions for multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 17939-17948.
[34]
Larsen M, Rolfsjord S, Gusland D, et al. Base: probably a better approach to multi-object tracking[DB/OL]. [2023-09-21].
[35]
Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]∥2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017: 3645-3649.
[36]
Karunasekera H, Wang H, Zhang H. Multiple object tracking with attention to appearance, structure, motion and size[J]. IEEE Access, 2019, 7: 104423-104434.
[37]
Seidenschwarz J, Brasó G, Serrano V, et al. Simple cues lead to a strong multi-object tracker[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 13813-13823.
[38]
Yang M, Han G, Yan B, et al. Hybrid-sort: weak cues matter for online multi-object tracking[C]∥Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 6504-6512.
[39]
Li J, Gao X, Jiang T. Graph networks for multiple object tracking[C]∥Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, USA, 2020: 719-728.
[40]
Brasó G, Leal-Taixé L. Learning a neural solver for multiple object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2020: 6247-6257.
[41]
Liu Q, Chu Q, Liu B, et al. GSM: graph similarity model for multi-object tracking[C]∥Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 2020: 530-536.
[42]
Cetintas O, Brasó G, Leal-Taixé L. Unifying short and long-term tracking with graph hierarchies[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 22877-22887.
[43]
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems, Long Beach, USA, 2017: 5999-6009.
[44]
Sun P, Cao J, Jiang Y, et al. Transtrack: multiple object tracking with transformer[DB/OL]. [2021-05-04].
[45]
Meinhardt T, Kirillov A, Leal-Taixé L, et al. Trackformer: multi-object tracking with transformers[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8844-8854.
[46]
Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]∥European Conference on Computer Vision, Virtual, 2020: 213-229.
[47]
Xu Y, Ban Y, Delorme G, et al. TransCenter: transformers with dense representations for multiple-object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(6): 7820-7835.
[48]
Chu P, Wang J, You Q, et al. Transmot: spatial-temporal graph transformer for multiple object tracking[C]∥Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 4870-4880.
[49]
Zeng F, Dong B, Zhang Y, et al. MOTR: end-to-end multiple-object tracking with transformer[C]∥European Conference on Computer Vision, Tel Aviv, Israel, 2022: 659-675.
[50]
Zhang Y, Wang T, Zhang X. MOTRv2: bootstrapping end-to-end multi-object tracking by pretrained object detectors[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 22056-22065.
[51]
Gao R, Wang L. MeMOTR: long-term memory-augmented transformer for multi-object tracking[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 9901-9910.
[52]
Bertinetto L, Valmadre J, Henriques J, et al. Fully-convolutional siamese networks for object tracking[C]∥European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 850-865.
[53]
Xu Y, Osep A, Ban Y, et al. How to train your deep multi-object tracker[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2020: 6787-6796.
[54]
Bergmann P, Meinhardt T, Leal-Taixé L. Tracking without bells and whistles[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, USA, 2019: 941-951.
[55]
Pang J, Qiu L, Li X, et al. Quasi-dense similarity learning for multiple object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2021: 164-173.
[56]
Gao X, Shen Z, Yang Y. Multi-object tracking with siamese-RPN and adaptive matching strategy[J]. Signal, Image and Video Processing, 2022, 16(4): 965-973.
[57]
Shuai B, Berneshawi A, Li X, et al. SiamMOT: siamese multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2021: 12372-12382.
[58]
Zhou X, Koltun V, Krähenbühl P. Tracking objects as points[C]∥European Conference on Computer Vision, Virtual, 2020: 474-490.
[59]
Wang Z, Zheng L, Liu Y, et al. Towards real-time multi-object tracking[C]∥European Conference on Computer Vision, Virtual, 2020: 107-122.
[60]
Lu Z, Rathod V, Votel R, et al. Retinatrack: online single stage joint detection and tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 2020: 14668-14678.
[61]
Lin T, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Venice, Italy, 2017: 2980-2988.
Qu You, Li Wen-hui. Multiple object tracking method based on multi-task joint learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(10): 2932-2941.
[64]
Liang C, Zhang Z, Zhou X, et al. Rethinking the competition between detection and reid in multiobject tracking[J]. IEEE Transactions on Image Processing, 2022, 31: 3182-3196.
[65]
Zhang Y, Wang C, Wang X, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129: 3069-3087.
[66]
Duan K, Bai S, Xie L, et al. Centernet: keypoint triplets for object detection[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 6569-6578.
[67]
Liang T, Li B, Wang M, et al. A closer look at the joint training of object detection and re-identification in multi-object tracking[J]. IEEE Transactions on Image Processing, 2022, 32: 267-280.
[68]
Bernardin K, Stiefelhagen R. Evaluating multiple object tracking performance: the CLEAR MOT metrics[J]. EURASIP Journal on Image and Video Processing, 2008, 2008: 1-10.
[69]
Ristani E, Solera F, Zou R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]∥European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 17-35.
[70]
Luiten J, Osep A, Dendorfer P, et al. HOTA: a higher order metric for evaluating multi-object tracking[J]. International Journal of Computer Vision, 2021, 129: 548-578.
[71]
Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? the KITTI vision benchmark suite[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 3354-3361.
[72]
Leal-Taixé L, Milan A, Reid I, et al. MOTChallenge 2015: towards a benchmark for multi-target tracking[DB/OL]. [2015-04-08].
[73]
Milan A, Leal-Taixé L, Reid I, et al. MOT16: a benchmark for multi-object tracking[DB/OL]. [2016-05-03].
[74]
Dendorfer P, Rezatofighi H, Milan A, et al. MOT20: a benchmark for multi object tracking in crowded scenes[DB/OL]. [2020-03-19].
[75]
Sun P, Cao J, Jiang Y, et al. Dancetrack: multi-object tracking in uniform appearance and diverse motion[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 20993-21002.