Video person re-identification is a technology for identifying a specific person across a multi-camera surveillance network. Compared with methods based on single-frame images, video-based algorithms can exploit more information about a person, but they also suffer from high model complexity and from misalignment when features are constructed. To address these issues, a feature fusion-based video person re-identification algorithm was proposed. The proposed algorithm consisted of a global branch and a local branch with spatial transformation. The global branch extracted global features of the person, capturing coarse-grained and overall contextual information. The local branch integrated a spatial transformation matrix to learn discriminative local regional features and to alleviate feature misalignment. Through this multi-branch structure, the algorithm fused local and global features and aggregated them over time by temporal average pooling, which increased feature diversity and improved the robustness of the model. Finally, the model was trained with a cross-entropy loss and a soft boundary triplet loss. Test results on the MARS and DukeMTMC-Video datasets verified the feasibility of the proposed algorithm. Specifically, on the MARS dataset the algorithm achieved an mAP of 82.25% and a Rank-1 accuracy of 89.76%, which indicates that it is practical.
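The fusion, temporal pooling, and triplet-loss steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion operator or the exact form of the soft boundary triplet loss, so the sketch assumes fusion by concatenation and the commonly used soft-margin form log(1 + exp(d(a, p) - d(a, n))) with Euclidean distance. All function names and the toy feature values are hypothetical, and the per-frame features stand in for the outputs of the global and local branches.

```python
import math

def fuse(global_feat, local_feat):
    """Fuse one frame's global and local features.
    Assumption: plain concatenation; the abstract does not name the operator."""
    return list(global_feat) + list(local_feat)

def temporal_average_pool(frame_features):
    """Average T per-frame vectors (each of length D) into one clip-level vector."""
    num_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / num_frames
            for i in range(dim)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def soft_margin_triplet_loss(anchor, positive, negative):
    """Assumed soft-margin form: log(1 + exp(d(a, p) - d(a, n))).
    Computed as a numerically stable softplus."""
    x = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Toy clip with 3 frames; 2-D global and 2-D local features per frame
# (hypothetical values standing in for the two branch outputs).
global_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
local_frames = [[0.0, 1.0], [0.0, 1.0], [3.0, 1.0]]

# Fuse per frame, then aggregate over time.
fused_frames = [fuse(g, l) for g, l in zip(global_frames, local_frames)]
clip_feature = temporal_average_pool(fused_frames)

# Loss for one triplet of clip-level features.
loss = soft_margin_triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this form the loss has no fixed margin hyperparameter: it decreases smoothly as the negative moves farther from the anchor than the positive, and equals log 2 when the two distances are equal.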