Existing object counting algorithms are susceptible to background clutter and lose accuracy under heavy occlusion or large variations in object scale. To address these limitations, we propose a novel lightweight multi-class object counting network based on a feature pyramid with local information encoding (FPLE-MOCN). The model exploits the redundancy among feature maps in convolutional neural networks to build an efficient, fast, and lightweight backbone. A feature pyramid module with a local information encoding mechanism is then introduced to capture the local features of targets. Finally, regression and classification heads composed of convolutional layers predict the number and the locations of objects simultaneously. To achieve multi-class object counting, we merge the training sets of an existing crowd counting dataset (ShanghaiTech) and a vehicle counting dataset (CARPK) for training. For comparison with existing methods, we evaluate our model on the test sets of both datasets separately, using mean absolute error and mean squared error as evaluation metrics for counting. Experimental results demonstrate that FPLE-MOCN can perform multi-class object counting and outperforms the compared methods in counting accuracy.
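The joint prediction described above can be sketched as a decoding step that turns per-anchor classification scores and regressed point offsets into class-wise counts and point locations. This is an illustrative sketch only: the array shapes, the background-at-index-0 convention, and the confidence threshold are assumptions, not the paper's exact design.

```python
import numpy as np

def decode_predictions(cls_scores, point_offsets, anchor_points, score_thresh=0.5):
    """Turn head outputs into per-class counts and predicted point locations.

    cls_scores:    (N, C) class probabilities per anchor; index 0 = background (assumption)
    point_offsets: (N, 2) regressed (dx, dy) offsets from each anchor point
    anchor_points: (N, 2) reference (x, y) coordinates on the image plane
    """
    labels = cls_scores.argmax(axis=1)                    # most likely class per anchor
    confidence = cls_scores.max(axis=1)
    keep = (labels != 0) & (confidence >= score_thresh)   # drop background / low confidence
    points = anchor_points[keep] + point_offsets[keep]    # predicted object locations
    counts = np.bincount(labels[keep], minlength=cls_scores.shape[1])
    return points, counts                                 # locations and class-wise counts
```

Because counts are read off the same kept anchors that yield the locations, counting and localization stay consistent by construction.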
To evaluate the effectiveness of the proposed algorithm on the crowd counting task, we compare it on the ShanghaiTech dataset against current mainstream crowd counting algorithms: multi-column convolutional neural network (MCNN) [5], congested scene recognition network (CSRNet) [18], point to point network (P2PNet) [8], weakly-supervised crowd counting with transformers (TransCrowd) [19], distribution matching for crowd counting (DM-Count) [20], and segmentation guided attention network (SGANet) [21]. The comparison results for crowd counting are shown in Table 1.
As Table 1 shows, on the ShanghaiTech Part A dataset the proposed algorithm achieves the best MAE and MSE, improving on the second-best method, P2PNet, by 1.06 and 0.06, respectively. On ShanghaiTech Part B, it achieves the best MAE, 0.02 better than the second-best method P2PNet, and the second-best MSE. Overall, the proposed algorithm ranks first in three of the four comparisons on the ShanghaiTech dataset and second in the remaining one, which verifies its effectiveness and competitiveness on the crowd counting task. Figure 3 shows the counting results for one image from the ShanghaiTech Part B dataset: Fig. 3(a) is the original image; Fig. 3(b) is the ground-truth density map and count; Fig. 3(c) is the predicted density map and count of MCNN; Fig. 3(d) is that of CSRNet; Fig. 3(e) shows the predictions and count of our algorithm (red dots mark predicted points); and Fig. 3(f) is the predicted density map and count of SGANet. As Figure 3 shows, our prediction is very close to the ground truth.
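The two metrics compared above can be computed as follows. Note that, by the convention of the crowd-counting literature (e.g., MCNN), the value reported as "MSE" is the root of the mean squared counting error:

```python
import numpy as np

def counting_metrics(pred_counts, gt_counts):
    """MAE and MSE over per-image counts, as reported in crowd-counting papers."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    err = pred - gt
    mae = np.abs(err).mean()            # mean absolute counting error
    mse = np.sqrt((err ** 2).mean())    # conventionally the *root* mean squared error
    return mae, mse
```

For example, predicted counts [10, 20] against ground truths [12, 16] give MAE = 3.0 and MSE = sqrt(10).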
[1]
MANJU D, RADHA V. A Survey on Human Activity Prediction Techniques[J]. Int J Adv Technol Eng Explor, 2018, 5(47): 400-406. DOI: 10.19101/ijatee.2018.547006 .
[2]
KARPAGAVALLI P, RAMPRASAD A V. Estimating the Density of the People and Counting the Number of People in a Crowd Environment for Human Safety[C]//2013 International Conference on Communication and Signal Processing. New York: IEEE, 2013: 663-667. DOI: 10.1109/iccsp.2013.6577138 .
[3]
WAN J, KUMAR N S, CHAN A B. Fine-grained Crowd Counting[J]. IEEE Trans Image Process, 2021, 30: 2114-2126. DOI: 10.1109/TIP.2021.3049938 .
[4]
WANG Q, GAO J Y, LIN W, et al. NWPU-crowd: A Large-scale Benchmark for Crowd Counting and Localization[J]. IEEE Trans Pattern Anal Mach Intell, 2021, 43(6): 2141-2149. DOI: 10.1109/TPAMI.2020.3013269 .
[5]
ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image Crowd Counting via Multi-column Convolutional Neural Network[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 589-597. DOI: 10.1109/CVPR.2016.70 .
[6]
ZHANG S H, WU G H, COSTEIRA J P, et al. FCN-rLSTM: Deep Spatio-temporal Neural Networks for Vehicle Counting in City Cameras[C]//2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2017: 3687-3696. DOI: 10.1109/ICCV.2017.396 .
[7]
LEIBE B, SEEMANN E, SCHIELE B. Pedestrian Detection in Crowded Scenes[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). New York: IEEE, 2005: 878-885. DOI: 10.1109/CVPR.2005.272 .
[8]
SONG Q Y, WANG C G, JIANG Z K, et al. Rethinking Counting and Localization in Crowds: A Purely Point-based Framework[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 2021: 3345-3354. DOI: 10.1109/ICCV48922.2021.00335 .
[9]
LI M, ZHANG Z X, HUANG K Q, et al. Estimating the Number of People in Crowded Scenes by MID Based Foreground Segmentation and Head-shoulder Detection[C]//2008 19th International Conference on Pattern Recognition. New York: IEEE, 2008: 1-4. DOI: 10.1109/ICPR.2008.4761705 .
[10]
DENG J, DONG W, SOCHER R, et al. ImageNet: A Large-scale Hierarchical Image Database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2009: 248-255. DOI: 10.1109/CVPR.2009.5206848 .
[11]
HSIEH M R, LIN Y L, HSU W H. Drone-based Object Counting by Spatially Regularized Regional Proposal Network[C]//2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2017: 4165-4173. DOI: 10.1109/ICCV.2017.446 .
[12]
CHEN J R, KAO S H, HE H, et al. Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2023: 12021-12031. DOI: 10.1109/CVPR52729.2023.01157 .
[13]
NAIR V, HINTON G E. Rectified Linear Units Improve Restricted Boltzmann Machines[C]//Proceedings of the 27th International Conference on Machine Learning (ICML-10). Madison: Omnipress, 2010: 807-814.
[14]
LV S L, LIANG J Z, DI L, et al. A Probabilistic Collaborative Dictionary Learning-based Approach for Face Recognition[J]. IET Image Process, 2021, 15(4): 868-884. DOI: 10.1049/ipr2.12068 .
[15]
CHEN S, LAI X, YAN Y, et al. Learning an Attention-Aware Parallel Sharing Network for Facial Attribute Recognition[J]. J Vis Commun Image Represent, 2023, 90: 103745. DOI: 10.1016/j.jvcir.2022.103745 .
[16]
LIN T Y, GOYAL P, GIRSHICK R, et al. Focal Loss for Dense Object Detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2017: 2999-3007. DOI: 10.1109/ICCV.2017.324 .
[17]
KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[C]//3rd International Conference on Learning Representations (ICLR 2015). New York: Curran Associates, 2015. DOI: 10.48550/arXiv.1412.6980 .
[18]
LI Y H, ZHANG X F, CHEN D M. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 1091-1100. DOI: 10.1109/CVPR.2018.00120 .
[19]
LIANG D K, CHEN X W, XU W, et al. TransCrowd: Weakly-supervised Crowd Counting with Transformers[J]. Sci China Inf Sci, 2022, 65(6): 160104. DOI: 10.1007/s11432-021-3445-y .
[20]
WANG B, LIU H, SAMARAS D, et al. Distribution Matching for Crowd Counting[C]//Advances in Neural Information Processing Systems. Red Hook: Curran Associates, Inc., 2020, 33: 1595-1607. DOI: 10.48550/arXiv.2009.13077 .
[21]
WANG Q, BRECKON T P. Crowd Counting via Segmentation Guided Attention Networks and Curriculum Loss[J]. IEEE Trans Intell Transp Syst, 2022, 23(9): 15233-15243. DOI: 10.1109/TITS.2021.3138896 .
[22]
SHI X W, LI X, WU C L, et al. A Real-time Deep Network for Crowd Counting[C]//ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New York: IEEE, 2020: 2328-2332. DOI: 10.1109/ICASSP40776.2020.9053780 .
[23]
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031 .