3. College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin 150040, China
4. Feline Research Center of National Forestry and Grassland Administration, College of Wildlife and Protected Area, Northeast Forestry University, Harbin 150040, China
Behavioral research on Amur tiger (Panthera tigris altaica) cubs is critical to conservation biology and developmental ecology. Traditional manual observation is inefficient and susceptible to observer bias, underscoring the need for automated and objective methods. This study proposes and validates a deep learning framework based on skeletal keypoints for precise behavior recognition and tracking of Amur tiger cubs. Using surveillance videos from 15 cubs at the Heilongjiang Siberian Tiger Park and the Hengdaohezi Siberian Tiger Park, we constructed a dataset comprising a pose-estimation set with 16 manually annotated keypoints and a behavior-recognition set with five common behavior categories. A trained high-resolution network (HRNet) generated initial pose estimates; sequential keypoint series were then linked to behavior labels and individual IDs to construct a behavior recognition and tracking dataset. We benchmarked multiple behavior-recognition networks and applied ByteTrack for multi-object tracking. Results show that the attention-enhanced adaptive graph convolutional neural network (AAGCN) achieved the best behavior recognition accuracy at 76.59%, while ByteTrack reached a multiple object tracking accuracy (MOTA) of 92.76% for individual tracking. The proposed approach performs strongly for behavior recognition and tracking of captive large felid cubs, providing a reliable tool for quantitative behavioral analysis with direct applications to wildlife conservation and breeding management.
In recent years, with the rapid development of computer vision and artificial intelligence, deep learning has made remarkable progress in animal behavior recognition [5-7]. Deng et al. [8] used a recurrent neural network together with a self-attention branch to extract spatiotemporal information from skeleton sequences, recognizing four wild-animal behaviors (galloping, sitting, walking, and standing) with an accuracy of 96.8%. Feng et al. [9] built a two-branch structure combining a recurrent neural network with a lightweight convolutional neural network to extract spatiotemporal information from skeleton sequences, recognizing three wild-animal behaviors (standing, walking, and running) with an accuracy of 95.00%. Lin et al. [10] used a convolutional neural network to extract pose features and recognized several common bird behaviors on the self-built IMLab-P8-2021 bird behavior dataset, achieving a percentage of correct keypoints (PCK) of up to 87.99% and an overall accuracy (OA) of up to 87.81%. Li et al. [11] adopted a spatial-temporal graph convolutional network (ST-GCN) to fully exploit the spatiotemporal information in pose sequences, predicting lameness in dairy cows with an accuracy of 97.20%. These results show that deep learning delivers strong recognition performance for animal behavior, with accuracy surpassing both traditional learning algorithms and manual observation.
Deep learning is now the leading approach to animal behavior recognition. Among such methods, skeleton-based recognition performs behavior recognition and video understanding from animal pose sequences. Compared with other modalities (e.g., RGB images or optical flow), pose data are compact and information-rich [12] and contain multiple complementary feature cues [13]. Because a pose sequence captures only motion information, it is insensitive to background and illumination changes [14] and unaffected by variation in target size or shape, so it can effectively recognize the behavior of small or densely packed targets and is more robust. Extracting skeletal keypoints also has low computational complexity, making the approach suitable for environments with limited computing resources, particularly real-time monitoring and large-scale data processing. Moreover, by taking "posture", one of the three elements of behavior (posture-behavior-environment), as its entry point, the approach not only characterizes individual movement patterns accurately but also provides auxiliary information for behavior analysis. Despite these advantages, for the specific scenario of group behavior recognition in Amur tiger cubs there is still no dedicated dataset or targeted experimental study. Although the studies of Deng et al. [8] and Feng et al. [9] are pose-based, their models fuse spatiotemporal features only in a rudimentary way: they neither fully exploit the structural relationships between joints and the complex spatiotemporal correlations, nor systematically fuse multi-dimensional features such as joint motion and bone-length change, and they therefore fall short in cub scenes with highly variable behaviors and heavy occlusion. These shortcomings not only limit direct application of the method in this scenario but also pose challenges for modeling behavioral continuity and analyzing group interactions.
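The multi-dimensional features mentioned above (joint motion, bone vectors, and similar) can be derived directly from a raw keypoint sequence. The sketch below is illustrative only and is not the paper's implementation: the `(T, V, 2)` array layout, the truncated `BONES` list, and the function name `skeleton_features` are all assumptions.

```python
import numpy as np

# Hypothetical (child, parent) pairs for a 16-keypoint skeleton;
# truncated to three bones purely for illustration.
BONES = [(1, 0), (2, 1), (3, 2)]

def skeleton_features(poses):
    """poses: (T, V, 2) array of 2D keypoints over T frames.
    Returns a joint-motion stream (frame-to-frame displacement)
    and a bone-vector stream (child minus parent joint)."""
    motion = np.diff(poses, axis=0)  # (T-1, V, 2)
    bones = np.stack(
        [poses[:, c] - poses[:, p] for c, p in BONES], axis=1
    )  # (T, len(BONES), 2)
    return motion, bones

T, V = 8, 16
poses = np.zeros((T, V, 2))
motion, bones = skeleton_features(poses)
print(motion.shape, bones.shape)  # (7, 16, 2) (8, 3, 2)
```

Streams like these are what multi-stream skeleton models consume alongside the raw joint coordinates.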
To achieve accurate multi-target behavior recognition and per-individual behavioral statistics for Amur tiger cubs, we apply ByteTrack, an algorithm that effectively retains and associates low-confidence detection boxes, to multi-object tracking of cub groups. To further evaluate the performance of different tracking models, ByteTrack is compared with two other mainstream trackers, SORT with a deep association metric (DeepSORT) [23] and observation-centric SORT (OC-SORT) [24], with the aim of selecting the best-performing model to support behavior recognition and statistical analysis of Amur tiger cubs.
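ByteTrack's distinguishing idea, retaining low-confidence boxes, can be sketched as a two-stage association: high-score detections are matched to tracks first, and unmatched tracks then get a second chance against low-score detections instead of those boxes being discarded. The sketch below uses greedy IoU matching in place of the Hungarian assignment used by the real algorithm, and all names and thresholds are illustrative assumptions.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def byte_associate(tracks, detections, high_thr=0.6, iou_thr=0.3):
    """Two-stage association in the spirit of ByteTrack: stage 1 matches
    high-score detections, stage 2 matches the remaining tracks against
    low-score detections (greedy IoU stands in for Hungarian matching)."""
    high = [d for d in detections if d["score"] >= high_thr]
    low = [d for d in detections if d["score"] < high_thr]
    matches, unmatched = [], list(tracks)
    for pool in (high, low):
        for det in pool:
            best, best_iou = None, iou_thr
            for trk in unmatched:
                overlap = iou(trk["box"], det["box"])
                if overlap > best_iou:
                    best, best_iou = trk, overlap
            if best is not None:
                matches.append((best["id"], det))
                unmatched.remove(best)
    return matches, unmatched

tracks = [{"id": 1, "box": (0, 0, 10, 10)}, {"id": 2, "box": (20, 20, 30, 30)}]
detections = [{"box": (1, 1, 11, 11), "score": 0.9},
              {"box": (21, 21, 31, 31), "score": 0.2}]
matches, unmatched = byte_associate(tracks, detections)
print([tid for tid, _ in matches])  # [1, 2]
```

Note that track 2 is recovered only in stage 2: its detection scores 0.2, below the high threshold, so a tracker that drops low-confidence boxes would lose it. This is why ByteTrack handles partially occluded cubs better.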
Li et al. [5] used computer vision to automatically recognize the behaviors of wild animals such as lions (Panthera leo) and wolves (Canis lupus), but their method relies on RGB images, which are susceptible to environmental and illumination changes, and ignores pose information, which provides multiple complementary features. In contrast, this study performs behavior recognition on pose data, which are more compact and information-rich and more robust to background and illumination changes. Although Deng et al. [8] and Feng et al. [9] carried out pose-based wild-animal behavior recognition, they did not fully exploit the multi-dimensional feature information in pose sequences, and they mainly targeted adult individuals, without optimizing for cubs, whose targets are small and whose behavioral features are less distinct. This study fuses five types of pose features, fully exploiting the complementarity among them, and validates the effectiveness of this approach on the Amur tiger cub behavior recognition task.
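One common way to combine multiple pose-feature streams is late fusion of per-stream class scores. The paper does not specify its fusion scheme here, so the sketch below shows a generic weighted-average fusion; the five stream names are illustrative placeholders, not the paper's actual feature set.

```python
import numpy as np

def late_fusion(stream_scores, weights=None):
    """Fuse per-stream class-score vectors by (optionally weighted)
    averaging; returns the predicted class and the fused scores."""
    scores = np.stack(list(stream_scores.values()))  # (streams, classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)
    fused = np.tensordot(weights, scores, axes=1)    # (classes,)
    return int(np.argmax(fused)), fused

# Illustrative scores for 3 behavior classes from 5 hypothetical streams.
streams = {
    "joint":        np.array([0.20, 0.50, 0.30]),
    "bone":         np.array([0.10, 0.60, 0.30]),
    "joint_motion": np.array([0.30, 0.40, 0.30]),
    "bone_motion":  np.array([0.20, 0.50, 0.30]),
    "angle":        np.array([0.25, 0.45, 0.30]),
}
label, fused = late_fusion(streams)
print(label)  # 1
```

The design point is that streams which disagree individually (here, "joint_motion" is less confident) still reinforce the correct class once averaged, which is the complementarity the text refers to.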
List of national key protected wild animals in China (revised on February 1, 2021) [J]. Chinese Journal of Wildlife, 2021, 42(2): 605-640.
[3]
GOODRICH J, WIBISONO H, MIQUELLE D, et al. Panthera tigris [J/OL]. The IUCN Red List of Threatened Species, 2022: e.T15955A214862019 [2025-01-14].
[4]
ALIBHAI S K, GU J Y, JEWELL Z C, et al. 'I know the tiger by his paw': A non-invasive footprint identification technique for monitoring individual Amur tigers (Panthera tigris altaica) in snow [J]. Ecological Informatics, 2023, 73: 101947.
[5]
MATHIS A, MAMIDANNA P, CURY K M, et al. DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning [J]. Nature Neuroscience, 2018, 21(9): 1281-1289.
[6]
LI W N, SWETHA S, SHAH M. Wildlife action recognition using deep learning [EB/OL]. TechRxiv (2025-10-27)[2025-10-30].
[7]
SWARUP P, CHEN P, HOU R, et al. Giant panda behaviour recognition using images [J]. Global Ecology and Conservation, 2021, 26: e01510.
MA G K, ZHANG J, DAI W R, et al. Body stripes individual identification of Amur tigers based on transformer [J]. Chinese Journal of Wildlife, 2024, 45(4): 734-743.
[10]
DENG S C, TANG G Z, MEI L. Wild mammal behavior recognition based on gated transformer network [C]//2022 International Conference on Cyber-Physical Social Intelligence (ICCSI), November 18-21, 2022. Nanjing: IEEE, 2022: 739-743.
[11]
FENG L Q, ZHAO Y Q, SUN Y C, et al. Action recognition using a spatial-temporal network for wild felines [J]. Animals, 2021, 11(2): 485.
[12]
LIN C W, HONG S D, LIN M X, et al. Bird posture recognition based on target keypoints estimation in dual-task convolutional neural networks [J]. Ecological Indicators, 2022, 135: 108506.
[13]
LI Z Y, ZHANG Q R, LV S C, et al. Fusion of RGB, optical flow and skeleton features for the detection of lameness in dairy cows [J]. Biosystems Engineering, 2022, 218: 62-77.
[14]
DUAN H D, WANG J Q, CHEN K, et al. PYSKL: towards good practices for skeleton action recognition [C]//Proceedings of the 30th ACM International Conference on Multimedia, October 10-14, 2022. Lisboa: ACM, 2022: 7351-7354.
[15]
YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition [EB/OL]. arXiv (2018-01-25)[2025-01-07].
[16]
DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition [C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 18-24, 2022. New Orleans: IEEE, 2022: 2969-2978.
[17]
ZHANG H Y, WANG Y, DAYOUB F, et al. VarifocalNet: An IoU-aware dense object detector [C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 20-25, 2021. Nashville: IEEE, 2021: 8514-8523.
[18]
SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019. Long Beach: IEEE, 2020: 5693-5703.
[19]
ZHANG Y F, SUN P Z, JIANG Y, et al. ByteTrack: multi-object tracking by associating every detection box [C]//AVIDAN S, BROSTOW G, CISSÉ M, et al. Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, proceedings, part ⅩⅫ. Cham: Springer, 2022: 1-21.
QIAO Z L, WEI Q G. Construction of descriptive behavioral ethogram of Amur tiger [J]. Heilongjiang Animal Science and Veterinary Medicine, 2015(9): 207-209.
[22]
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016. Las Vegas: IEEE, 2016: 770-778.
[23]
SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition [C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019. Long Beach: IEEE, 2020: 12018-12027.
[24]
SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with multi-stream adaptive graph convolutional networks [J]. IEEE Transactions on Image Processing, 2020, 29: 9532-9545.
[25]
CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition [C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV), October 10-17, 2021. Montreal: IEEE, 2022: 13359-13368.
[26]
WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric [C]//2017 IEEE International Conference on Image Processing (ICIP), September 17-20, 2017. Beijing: IEEE, 2018: 3645-3649.
[27]
CAO J K, PANG J M, WENG X S, et al. Observation-centric SORT: Rethinking SORT for robust multi-object tracking [C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 17-24, 2023. Vancouver: IEEE, 2023: 9686-9696.
[28]
KINGMA D P, BA J. Adam: A method for stochastic optimization [EB/OL]. arXiv (2017-01-30)[2025-01-14].
[29]
BOTTOU L. Large-scale machine learning with stochastic gradient descent [C]//LECHEVALLIER Y, SAPORTA G. Proceedings of COMPSTAT'2010: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010. Heidelberg: Springer, 2010: 177-186.
[30]
LI Y J, YANG S, LIU P D, et al. SimCC: A simple coordinate classification perspective for human pose estimation [C]//AVIDAN S, BROSTOW G, CISSÉ M, et al. Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, proceedings, part Ⅵ. Cham: Springer, 2022: 89-106.
[31]
ZHANG F, ZHU X T, DAI H B, et al. Distribution-aware coordinate representation for human pose estimation [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020. Seattle: IEEE, 2020: 7093-7102.
[32]
XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking [C]//Computer Vision-ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, proceedings, part Ⅵ. Cham: Springer, 2018: 472-487.
[33]
XU Y F, ZHANG J, ZHANG Q M, et al. ViTPose: Simple vision transformer baselines for human pose estimation [EB/OL]. arXiv (2022-04-26)[2025-01-14].
[34]
NASIRI A, YODER J, ZHAO Y, et al. Pose estimation-based lameness recognition in broiler using CNN-LSTM network [J]. Computers and Electronics in Agriculture, 2022, 197: 106931.