Human action recognition is a key technology for understanding pedestrian intentions from video captured by unmanned aerial vehicles (UAVs). However, UAV platforms have limited computing power, and existing action recognition methods are inefficient. A lightweight spatial grouping attention graph convolutional network (SGA-GCN) is proposed that reduces network depth to improve efficiency while preserving the accuracy of action recognition. To capture the body parts that represent global motion, spatial grouping attention is introduced to enhance local features that are highly similar to the global features. Moreover, because actions with similar motion trajectories cannot be distinguished effectively from joint and bone features alone, a high-order feature encoding of skeletal angles is constructed. It captures changes in the angles between limb joints, which better reflect subtle motion differences, and thereby improves feature representation. Finally, to address the low frame rate of UAV aerial video, a linear interpolation scheme based on inter-frame differences is proposed to increase the amount of information in each sample. Experimental results on the UAV-Human dataset demonstrate that, compared with existing state-of-the-art (SOTA) methods, the proposed approach achieves better performance in terms of recognition rate, parameter count, training time, and execution time.
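Although existing deep learning models have achieved some results in UAV-based human action recognition (HAR), network complexity has grown as models have been deepened and extended. The resulting explosion in parameter count greatly increases training and execution time, so these models cannot be applied effectively in real-world scenarios. Reducing network complexity without sacrificing performance has therefore become a key problem in current research. This paper proposes spatial grouping attention (SGA), which maintains good performance while reducing network depth, parameter count, and training and execution time. Analysis shows that local limb features can accurately reflect an action. To reduce interference from local features that do not help distinguish actions, the human body is divided into multiple regions, and the similarity between local and global features is used to capture the body parts that represent the global motion. An attention mechanism then strengthens the representation of these key parts in the feature map, so that the model focuses on the local features that best discriminate between actions. Second, because some different actions differ only subtly, their joint coordinates within a frame are similar, and the model is easily confused by actions with similar motion trajectories. To alleviate this problem, this paper captures the relative motion between body parts and proposes a higher-order feature coding of bone angles (HFBA) method, which captures changes in the angles between limb joints that better reflect subtle motion differences and thus reduces interference from similar actions. Finally, to handle low frame rates, most existing schemes repeat sampled frames or pad with empty frames, neither of which adds sample information. This paper therefore adopts a linear interpolation (LI) scheme based on inter-frame differences to interpolate between frames, which increases the amount of information in each sample.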
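The last two ideas can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `interpolate_frames` and `joint_angles`, the choice of inserting exactly one midpoint frame between neighbouring frames, and the representation of a limb angle as a joint triplet `(a, b, c)` are all assumptions made for illustration. The skeleton sequence is assumed to be an array of shape `(T, J, C)` (frames, joints, coordinates).

```python
import numpy as np


def interpolate_frames(seq: np.ndarray) -> np.ndarray:
    """Insert one linearly interpolated frame between each pair of frames.

    seq: skeleton sequence of shape (T, J, C).
    Returns an array of shape (2*T - 1, J, C).
    Assumption: one midpoint frame per gap; the paper may use another ratio.
    """
    diff = seq[1:] - seq[:-1]        # inter-frame differences, shape (T-1, J, C)
    mid = seq[:-1] + 0.5 * diff      # midpoint between consecutive frames
    out = np.empty((2 * seq.shape[0] - 1,) + seq.shape[1:], dtype=float)
    out[0::2] = seq                  # original frames at even positions
    out[1::2] = mid                  # interpolated frames at odd positions
    return out


def joint_angles(seq: np.ndarray, triplets, eps: float = 1e-8) -> np.ndarray:
    """Angle in radians at joint b between bones b->a and b->c, per frame.

    triplets: list of hypothetical (a, b, c) joint-index tuples.
    Returns an array of shape (T, len(triplets)).
    """
    a, b, c = (seq[:, list(idx)] for idx in zip(*triplets))
    u = a - b                        # first bone vector, shape (T, K, C)
    v = c - b                        # second bone vector, shape (T, K, C)
    norms = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    cos = (u * v).sum(axis=-1) / (norms + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))


# Toy example: 2 frames, 3 joints in 2D, forming a right angle at joint 1.
seq = np.array([
    [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]],
])
dense = interpolate_frames(seq)
angles = joint_angles(dense, [(0, 1, 2)])
```

In this toy case the interpolated frame lies halfway between the two inputs, and the angle at joint 1 stays constant even though the joint coordinates change, which is the kind of coordinate-independent cue that an angle encoding is meant to expose.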