This study proposed a skeleton-based action recognition method built on Graph Convolutional Networks (GCNs). It addressed a limitation of conventional spatiotemporal graph convolution frameworks, which process spatiotemporal features uniformly and neglect inter-channel interactions. The model enriched its spatial representation by fusing multiple topological matrices and by introducing a Channel Interaction Attention (CIA) module. The CIA module captured dynamic frame-level information and human structural features across the spatiotemporal dimensions, modeling inter-channel relationships and thereby improving the representation of skeletal data. In addition, a Temporal Adaptive Feature Fusion (TAF) module adaptively selected dilation rates and kernel sizes for different network layers. This module replaced the traditional residual connection between the initial features and the output of the temporal module, addressing both context aggregation and the integration of initial features. It processed the initial features and the temporal information separately and then fused them, integrating the initial features with high-dimensional temporal features and strengthening spatiotemporal feature extraction. On the NW-UCLA dataset, the proposed method achieved 2.1% higher recognition accuracy than the baseline model CTR-GCN (Channel-wise Topology Refinement Graph Convolution Network) and 0.7% higher accuracy than the state-of-the-art method InfoGCN. Under the different evaluation splits, accuracy improved by 0.7% and 0.8% on NTU RGB+D 120 and by 0.5% and 0.6% on NTU RGB+D, surpassing existing methods across the evaluation metrics. These results confirmed the model's strong performance in both spatiotemporal feature extraction and skeleton-based action recognition.
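The difference between a plain residual connection and a gated fusion of initial and temporal features can be illustrated with a minimal sketch. It is not the TAF module itself: the abstract does not specify how the fusion weights are computed, so the parameter-free sigmoid gate, the function names `residual_fusion` and `gated_fusion`, and the nested-list feature layout (channels, then flattened frame-joint positions) are all assumptions made for illustration.

```python
import math


def sigmoid(value):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def residual_fusion(initial, temporal):
    """Conventional residual connection: element-wise sum of both inputs."""
    return [
        [x + y for x, y in zip(x_c, y_c)]
        for x_c, y_c in zip(initial, temporal)
    ]


def gated_fusion(initial, temporal):
    """Hypothetical gated fusion: one weight per channel blends both inputs.

    Assumption: the gate is the sigmoid of the channel-wise mean of
    (initial + temporal). A trained model would learn this mapping instead.
    """
    fused = []
    for x_c, y_c in zip(initial, temporal):
        pooled = sum(x + y for x, y in zip(x_c, y_c)) / len(x_c)
        gate = sigmoid(pooled)  # weight given to the initial features
        fused.append([gate * x + (1.0 - gate) * y for x, y in zip(x_c, y_c)])
    return fused


# Two channels, two positions each (toy values).
initial = [[1.0, 1.0], [0.0, 0.0]]
temporal = [[3.0, 3.0], [0.0, 0.0]]

summed = residual_fusion(initial, temporal)
fused = gated_fusion(initial, temporal)
```

In this sketch the residual sum treats both inputs equally in every channel, whereas the gated variant produces a convex combination whose weight differs per channel, so each fused value stays between the two corresponding input values.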