A gesture recognition network built on the Transformer encoder-decoder architecture was designed, and an optimized offset attention mechanism, derived from self-attention, was introduced to extract hand features. To better capture local features of the hand structure, a neighborhood aggregation strategy was also designed. Because of the three-dimensional (3D) complexity of the hand, different regions have different levels of smoothness; ignoring this property when estimating gestures usually causes local key information of the hand structure to be lost. To address this problem, the hand structure was geometrically decomposed, with sharp and flexible components representing its sharp and flat regions, respectively, and the attention mechanism was used to weight the features of the two components differently. Experiments on the MSRA, ICVL, and NYU datasets show that the accuracy of the algorithm is comparable to that of state-of-the-art (SOTA) methods.
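The offset attention idea mentioned above can be illustrated with a minimal NumPy sketch. It assumes the commonly described form of offset attention, in which the difference between the input features and the self-attention output is transformed and added back to the input. The function name `offset_attention`, the weight matrices, the scaled row-wise softmax, and the ReLU projection are illustrative assumptions; the "optimized" variant used in this work is not specified in the text and may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def offset_attention(feats, w_q, w_k, w_v, w_o):
    """Hypothetical offset-attention layer for per-point features.

    feats: (n_points, dim) array of point features.
    w_q, w_k: (dim, d_k) projection matrices (assumed shapes).
    w_v, w_o: (dim, dim) projection matrices (assumed shapes).
    """
    q = feats @ w_q
    k = feats @ w_k
    v = feats @ w_v
    # Standard scaled dot-product attention weights, one row per query point.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    sa = attn @ v                      # self-attention features
    offset = feats - sa                # offset between input and attention output
    # Linear projection + ReLU on the offset, then a residual connection.
    return np.maximum(offset @ w_o, 0.0) + feats

# Usage with random data (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
n_points, dim = 128, 32
feats = rng.standard_normal((n_points, dim))
w_q, w_k = (rng.standard_normal((dim, dim // 4)) * 0.1 for _ in range(2))
w_v, w_o = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(2))
out = offset_attention(feats, w_q, w_k, w_v, w_o)
print(out.shape)
```

The output keeps one feature vector per input point, so a layer of this kind can be stacked or combined with a neighborhood aggregation step without changing tensor shapes.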