To fully exploit the appearance features of RGB images and the geometric features of depth images, this paper proposes an object pose estimation method based on parallel fusion of "appearance-geometry" features. First, in the feature extraction and fusion stage, a three-stream bidirectional fusion architecture is constructed so that the parallel RGB and depth image features are fused at every encoding and decoding layer. To prevent the loss of important features and to fuse the two feature types sufficiently, two complementary attention mechanisms are designed, which let the two feature types complement each other both locally and globally. Second, in the pose inference stage, a keypoint detection network that combines a distance metric with a distance constraint is proposed; it takes into account the distance between the keypoints output by the network and the object's center point, which yields accurate pose estimation. The proposed algorithm was tested on two challenging 6D object pose estimation datasets, and the results validate its effectiveness.
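The abstract does not give the exact form of the keypoint objective, so the following is only an illustrative sketch of how a distance metric and a distance constraint might be combined. The function names, the `weight` parameter, and the specific terms are assumptions made for illustration and are not taken from the paper: the metric term is the mean Euclidean error of the predicted keypoints, and the constraint term penalizes predicted keypoint-to-center distances that deviate from the ground-truth ones.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def keypoint_loss(pred_kps, gt_kps, pred_center, gt_center, weight=0.5):
    """Hypothetical keypoint loss (an assumption, not the paper's formulation).

    metric term:     mean distance between predicted and ground-truth keypoints
    constraint term: mean absolute difference between the predicted and the
                     ground-truth keypoint-to-center distances
    """
    if not pred_kps or len(pred_kps) != len(gt_kps):
        raise ValueError("keypoint lists must be non-empty and of equal length")
    n = len(pred_kps)
    metric = sum(euclidean(p, g) for p, g in zip(pred_kps, gt_kps)) / n
    constraint = sum(
        abs(euclidean(p, pred_center) - euclidean(g, gt_center))
        for p, g in zip(pred_kps, gt_kps)
    ) / n
    return metric + weight * constraint


# Toy example: the second predicted keypoint lies one unit too far from the center.
gt_kps = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
pred_kps = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
center = (0.0, 0.0, 0.0)
loss = keypoint_loss(pred_kps, gt_kps, center, center)
print(loss)  # metric 0.5 + 0.5 * constraint 0.5 = 0.75
```

In this sketch the constraint term depends only on distances to the center, so it adds a geometric consistency signal that the per-keypoint metric term alone does not provide; how the actual network weights or formulates the two terms is not stated in the abstract.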