Self-Supervised Monocular Endoscope Depth Estimation with Embedded Dual Attention Mechanism

ZHANG Lianwu, LI Sheng

Journal of Chinese Computer Systems, 2026, Vol. 47, Issue 5: 1212-1218. DOI: 10.20009/j.cnki.21-1106/TP.2025-0107

Computer Graphics and Image



Abstract

In endoscopic scenes, tissue surfaces are sparsely textured and the field of view is restricted, which makes depth estimation considerably harder. Conventional methods are susceptible to noise, missing texture, and illumination changes, so their results are not stable enough. To improve the accuracy of depth estimation from endoscopic images, this paper proposes a self-supervised monocular endoscopic depth estimation network with an embedded dual attention mechanism. The network adopts an encoder-decoder structure and integrates channel attention and spatial attention modules, which extract long-range contextual information along the channel and spatial dimensions. Photometric reprojection error, structural similarity, and edge-aware smoothness are used as loss terms to suit the particular properties of endoscopic images. Tests on the public EndoSLAM dataset show that the proposed method effectively improves the accuracy of depth estimation for endoscopic images.
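The three loss terms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the weighting or the exact formulas, so the form below follows the common convention in self-supervised depth estimation, and the weights `alpha=0.85` and `lam=1e-3`, the 3x3 SSIM window, the mean normalization of the disparity map, and all function names are assumptions. The `warped` input is assumed to be a source frame that has already been reprojected into the target view using the predicted depth and camera motion; that warping step is not shown. Images are single-channel float arrays with values in [0, 1].

```python
import numpy as np


def box_filter3(x):
    """Mean over 3x3 windows with edge padding; x has shape (H, W)."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def ssim_map(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel structural similarity computed from local 3x3 statistics."""
    mu_a, mu_b = box_filter3(a), box_filter3(b)
    var_a = box_filter3(a * a) - mu_a ** 2
    var_b = box_filter3(b * b) - mu_b ** 2
    cov = box_filter3(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def photometric_loss(target, warped, alpha=0.85):
    """Weighted mix of an SSIM dissimilarity term and an L1 term (alpha is assumed)."""
    l1 = np.abs(target - warped)
    dssim = np.clip((1.0 - ssim_map(target, warped)) / 2.0, 0.0, 1.0)
    return float(np.mean(alpha * dssim + (1.0 - alpha) * l1))


def edge_aware_smoothness(disp, image):
    """Penalize disparity gradients, down-weighted where the image has edges."""
    d = disp / (disp.mean() + 1e-7)  # mean normalization (assumed convention)
    grad_dx = np.abs(d[:, 1:] - d[:, :-1])
    grad_dy = np.abs(d[1:, :] - d[:-1, :])
    img_dx = np.abs(image[:, 1:] - image[:, :-1])
    img_dy = np.abs(image[1:, :] - image[:-1, :])
    return float(np.mean(grad_dx * np.exp(-img_dx))
                 + np.mean(grad_dy * np.exp(-img_dy)))


def total_loss(target, warped, disp, alpha=0.85, lam=1e-3):
    """Photometric term plus a weighted smoothness term (lam is assumed)."""
    return (photometric_loss(target, warped, alpha)
            + lam * edge_aware_smoothness(disp, target))
```

With a perfect reprojection (`warped` equal to `target`) and a constant disparity map, both terms vanish, which is the behavior the self-supervised objective rewards. The exponential weighting lets depth change sharply at image edges while keeping it smooth across the low-texture tissue regions that the abstract identifies as difficult.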

Keywords

endoscopic images / monocular depth estimation / channel attention mechanism / spatial attention mechanism / self-supervised learning

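Cite this article

ZHANG Lianwu, LI Sheng. Self-Supervised Monocular Endoscope Depth Estimation with Embedded Dual Attention Mechanism[J]. Journal of Chinese Computer Systems, 2026, 47(5): 1212-1218. DOI: 10.20009/j.cnki.21-1106/TP.2025-0107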

