Abstract
In endoscopic scenes, tissue surfaces are sparsely textured and the field of view is restricted, which makes depth estimation considerably harder. Conventional methods are susceptible to noise, missing texture, and illumination changes, so their results are unstable. To improve the accuracy of depth estimation from endoscopic images, this paper proposes a self-supervised monocular endoscopic depth estimation network with an embedded dual attention mechanism. The network adopts an encoder-decoder structure and integrates channel attention and spatial attention modules, which capture long-range contextual information along the channel and spatial dimensions. Photometric reprojection error, structural similarity, and edge-aware smoothness are used as loss functions to suit the particular properties of endoscopic images. Tests on the public EndoSLAM dataset show that the proposed method effectively improves the accuracy of endoscopic depth estimation.
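The three loss terms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or weights, so everything below is an assumption: the blend `alpha = 0.85`, the smoothness weight `lam = 1e-3`, the 3x3 SSIM window, and the function names are hypothetical choices modeled on a formulation commonly used in self-supervised depth estimation, not the authors' implementation. Images are grayscale 2D lists with intensities in [0, 1], and `warped` stands for a neighboring frame already reprojected into the target view.

```python
import math

# SSIM stabilizing constants for intensities in [0, 1] (assumed values).
C1, C2 = 0.01 ** 2, 0.03 ** 2


def _window(img, r, c):
    """Return the 3x3 neighborhood centered at (r, c) as a flat list."""
    return [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def _mean(values):
    return sum(values) / len(values)


def ssim_at(a, b, r, c):
    """Local SSIM between images a and b over the 3x3 window at (r, c)."""
    wa, wb = _window(a, r, c), _window(b, r, c)
    mu_a, mu_b = _mean(wa), _mean(wb)
    var_a = _mean([x * x for x in wa]) - mu_a * mu_a
    var_b = _mean([y * y for y in wb]) - mu_b * mu_b
    cov = _mean([x * y for x, y in zip(wa, wb)]) - mu_a * mu_b
    num = (2 * mu_a * mu_b + C1) * (2 * cov + C2)
    den = (mu_a * mu_a + mu_b * mu_b + C1) * (var_a + var_b + C2)
    return num / den


def photometric_loss(target, warped, alpha=0.85):
    """Mean over interior pixels of alpha*(1-SSIM)/2 + (1-alpha)*|diff|.

    Both images must be at least 3x3 so that interior pixels exist.
    """
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            l1 = abs(target[r][c] - warped[r][c])
            dssim = (1.0 - ssim_at(target, warped, r, c)) / 2.0
            dssim = min(max(dssim, 0.0), 1.0)  # clamp rounding noise
            total += alpha * dssim + (1.0 - alpha) * l1
            count += 1
    return total / count


def edge_aware_smoothness(disp, image):
    """Mean disparity gradient, down-weighted across image edges.

    Disparity is mean-normalized first; each gradient to the right or
    lower neighbor is scaled by exp(-|image gradient|).
    """
    h, w = len(disp), len(disp[0])
    mean_disp = sum(sum(row) for row in disp) / (h * w)
    norm = [[d / (mean_disp + 1e-7) for d in row] for row in disp]
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < h and nc < w:
                    weight = math.exp(-abs(image[r][c] - image[nr][nc]))
                    total += abs(norm[r][c] - norm[nr][nc]) * weight
                    count += 1
    return total / count


def total_loss(target, warped, disp, lam=1e-3):
    """Photometric term plus weighted smoothness term (lam is assumed)."""
    return photometric_loss(target, warped) + lam * edge_aware_smoothness(disp, target)
```

The exponential weight is what makes the smoothness term edge-aware: a disparity jump that coincides with an intensity edge is penalized less than the same jump in a uniform region, which matters for endoscopic frames where large areas are nearly textureless.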
Lianwu ZHANG, Sheng LI
College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
Lianwu ZHANG, male, born in 1999, is a master's student whose research interest is depth estimation.
Sheng LI, male, born in 1984, Ph.D., is an associate professor whose research interests include image processing, compressed sensing, and recommender systems. E-mail: 3463126754@qq.com
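ZHANG Lianwu, LI Sheng. Self-supervised monocular endoscopic depth estimation with an embedded dual attention mechanism[J]. Journal of Chinese Computer Systems (小型微型计算机系统), 2026, 47(5): 1212-1218. DOI:10.20009/j.cnki.21-1106/TP.2025-0107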
Funding
National Fund Key Program (62233016)