Because glass and mirrors lack distinct textures and shapes, they are difficult for traditional semantic segmentation algorithms to handle, which compromises the accuracy of downstream visual tasks. This paper proposes a Transformer-based RGBD cross-modal fusion method for segmenting glass-like objects. The method uses a Transformer network in which a cross-modal fusion module extracts self-attention features from the RGB and depth inputs and integrates RGBD features with a multi-layer perceptron (MLP), fusing three types of attention features. The RGB and depth features are then fed back to their respective branches to strengthen the network's feature extraction. Finally, a semantic segmentation decoder combines the features from four stages to output the segmentation of glass-like objects. Compared with EBLNet, the proposed method improves the intersection over union (IoU) on the GDD, Trans10k, and MSD datasets by 1.64%, 2.26%, and 7.38%, respectively. Compared with PDNet on the RGBD-Mirror dataset, it improves the IoU by 9.49%, which verifies its effectiveness.
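The intersection-and-union ratio reported above is commonly known as intersection over union (IoU). As a minimal sketch of that metric for a single pair of binary masks, it could be computed as follows. The function name `iou`, the list-of-lists mask format, and the convention for two empty masks are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence

# Binary mask: 1 = glass-like pixel, 0 = background (assumed format).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same height")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("masks must have the same width")
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 2 shared pixels / 4 pixels in the union = 0.5
```

Dataset-level scores are usually aggregated over all test images. The aggregation used for the figures above is not stated here, so this sketch covers only the per-image case.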
[1] Zhao H S, Qi X J, Shen X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the European Conference on Computer Vision (ECCV 2018). Munich: Springer International Publishing, 2018: 418-434.
[2] Wang D Q, Zhang T, Süsstrunk S. NEMTO: neural environment matting for novel view and relighting synthesis of transparent objects[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris: IEEE, 2023: 317-327.
Wang Lu, Wang Shuai, Zhang Guo-feng, et al. Pedestrian detection based on semantic segmentation attention and visible region prediction[J]. Journal of Northeastern University (Natural Science), 2021, 42(9): 1261-1267.
Zhang Zhi-min, Qiao Jian-zhong, Lin Shu-kuan, et al. A view reconstruction method based on deep network[J]. Journal of Northeastern University (Natural Science), 2020, 41(8): 1065-1069.
[7] Wang Z Y, Li Y C, Cheng X N, et al. Key points trajectory and multi-level depth distinction based refinement for video mirror and glass segmentation[J]. Multimedia Tools and Applications, 2024, 83(39): 86513-86535.
[8] Yang X, Mei H Y, Xu K, et al. Where is my mirror?[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 8808-8817.
[9] Lin J Y, He Z B, Lau R W H. Rich context aggregation with reflection prior for glass surface detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 13410-13419.
[10] Lin J Y, Wang G D, Lau R W H. Progressive mirror detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 3694-3702.
[11] Mei H Y, Yang X, Wang Y, et al. Don’t hit me! Glass detection in real-world scenes[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 3684-3693.
[12] He H, Li X T, Cheng G L, et al. Enhanced boundary learning for glass-like object segmentation[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 15839-15848.
[13] Mei H Y, Dong B, Dong W, et al. Depth-aware mirror segmentation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 3043-3052.
[14] Chang Q L, Liao H H, Meng X F, et al. PanoGlassNet: glass detection with panoramic RGB and intensity images[J]. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 5019015.
[15] Liu Z, Lin Y T, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 9992-10002.
[16] Yin W, Zhang J M, Wang O, et al. Learning to recover 3D scene shape from a single image[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021: 204-213.
[17] Taud H, Mas J F. Multilayer perceptron (MLP)[M]//Cámacho O M T, Paegelow M, Mas J F, et al. Geomatic Approaches for Modeling Land Change Scenarios. Cham: Springer, 2018: 451-455.
[18] Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network[C]//2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 6230-6239.
[19] Deng J J, Pan Y W, Yao T, et al. MINet: meta-learning instance identifiers for video object detection[J]. IEEE Transactions on Image Processing, 2021, 30: 6879-6891.
[20] Zhou H J, Xie X H, Lai J H, et al. Interactive two-stream decoder for accurate and fast saliency detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 9138-9147.
[21] Xie E Z, Wang W J, Wang W H, et al. Segmenting transparent objects in the wild[C]//Computer Vision and Pattern Recognition. Cham: Springer International Publishing, 2020: 696-711.
[22] Wei J, Wang S H, Huang Q M. F3Net: fusion, feedback and focus for salient object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: IEEE, 2020: 12321-12328.