To address segmentation errors on multi-scale objects and the weak correlation among multi-scale feature maps and among feature maps from different stages of the DeepLabv3+ network, three modules are incorporated into the network: a global context attention module, a cascade adaptive scale awareness module, and an attention optimized fusion module. The global context attention module is embedded in the initial stage of the backbone network, where it captures rich contextual information during feature extraction. The cascade adaptive scale awareness module models the dependencies between multi-scale features, so that the network focuses more strongly on the features relevant to the target. The attention optimized fusion module merges features from multiple layers through multiple pathways to enhance pixel continuity during decoding. The improved network is validated on the Cityscapes dataset and the augmented PASCAL VOC 2012 dataset, where it reaches a mean intersection over union (mIoU) of 76.2% and 78.7%, respectively. The experimental results show that it overcomes the above limitations of DeepLabv3+.
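For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed: the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels, averaged over the classes that occur. The function name `mean_iou`, the flat-list input format, and the `ignore_index=255` default are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over flat sequences of predicted and ground-truth class labels.

    Illustrative sketch only: `ignore_index` (assumed 255 here) marks
    ground-truth pixels that are excluded from the evaluation.
    """
    intersection = [0] * num_classes  # pixels where prediction == ground truth == c
    union = [0] * num_classes         # pixels where prediction == c or ground truth == c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, skip it
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[t] += 1              # ground truth belongs to class t
            if 0 <= p < num_classes:
                union[p] += 1          # prediction belongs to class p

    # Average only over classes that appear in the prediction or the ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2
    print(mean_iou(pred, target, num_classes=3))
```

In benchmark practice the intersection and union counts are usually accumulated over all pixels of the whole validation set before the ratios are taken, which corresponds to passing the concatenated labels of every image to this function rather than averaging per-image scores.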