To address the problem that existing bridge crack detection methods do not fully extract the rotation features of bridge cracks and suffer from low detection and segmentation accuracy, a bridge crack detection method based on an improved Mask R-CNN with rotation self-attention is proposed. First, building on the Mask R-CNN instance segmentation network, the Transformer-based ViTAE network is used as the backbone feature extraction network to improve crack detection and segmentation accuracy. Then, a rotating variable window self-attention mechanism is designed and integrated into the bridge crack detection network to improve the feature extraction network's ability to capture crack rotation features. Finally, deformable convolution is used to better fit the irregular geometry of cracks and strengthen the recognition of crack feature information. Experimental results show that, compared with the original Mask R-CNN detection and segmentation method, the proposed method improves accuracy by 4.85% and recall by 13.95%, and its F1-score reaches 91.66%. The proposed method extracts crack features more fully, achieves more accurate crack detection, and outperforms the comparison methods in both subjective and objective evaluation.
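As shown in Fig. 3, the ViTAE bridge crack feature extraction module first stacks multiple reduction cell (RC) modules and normal cell (NC) modules. Each RC module then downsamples the feature map by a factor of 4 and flattens it with the token-sequence-to-image (Sequence to image, Seq2Img) operation before passing it to the NC modules. The NC modules do not change the image resolution; the model increases its capacity by stacking RC and NC modules, whose detailed structures are shown in Fig. 4. Finally, the feature maps of different scales extracted at the multiple stages are combined by multi-scale feature fusion, which further improves the scale equivariance of the model.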
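The resolution bookkeeping described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`stage_shapes`, `flatten_to_tokens`, `tokens_to_grid`) are hypothetical, and the per-stage downsampling ratios `(4, 2, 2, 2)` are an assumption, since the text only states a 4x reduction. The sketch shows that a reduction step shrinks the feature map, a normal step leaves it unchanged, and flattening a feature map to a token sequence is reversible.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stage_shapes(height: int, width: int,
                 ratios: Sequence[int]) -> List[Tuple[int, int]]:
    """Track the feature-map size after each stage.

    Each stage is modeled as one reduction step (downsample by `ratio`)
    followed by normal steps that keep the resolution unchanged.
    The ratios are an illustrative assumption, not values from the paper.
    """
    shapes = []
    for ratio in ratios:
        # Reduction step: spatial size shrinks by the given ratio.
        height, width = height // ratio, width // ratio
        # Normal steps: resolution is unchanged, so nothing to update.
        shapes.append((height, width))
    return shapes


def flatten_to_tokens(grid: List[List[T]]) -> List[T]:
    """Flatten an H x W grid of feature vectors into H*W tokens (row-major)."""
    return [token for row in grid for token in row]


def tokens_to_grid(tokens: List[T], height: int, width: int) -> List[List[T]]:
    """Reshape a token sequence back into an H x W grid (row-major)."""
    if len(tokens) != height * width:
        raise ValueError("token count does not match height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    # Multi-scale sizes that a later fusion step would combine.
    print(stage_shapes(512, 512, (4, 2, 2, 2)))

    # Flattening and reshaping are inverse operations.
    grid = [[(r, c) for c in range(3)] for r in range(2)]
    assert tokens_to_grid(flatten_to_tokens(grid), 2, 3) == grid
```

For a 512 x 512 input and the assumed ratios, the four stages yield feature maps of side 128, 64, 32, and 16, which are the different scales that the fusion step would combine.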
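To quantitatively evaluate the detection performance of the proposed method on concrete bridge cracks, mean average precision (mAP), average recall (AR), and the harmonic mean (F1-score, F1) are used. A higher average precision (AP) indicates fewer false detections; a higher average recall (AR) indicates a lower miss rate; F1 combines precision and recall, and a higher value indicates better overall detection performance. The corresponding formulas are as follows: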
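The equations themselves do not appear in the text above. As a hedged reconstruction, the standard definitions of precision $P$, recall $R$, and F1 are given here in terms of true positives ($TP$), false positives ($FP$), and false negatives ($FN$). These symbols are assumptions and may differ from the authors' notation, and how AP and AR are averaged (for example, over IoU thresholds) is not specified in this section.

```latex
\[
\begin{aligned}
P   &= \frac{TP}{TP + FP}, \\
R   &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 P R}{P + R}.
\end{aligned}
\]
```

Under these definitions, fewer false detections (smaller $FP$) raise $P$, fewer missed cracks (smaller $FN$) raise $R$, and $F_1$ is high only when both are high, which matches the interpretation given in the paragraph.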