Objective Road damage detection serves as the premise and foundation of road maintenance and is essential for road safety and timely repair. In real-world scenarios, mobile terminal devices are more practical for detection tasks because of constraints in working environments; however, the limited computational power of these devices makes it difficult to deploy detection algorithms with high computational complexity. Existing road damage detection algorithms struggle to balance detection accuracy and computational efficiency, which hinders their deployment on mobile devices. To address this issue, this study proposes MFF‒YOLO, a lightweight road damage detection algorithm based on YOLOv7-tiny that features multiscale feature fusion. Methods Firstly, road damage often occurred in complex backgrounds with high levels of noise, so it was crucial to extract road damage features accurately. The multiscale feature extraction block (MFFBlock) and the downsampling block (DSB) were designed to improve the feature extraction capability of the algorithm. MFFBlock employed a multiscale feature extraction and fusion strategy that integrated a multi-branch structure with various types of convolution to produce outputs with different receptive fields, extracting both global and local feature information. DSB combined max pooling with convolution at a stride of 2 to maximize the retention of effective feature information and to ensure computational stability during downsampling. Based on MFFBlock and DSB, an efficient multiscale feature extraction backbone network (MFEnet) was constructed, utilizing MFFBlocks of different sizes at various stages to enhance the algorithm's multiscale feature representation ability. Secondly, after extracting road damage features, it was equally crucial to efficiently fuse these features in the neck network.
The Slim-Neck design paradigm was used in the neck feature fusion network, utilizing GSConv and VoV‒GSCSPC modules to aggregate features while making the network more lightweight without losing important information. In addition, a novel feature selective fusion structure (FSF‒PAFPN) was proposed. Building on PAFPN, FSF‒PAFPN introduced a feature selective fusion mechanism (FSF), which used a channel attention (CA) module to selectively fuse shallow and deep features across layers at the same resolution level, achieving simple yet effective multiscale feature fusion. Finally, the K-Means algorithm was utilized to cluster the RDD2022 dataset, obtaining anchors that were more consistent with the shape characteristics of road damage objects, reducing the training difficulty, and improving detection accuracy. The RDD2022 road damage dataset was used for algorithm training and verification and consisted of a total of 23 767 damage images. These images, which included four typical road damage types: longitudinal cracks (D00), transverse cracks (D10), alligator cracks (D20), and potholes (D40), were divided into training, validation, and test sets in a ratio of 8:1:1. Results and Discussions The results of the ablation experiment showed that using K-Means to re-cluster the RDD2022 dataset for generating anchors improved mAP@0.5 by 0.5%. In addition, implementing MFEnet as the backbone feature extraction network increased mAP@0.5 by 0.6%, while its parameters and FLOPs were 5.59×10^6 and 12.3×10^9, reduced by 7.1% and 6.8%, respectively. Adopting the Slim-Neck design paradigm enhanced mAP@0.5 by 0.5%, while reducing parameters and FLOPs by 20.0% and 21.1%, to 4.47×10^6 and 9.7×10^9, respectively. On this basis, adopting the FSF-PAFPN feature fusion structure further improved mAP@0.5 by 0.7%, with parameters and FLOPs increased to 4.51×10^6 and 9.8×10^9, respectively.
The results of the algorithm performance comparison on RDD2022 showed that, in terms of accuracy, MFF‒YOLO outperformed the other algorithms on all evaluation metrics except recall (R) and average precision (AP) in the D40 category. The AP for the D00, D10, and D20 categories reached 60.8%, 58.9%, and 68.6%, respectively, with a precision (P) of 64.7%. It achieved the highest mAP@0.5 of 60.1%, an improvement of 2.3 percentage points over YOLOv7-tiny. In terms of computational complexity, MFF‒YOLO also excelled, with parameters and FLOPs of 4.51×10^6 and 9.8×10^9, respectively, only slightly higher than those of YOLOv8n, YOLO-LWNet-s, and LE-YOLOv5. Compared to YOLOv7-tiny, these metrics were reduced by 25.1% and 25.8%, respectively. In addition, MFF‒YOLO reached a detection speed of 81 frames per second, maintaining an impressive real-time detection capability. A comparison of the actual detection results of the YOLOv7-tiny, YOLO-LWNet-s, LE-YOLOv5, and MFF‒YOLO algorithms indicated that the detection performance of MFF‒YOLO exceeded that of the other algorithms: it accurately located road damage objects with high confidence and remained strong even when features were not obvious or the background was complex. Conclusions The results demonstrated that the MFEnet proposed in this study can effectively enhance the network's multiscale feature extraction capability while reducing computational complexity. The Slim-Neck design paradigm ensured the aggregation of features while maintaining a lightweight network structure. The FSF‒PAFPN structure achieved more efficient multiscale feature fusion and improved the algorithm's ability to characterize road damage features. Accordingly, MFF‒YOLO significantly improved detection accuracy while reducing computational complexity.
Considering accuracy, computational complexity, and detection speed, MFF‒YOLO achieved a balanced performance and is more suitable for road damage detection on mobile devices, providing a valuable reference for mobile terminal road damage detection.
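The anchor re-clustering step described in the Methods can be illustrated with a minimal K-Means over ground-truth box widths and heights. This is only a sketch, not the authors' exact procedure: the 1 − IoU style assignment (a common choice for anchor clustering in the YOLO family), the number of clusters, and the sample data are all assumptions.

```python
import random

def iou_wh(box, anchor):
    # IoU of two (w, h) boxes aligned at the same corner.
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors, assigning by highest IoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the anchor it overlaps best (min 1 - IoU).
            i = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[i].append(b)
        new = []
        for i, c in enumerate(clusters):
            if c:  # Update anchor to the cluster mean; keep empty clusters.
                new.append((sum(w for w, h in c) / len(c),
                            sum(h for w, h in c) / len(c)))
            else:
                new.append(anchors[i])
        if new == anchors:  # Converged.
            break
        anchors = new
    return sorted(anchors, key=lambda a: a[0] * a[1])  # Small to large.
```

Anchors produced this way follow the aspect-ratio statistics of the dataset's damage boxes (e.g., wide transverse cracks, tall longitudinal cracks) rather than generic COCO presets.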
Inspired by the MS-Block [25], this study designs an MFFBlock that adopts a hierarchical feature extraction and fusion strategy. Figure 2 shows the structure of the MFFBlock. The MFFBlock expands the channels of the input feature X ∈ R^(H×W×C) (where H is the height, W the width, and C the number of channels) and splits it into three branches (X1, X2, X3), each of which extracts features with a different type of convolution. The first branch uses a 1×1 convolution (Conv1×1) to focus on extracting local feature information, while the second and third branches adopt the inverted bottleneck structure (IB) [26] to make efficient use of large-kernel depthwise convolutions (DWConv), enlarging the receptive field while reducing the computational cost. In addition, before entering its IB, the input feature of the third branch is fused with the output feature of the second branch, further enlarging the receptive field. In this way, the output of each branch encodes feature information at a different scale, expressed mathematically as follows.
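The branch wiring described above can be sketched at shape level as follows. This is a simplified illustration, not the paper's implementation: the kernel sizes (5 and 7), the expansion ratio of 2, the uniform depthwise kernel, and the random 1×1 weights all stand in for values and learned parameters not reproduced here.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (H, W, Cin), w: (Cin, Cout) -> (H, W, Cout)
    return x @ w

def dwconv(x, k):
    # Depthwise k x k convolution with a uniform (mean) kernel and 'same'
    # zero padding; a stand-in for a learned large-kernel DWConv.
    H, W, C = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :]
    return out / (k * k)

def inverted_bottleneck(x, k, expand=2, seed=0):
    # IB: 1x1 expand -> ReLU -> depthwise k x k -> 1x1 project.
    rng = np.random.default_rng(seed)
    C = x.shape[-1]
    w_up = rng.normal(0, 0.1, (C, C * expand))
    w_down = rng.normal(0, 0.1, (C * expand, C))
    return conv1x1(dwconv(np.maximum(conv1x1(x, w_up), 0.0), k), w_down)

def mff_block(x, k2=5, k3=7):
    # Split the expanded input into three branches with growing receptive
    # fields; branch 3 receives branch 2's output before its own DWConv.
    x1, x2, x3 = np.split(x, 3, axis=-1)
    rng = np.random.default_rng(1)
    y1 = conv1x1(x1, rng.normal(0, 0.1, (x1.shape[-1], x1.shape[-1])))
    y2 = inverted_bottleneck(x2, k2)       # medium receptive field
    y3 = inverted_bottleneck(x3 + y2, k3)  # largest receptive field
    return np.concatenate([y1, y2, y3], axis=-1)
```

Because branch 3 sees branch 2's output before its large-kernel DWConv, its effective receptive field stacks on top of branch 2's, which is the progressive enlargement the text describes.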
The evaluation of the algorithm considers two main aspects: computational complexity and accuracy. Computational complexity is characterized by the number of parameters (params) and floating-point operations (FLOPs); the smaller these values, the lower the computational cost and hardware requirements of the algorithm, and the easier its deployment on mobile terminal devices. Accuracy is characterized by precision (P), recall (R), average precision (AP), and mean average precision (mAP). In addition, the detection frame rate (frames per second, FPS) is selected to reflect the real-time performance of the algorithm. The formulas for P, R, AP, mAP, and FPS are as follows.
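The standard definitions of these metrics are given below, where TP, FP, and FN denote true positives, false positives, and false negatives, N is the number of categories, and the FPS expression assumes N_f frames processed in total time T (symbol names for the FPS formula are our assumption):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_{0}^{1} P(R)\,\mathrm{d}R,
\qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i, \qquad
FPS = \frac{N_f}{T}
```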
Hou Yue, Li Qiuhan, Zhang Chen, et al. The state-of-the-art review on applications of intrusive sensing, image processing techniques, and machine learning methods in pavement monitoring and analysis[J]. Engineering, 2021, 7(6): 845‒856. doi:10.1016/j.eng.2020.07.030
[2] Ma Jian, Zhao Xiangmo, He Shuanhai, et al. Summary of pavement detection technology[J]. Journal of Traffic and Transportation Engineering, 2017, 17(5): 121‒137. doi:10.3969/j.issn.1671-1637.2017.05.012
Du Zhenyu, Yuan Jie, Xiao Feipeng, et al. Application of image technology on pavement distress detection: A review[J]. Measurement, 2021, 184: 109900. doi:10.1016/j.measurement.2021.109900
[5] Yang Guidong, Liu Kangcheng, Zhang Jihan, et al. Datasets and processing methods for boosting visual inspection of civil infrastructure: A comprehensive review and algorithm comparison for crack classification, segmentation, and detection[J]. Construction and Building Materials, 2022, 356: 129226. doi:10.1016/j.conbuildmat.2022.129226
[6] Ren Shaoqing, He Kaiming, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137‒1149. doi:10.1109/tpami.2016.2577031
[7] He Kaiming, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 2980‒2988. doi:10.1109/iccv.2017.322
[8] Cai Zhaowei, Vasconcelos N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6154‒6162. doi:10.1109/cvpr.2018.00644
[9] Cao M T, Tran Q V, Nguyen N M, et al. Survey on performance of deep learning models for detecting road damages using multiple dashcam image resources[J]. Advanced Engineering Informatics, 2020, 46: 101182. doi:10.1016/j.aei.2020.101182
[10] Liu Zhen, Yeoh J K W, Gu Xingyu, et al. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN[J]. Automation in Construction, 2023, 146: 104689. doi:10.1016/j.autcon.2022.104689
[11] Pei Zixiang, Lin Rongheng, Zhang Xiubao, et al. CFM: A consistency filtering mechanism for road damage detection[C]//Proceedings of the 2020 IEEE International Conference on Big Data (Big Data). Atlanta: IEEE, 2020: 5584‒5591. doi:10.1109/bigdata50022.2020.9377911
[12] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 6517‒6525. doi:10.1109/cvpr.2017.690
Wang C Y, Bochkovskiy A, Liao H M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver: IEEE, 2023: 7464‒7475. doi:10.1109/cvpr52729.2023.00721
Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 2999‒3007. doi:10.1109/iccv.2017.324
[19] Yan Kun, Zhang Zhihua. Automated asphalt highway pavement crack detection based on deformable single shot multi-box detector under a complex environment[J]. IEEE Access, 2021, 9: 150925‒150938. doi:10.1109/access.2021.3125703
[20] Al Duhayyim M, Malibari A A, Alharbi A, et al. Road damage detection using the hunger games search with Elman neural network on high-resolution remote sensing images[J]. Remote Sensing, 2022, 14(24): 6222. doi:10.3390/rs14246222
[21] Ren Miao, Zhang Xianfeng, Chen Xiao, et al. YOLOv5s-M: A deep learning network model for road pavement damage detection from urban street-view imagery[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 120: 103335. doi:10.1016/j.jag.2023.103335
[22] Du Fujun, Jiao Shuangjian. Improvement of lightweight convolutional neural network model based on YOLO algorithm and its research in pavement defect detection[J]. Sensors, 2022, 22(9): 3537. doi:10.3390/s22093537
[23] Wu Chenguang, Ye Min, Zhang Jiale, et al. YOLO-LWNet: A lightweight road damage object detection network for mobile terminal devices[J]. Sensors, 2023, 23(6): 3268. doi:10.3390/s23063268
[24] Diao Zhuo, Huang Xianfu, Liu Han, et al. LE-YOLOv5: A lightweight and efficient road damage detection algorithm based on improved YOLOv5[J]. International Journal of Intelligent Systems, 2023, 2023: 8879622. doi:10.1155/2023/8879622
[25] Li Hulin, Li Jun, Wei Hanbing, et al. Slim-neck by GSConv: A lightweight-design for real-time detector architectures[J]. Journal of Real-Time Image Processing, 2024, 21(3): 62. doi:10.1007/s11554-024-01436-6
[26] Chen Yuming, Yuan Xinbin, Wang Jiabao, et al. YOLO-MS: Rethinking multi-scale representation learning for real-time object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(6): 4240‒4252. doi:10.1109/tpami.2025.3538473
[27] Sandler M, Howard A, Zhu Menglong, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510‒4520. doi:10.1109/cvpr.2018.00474
[28] Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132‒7141. doi:10.1109/cvpr.2018.00745
Arya D, Maeda H, Ghosh S K, et al. RDD2022: A multi-national image dataset for automatic road damage detection[J]. Geoscience Data Journal, 2024, 11(4): 846‒862.
[31] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336‒359. doi:10.1007/s11263-019-01228-7