Objective Using advanced methods to rapidly and accurately identify issues such as river and lake disorder and water quality problems is a key approach to improving the efficiency of river and lake management and the river chief system. Unmanned aerial vehicle (UAV) patrols integrated with image recognition technology can effectively address the limited observation range, low efficiency, and delayed response to river-related issues associated with traditional manual patrols. However, challenges persist: river surface environments are complex; floating debris in rivers of varying sizes is irregular in shape and diverse in form; algal blooms are widely distributed; and illegal sand mining activities occur in concealed or easily obstructed locations. These factors make it difficult for traditional image recognition algorithms to detect such issues accurately. Therefore, this study proposes a river patrol image recognition algorithm based on an improved YOLO v5s, referred to as the YOLO v5s‒CDF algorithm. Methods The YOLO v5s‒CDF algorithm introduced several key improvements to enhance the detection performance of the original YOLO v5s model. First, the FocalNext module replaced the C3 module in the backbone network, incorporating depth-wise separable convolutions and dilated convolutions to improve feature extraction for small objects. The depth-wise separable convolution applied a separate convolutional kernel to each input channel, extracting important features from multiple channels, while the dilated convolutions enlarged the receptive field without adding parameters or computational cost, capturing broader contextual information. Second, the Context Aggregation attention mechanism was added between the neck and head structures to adjust the weights of the input features, enabling the model to focus on key image information.
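To make the parameter argument concrete, the following minimal pure-Python sketch (the `depthwise_conv2d` helper and toy tensor sizes are hypothetical, not the paper's implementation) shows a depth-wise convolution in which each input channel has its own kernel, and how a dilation factor enlarges the effective receptive field from 3×3 to 5×5 while reusing the same nine weights per channel:

```python
def depthwise_conv2d(x, kernels, dilation=1):
    """Depth-wise convolution: each input channel is filtered by its own kernel;
    cross-channel mixing is left to a subsequent 1x1 (pointwise) convolution."""
    channels, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    eff = k + (k - 1) * (dilation - 1)  # effective (dilated) kernel size
    out = []
    for c in range(channels):
        plane = []
        for i in range(h - eff + 1):          # valid padding
            row = []
            for j in range(w - eff + 1):
                # sample the input with stride `dilation` inside the window
                s = sum(kernels[c][a][b] * x[c][i + a * dilation][j + b * dilation]
                        for a in range(k) for b in range(k))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# Toy 2-channel 8x8 feature map, one 3x3 kernel per channel
x = [[[float((i + j + c) % 5) for j in range(8)] for i in range(8)] for c in range(2)]
kernels = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]

plain = depthwise_conv2d(x, kernels, dilation=1)    # 3x3 receptive field -> 6x6 output
dilated = depthwise_conv2d(x, kernels, dilation=2)  # 5x5 receptive field -> 4x4 output, same 9 weights
print(len(plain[0]), len(dilated[0]))  # 6 4
```

The dilated variant covers a wider spatial context at identical parameter count, which is the property the FocalNext module exploits.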
This attention mechanism combined channel attention and spatial attention to refine the feature representation, enhancing the model's ability to capture critical details. Lastly, the Decoupled Head replaced the original coupled detection head, separating the classification and regression tasks into independent branches to accelerate network convergence and further enhance small object detection. The dataset used in this study consisted of aerial images captured by DJI Air 2S drones over various rivers in Hebei Province, China. The images were annotated with the LabelImg tool and divided into training and validation sets in an 8 ∶ 2 ratio. Data augmentation techniques, such as flipping, color transformation, and affine transformation, were applied to the training set to improve the model's robustness and generalization ability. Model performance was evaluated using precision, recall, F1-score, Average Precision (AP), and mean Average Precision (mAP), while the number of parameters and floating-point operations (FLOPs) were used to represent model complexity. Results and Discussions The experimental results demonstrated that the YOLO v5s‒CDF model achieved a mAP of 86.7%, surpassing the original YOLO v5s model by 4.1 percentage points. The improved model exhibited significant enhancements in both precision and recall: precision increased from 85.9% in the original YOLO v5s model to 87.0% in the YOLO v5s‒CDF model, while recall improved from 76.4% to 80.8%, indicating a substantial reduction in missed detections. Compared with other models, such as YOLO v7-tiny and YOLO X-s, the YOLO v5s‒CDF model achieved mAP improvements of 20.4 percentage points and 8.3 percentage points, respectively.
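The evaluation metrics named above can be reproduced from raw detection counts. The sketch below (the counts are invented for illustration, not the paper's data) computes precision, recall, and F1-score:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts.

    tp: correctly detected objects; fp: spurious detections;
    fn: ground-truth objects the detector missed.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented counts for a single validation run
p, r, f1 = detection_metrics(tp=87, fp=13, fn=21)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # 0.870 0.806 0.837
```

Note that F1 is the harmonic mean of precision and recall, so a drop in either one (as seen for the sand mining category) pulls the combined score down.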
Among all target categories, the detection of river garbage showed the most significant improvement: average precision rose by 6.1 percentage points, from 74.8% to 80.9%; precision increased by 2.3 percentage points, from 88.0% to 90.3%; and recall improved by 7.3 percentage points, from 64.0% to 71.3%, highlighting the YOLO v5s‒CDF model's remarkable enhancement in detecting small targets. For targets related to suspected illegal sand mining activities, average precision increased from 90.2% to 92.2%; precision declined from 88.7% to 86.7%, while recall improved from 87.2% to 89.8%. For green algae targets, average precision rose from 82.8% to 86.9%, accompanied by an increase in precision from 81.0% to 84.0% and in recall from 77.9% to 81.3%. These improvements demonstrated the YOLO v5s‒CDF model's effectiveness in detecting river issues across the different object categories. Visual analysis of the detection results further confirmed the superior performance of the YOLO v5s‒CDF model. Compared with the original YOLO v5s model, the improved model exhibited greater robustness and generalization in detecting river issues under complex conditions, such as varied illumination, water surface fluctuations, and reflections. The YOLO v5s‒CDF model identified and localized more objects, particularly small and irregularly shaped floating garbage that the original model often missed. The visual results aligned with the quantitative improvements in precision and recall. Ablation studies revealed the contribution of each introduced module: the FocalNext module improved mAP by 1.8 percentage points, the Decoupled Head by 2.4 percentage points, and the two modules combined by 3.7 percentage points.
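The per-category average precision figures above summarize a precision-recall curve into a single number. One common way to compute it, sketched here with invented PR points (all-point interpolation, not necessarily the exact protocol used in the paper), is:

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve, with the precision envelope
    made monotonically non-increasing from right to left (all-point interpolation)."""
    interp = list(precisions)
    for i in range(len(interp) - 2, -1, -1):
        interp[i] = max(interp[i], interp[i + 1])  # p(r) = max precision at recall >= r
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, interp):
        ap += (r - prev_r) * p  # rectangle under the interpolated curve
        prev_r = r
    return ap

# Invented PR points for one class, recall in ascending order
ap = average_precision([0.2, 0.4, 0.6, 0.8], [1.0, 0.9, 0.7, 0.6])
print(round(ap, 2))  # 0.64
```

The mAP reported in the abstract is then the mean of such per-class AP values over all target categories.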
The addition of the Context Aggregation attention mechanism on top of the improvements from the FocalNext module and the Decoupled Head further improved the mAP by 4.1 percentage points compared to the original YOLO v5s model, while only slightly increasing the model's parameters and computational complexity. However, the model's performance in detecting submerged objects and illegal sand mining activities occurring at night required further improvement. Future research will focus on enhancing the model's applicability and accuracy under various environmental conditions, such as incorporating river topography changes as detection targets using 3D drone imagery and improving the collection of submerged object data. Conclusions The YOLO v5s‒CDF algorithm presents a viable technical approach to addressing challenges in river and lake supervision by integrating drone-based remote sensing technology with advanced image recognition methods. The enhanced model, which incorporates the FocalNext module, Context Aggregation attention mechanism, and Decoupled Head, demonstrates superior accuracy and robustness in detecting river debris, algal blooms, and potential illegal sand mining activities. It highlights the broad application potential of UAV remote sensing and target detection technologies in ecological environment management. In addition, integrating the YOLO v5s‒CDF algorithm with complementary technologies, such as water quality sensors, hydrological models, and geographic information systems (GIS), can provide a comprehensive framework for river health evaluation and management.