Autonomous driving technology plays a crucial role in the development of smart mines, and its primary challenge is the safe navigation of vehicles in the intricate and dynamic environment of open-pit mines. Mining roads often carry a high density of diverse obstacles, including rockfall, water pits, and ruts, which take various forms and are widely dispersed. These conditions pose substantial safety risks to the autonomous operation of mining vehicles. Although numerous road obstacle detection algorithms have been proposed, their accuracy is often limited by the distinctive conditions of open-pit mines, so they fall short of practical application requirements. This study presents a road obstacle detection algorithm for open-pit mines based on RT-DETR. The algorithm integrates the RepViT network in the encoder phase to strengthen feature extraction, allowing the characteristic information of road obstacles to be captured more precisely. In the decoder section, it applies channel compression pruning, which significantly reduces computational complexity and raises detection speed. It also incorporates the RepAttC3 module, augmented with an attention mechanism, to improve detection of multi-scale and small-target obstacles. To evaluate the algorithm, a dataset of open-pit mine road obstacle images covering various mines, seasons, and scenarios was assembled. The experimental results show that the algorithm performs well in identifying road obstacles in open-pit mines, achieving an average detection accuracy of 92.7%, a comprehensive detection accuracy of 96.6%, and a detection time of 12.3 milliseconds. Compared with existing road obstacle detection algorithms, the proposed algorithm shows distinct advantages in detecting multi-scale and small-target obstacles, offering more precise and efficient obstacle detection for vehicles operating in open-pit mining environments. It provides technical support for the development of autonomous driving in open-pit mines and for the construction of smart mines.
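Methods for detecting road obstacles in open-pit mines fall into three main groups: traditional image processing, 3D point cloud analysis, and machine learning. Traditional image-processing methods apply local intensity analysis, color-space transformation, and dilation and erosion to RGB or thermal infrared images to segment or detect obstacles (Ryu et al., 2015; Gao et al., 2020). These methods depend heavily on hand-designed features and generalize poorly, so they are unsuited to the complex and changeable environment of open-pit mines. 3D point cloud methods segment road obstacles by analyzing the planar, local convexity, and local density features of the point cloud (Liu et al., 2017; Wang et al., 2017), but they are limited by point cloud density, cannot detect small obstacles on the road, and are slow.
In recent years, many researchers have applied machine learning to road obstacle detection. Trained on large amounts of data, these network models learn image features automatically and identify the obstacle type accurately. For example, a location-aware convolutional neural network has been used to detect potholes on structured roads (Chen et al., 2020), although its accuracy is poor, whereas detectors based on YOLO (Wang et al., 2022; He et al., 2024) achieve high-accuracy pothole detection on structured pavement. To address the multi-scale problem in detecting negative obstacles in open-pit mines, a multi-scale fusion algorithm (Ruan et al., 2021) achieved accurate detection of ruts and water pits on mine roads. For rockfall on mine roads, a method based on weighted bidirectional feature fusion (Gu et al., 2023) detects fallen rocks in real time. Because fallen rocks differ greatly in size from ruts and water pits, methods that detect all three obstacle types at once have rarely been reported.
Deep-learning detectors perform well on road obstacles in open-pit mines and, by working principle, divide into one-stage and two-stage algorithms. One-stage algorithms such as the YOLO series (Bochkovskiy et al., 2020), SSD (Liu et al., 2016), and RetinaNet (Lin et al., 2017) are fast and suited to real-time detection but perform poorly on small or densely packed targets. Two-stage algorithms such as Faster-RCNN (Girshick et al., 2015) and Cascade-RCNN (Cai et al., 2018) detect small targets well but are slower. Although these algorithms offer good speed and accuracy, they rely on two key steps, threshold filtering and non-maximum suppression, that reduce robustness and detection speed (Carion et al., 2020). In deployment they also tend to spend considerable post-processing time parsing dense detection boxes. In contrast, the DETR algorithm proposed by Carion et al. (2020) uses a Transformer to recast object detection as a set prediction problem; its end-to-end trainable encoder-decoder structure replaces traditional region-proposal-based methods and removes both proposal generation and post-processing. However, DETR is slow to train and slow at inference, and although many researchers have improved its attention mechanism (Zhu et al., 2020), real-time detection remained out of reach. RT-DETR builds on DETR with a deeply optimized encoder-decoder structure (Zhao et al., 2024) and balances accuracy against speed; compared with the YOLO series, it shows clear advantages in small-target and multi-scale detection.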
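DETR (Detection Transformer) is an end-to-end object detection algorithm based on the Transformer architecture (Vaswani et al., 2017). It casts object detection as a sequence-to-sequence problem and uses a Transformer network to localize and classify targets simultaneously. As shown in Fig. 1(b), an end-to-end detector outputs the detection box of each target directly, whereas a traditional detector outputs multiple boxes per target and must apply confidence filtering and non-maximum suppression to remove the redundant ones [Fig. 1(a)]. End-to-end detection therefore avoids the hand-crafted features and post-processing steps that traditional methods require and outputs detection results directly.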
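The post-processing that end-to-end detectors avoid can be sketched as follows. This is a minimal illustration of confidence filtering followed by greedy non-maximum suppression, not code from the paper; the function names, the thresholds `score_thr=0.25` and `iou_thr=0.5`, and the toy boxes are all assumptions chosen for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_thr=0.25, iou_thr=0.5):
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # Step 1: confidence filtering, then sort survivors by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap an already kept box too much.
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


# Toy example: three overlapping boxes on one obstacle, one box on another.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 150), (11, 9, 49, 51)]
scores = [0.90, 0.80, 0.70, 0.10]
kept = filter_and_nms(boxes, scores)
print(kept)
```

Both thresholds are hand-tuned hyperparameters, and the loop compares each candidate against every kept box, which is why this stage adds latency when a detector emits many dense candidate boxes.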
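RT-DETR consists of a backbone, a hybrid encoder, and a decoder with auxiliary prediction heads. The backbone uses ResNet50 (He et al., 2016) for feature extraction and passes the output features of its last three stages to the encoder. The hybrid encoder comprises an AIFI module (Zhu et al., 2020) and a CCFM module: AIFI encodes the deepest-level features, and CCFM fuses features along bottom-up and top-down paths, converting the multi-scale features into a sequence of image features. The decoder first uses an IoU-aware query selection module to pick a fixed number of image features from the encoder output sequence as initial object queries, then refines them iteratively to produce predicted boxes and confidence scores. However, RT-DETR has too many parameters and too high a computational cost to deploy easily on low-compute edge devices, which limits its use in scenarios such as vehicle collision avoidance and autonomous driving in open-pit mines. RT-DETR comes in two versions, X and L, of which the L version balances accuracy and inference speed. This paper therefore builds its open-pit mine obstacle detection algorithm on RT-DETR-L; its structure is shown in Fig. 2.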
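The query selection step described above can be illustrated with a small sketch. It assumes each encoder feature already carries a scalar score and simply keeps the highest-scoring ones; in RT-DETR that score is trained to be consistent with IoU, which is not modeled here. The function name, the toy feature vectors, and the scores are hypothetical.

```python
def select_initial_queries(features, scores, num_queries):
    """Pick the num_queries encoder features with the highest scores.

    features: list of feature vectors, one per encoder token.
    scores:   list of floats, one per token (assumed to come from a scoring head).
    Returns the selected indices and the corresponding features.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:num_queries]
    return top, [features[i] for i in top]


# Toy example: six encoder tokens with two channels each.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]
scores = [0.10, 0.85, 0.30, 0.95, 0.05, 0.60]
idx, queries = select_initial_queries(features, scores, num_queries=3)
print(idx)
```

Because the number of queries is fixed, the decoder always refines the same number of candidates regardless of how many obstacles appear in the image.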
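Ruts and water pits on mine roads cover large areas, have indistinct edges, and blend closely into the road surface, which makes them harder to recognize. Compared with traditional convolutional backbones, RepViT has a larger global receptive field (Wang et al., 2024) and further strengthens local feature extraction. The feature extraction stage therefore uses the re-parameterized RepViT Transformer backbone to improve the efficiency of image feature extraction. Fallen rocks, by contrast, occupy on average only a small share of the image pixels and have indistinct features, so convolving with large kernels would lose fine-grained information and reduce feature-learning efficiency.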
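Structural re-parameterization, on which RepViT-style backbones rely, can be illustrated with a short numerical check. The sketch below assumes a block with parallel 3×3 and 1×1 convolution branches that are summed during training, and shows that the 1×1 kernel can be folded into the center tap of the 3×3 kernel so that inference uses one convolution. It uses dense convolutions without batch normalization for brevity, and all names and tensor sizes are hypothetical; it is not the paper's implementation.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded convolution.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) to every output position.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def fuse_3x3_and_1x1(w3, w1):
    """Fold a parallel 1x1 kernel into the center tap of a 3x3 kernel."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]
    return fused


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))        # toy feature map: 4 channels, 8x8
w3 = rng.normal(size=(6, 4, 3, 3))    # 3x3 branch
w1 = rng.normal(size=(6, 4, 1, 1))    # 1x1 branch

two_branch = conv2d_same(x, w3) + conv2d_same(x, w1)    # training-time form
one_branch = conv2d_same(x, fuse_3x3_and_1x1(w3, w1))   # inference-time form
max_diff = float(np.abs(two_branch - one_branch).max())
print(max_diff)
```

The two forms agree up to floating-point rounding, which is why a multi-branch block can be trained for accuracy and then collapsed to a single kernel for speed.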
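Therefore, inspired by RepViT, this paper proposes the RepAttC3 feature extraction module, which uses 3×3 and 1×1 convolutions to capture local and fine-grained features simultaneously, reducing information loss during feature extraction and improving recognition of low-resolution small targets. In the hybrid encoder, the width and depth of the algorithm are also optimized to further reduce the parameter count and raise inference speed. Because ruts blend into the road over a wide area, traditional bounding-box regression losses are difficult to optimize during training, especially when the predicted and ground-truth boxes share the same aspect ratio but differ in size. In that case the algorithm struggles to converge when the rut size is predicted inaccurately. The MPDIoU loss function (Ma et al., 2023) is therefore introduced. By minimizing the distances between key points of the predicted and ground-truth boxes, it jointly accounts for the overlap area, the center-point distance, and the deviations in width and height, and so reflects the difference between the two boxes more accurately.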
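Because ruts blend into the road over a wide area, traditional bounding-box regression losses run into difficulty during training. When the predicted and ground-truth boxes have the same aspect ratio but different sizes, such a loss is hard to optimize: it cannot handle inaccurate rut-size predictions effectively, so the algorithm converges with difficulty. Since ruts can appear at any position and in any size on the road, the algorithm struggles to find a size that matches the real rut, which further degrades training. The original CIoU loss function (Zheng et al., 2020) improves bounding-box prediction by adding scale and aspect-ratio terms. It nevertheless has inherent limitations, including an ambiguous description of aspect ratio and no balancing of easy and hard samples. The MPDIoU loss function likewise includes scale and aspect-ratio terms to improve bounding-box prediction. It incorporates the concept of minimum point distance, minimizing the distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes to improve regression efficiency and accuracy. It is computed as follows: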
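As a sketch of the corner-distance idea described above, the function below follows the commonly stated form of MPDIoU from Ma et al. (2023): the IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared diagonal of the input image, with the loss taken as one minus that value. The function name, argument layout, and example boxes are assumptions, and the notation may differ from the equation in the original paper.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """MPDIoU loss for two boxes given as (x1, y1, x2, y2) in pixels.

    Assumed form: loss = 1 - (IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2)),
    where d1 and d2 are the top-left and bottom-right corner distances and
    w, h are the input image width and height.
    """
    # Intersection over union.
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    norm = img_w ** 2 + img_h ** 2

    return 1.0 - (iou - d1_sq / norm - d2_sq / norm)


# Same aspect ratio, different size: the case highlighted in the text.
gt_box = (20.0, 20.0, 40.0, 40.0)
pred_box = (20.0, 20.0, 30.0, 30.0)
loss_same_ratio = mpdiou_loss(pred_box, gt_box, img_w=100, img_h=100)
loss_perfect = mpdiou_loss(gt_box, gt_box, img_w=100, img_h=100)
print(loss_same_ratio, loss_perfect)
```

In the example the two boxes share an aspect ratio, so an aspect-ratio penalty alone contributes nothing, while the bottom-right corner distance still adds a term that shrinks as the predicted size approaches the true size.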
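As shown in Table 1, six data augmentation methods are used; because they work on different principles, each has its own parameters. The augmentation applied to each training batch is chosen at random, so assigning a different probability to each method better preserves the features of the original data and improves the generalization of the dataset. Mosaic augmentation comes in two variants, Mosaic_4 and Mosaic_9 (Bochkovskiy et al., 2020), which build one new image from four or nine images, respectively. Because the rockfall samples in this paper are small and the road environment in open-pit mines is complex, Mosaic_4 is adopted as the data augmentation method. It works as follows: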
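A simplified sketch of the four-image mosaic idea is given below. It assumes a random split point divides the output canvas into four regions, each filled with one resized source image, and that the boxes are rescaled into the matching region. Common implementations crop rather than resize and also filter boxes after clipping; the function name, the 640-pixel canvas, the gray fill value of 114, and the nearest-neighbor resize are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def mosaic_4(images, boxes_list, out_size=640, rng=None):
    """Combine four images into one out_size x out_size mosaic.

    images:     four H x W x 3 uint8 arrays.
    boxes_list: four (N_i, 4) arrays of pixel boxes (x1, y1, x2, y2).
    Returns the mosaic image and the merged boxes in mosaic coordinates.
    """
    if rng is None:
        rng = np.random.default_rng()
    s = out_size
    # Random split point, kept away from the borders so every region is non-empty.
    cx = int(rng.uniform(0.25 * s, 0.75 * s))
    cy = int(rng.uniform(0.25 * s, 0.75 * s))
    regions = [(0, 0, cx, cy), (cx, 0, s, cy), (0, cy, cx, s), (cx, cy, s, s)]

    canvas = np.full((s, s, 3), 114, dtype=np.uint8)
    merged = []
    for img, boxes, (x1, y1, x2, y2) in zip(images, boxes_list, regions):
        h, w = img.shape[:2]
        rw, rh = x2 - x1, y2 - y1
        # Nearest-neighbor resize of the source image to the region size.
        ys = np.arange(rh) * h // rh
        xs = np.arange(rw) * w // rw
        canvas[y1:y2, x1:x2] = img[ys[:, None], xs[None, :]]
        # Scale the boxes by the same factors and shift them into the region.
        b = np.asarray(boxes, dtype=float).reshape(-1, 4).copy()
        b[:, [0, 2]] = b[:, [0, 2]] * rw / w + x1
        b[:, [1, 3]] = b[:, [1, 3]] * rh / h + y1
        merged.append(b)
    return canvas, np.concatenate(merged, axis=0)


# Toy example: four flat-colored images of different sizes, one box each.
sizes = [(100, 200), (240, 320), (480, 640), (300, 300)]
images = [np.full((h, w, 3), (i + 1) * 50, dtype=np.uint8) for i, (h, w) in enumerate(sizes)]
boxes_list = [np.array([[10, 10, 50, 50]]) for _ in sizes]
mosaic, boxes = mosaic_4(images, boxes_list, rng=np.random.default_rng(0))
print(mosaic.shape, boxes.shape)
```

Each mosaic exposes the detector to four scenes and four sets of obstacles in a single training sample, which is one reason the technique is often used when individual targets are small.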
Bochkovskiy A, Wang C Y, Liao H Y, 2020. YOLOv4: Optimal speed and accuracy of object detection[J]. arXiv:
Cai Z W, Vasconcelos N, 2018. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE.
Carion N, Massa F, Synnaeve G, et al, 2020. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham: Springer.
Chen H S, Yao M H, Gu Q L, 2020. Pothole detection using location-aware convolutional neural networks[J]. International Journal of Machine Learning and Cybernetics, 11(4): 899-911.
Gao M X, Wang X, Zhu S L, 2020. Detection and segmentation of cement concrete pavement pothole based on image processing technology[J]. Mathematical Problems in Engineering, 2020: 1360832.
Girshick R, 2015. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision. Santiago: IEEE.
Gu Qinghua, Du Yifan, Li Pingfeng, et al, 2023. Rockfall detection on mining area roads based on weighted bidirectional feature fusion[J]. Gold Science and Technology, 31(6): 953-963.
He K M, Zhang X Y, Ren S Q, et al, 2016. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE.
He Tiejun, Li Huaen, 2024. Pavement distress detection model based on improved YOLOv5[J]. Journal of Civil Engineering, 57(2): 96-106.
Hu J, Shen L, Sun G, 2018. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE.
Lin T Y, Goyal P, Dollár P, 2017. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE.
Liu Jiayin, Tang Zhenmin, Wang Andong, et al, 2017. Obstacle detection in unstructured environments based on multi-lidar and combined features[J]. Robotics, 39(5): 638-651.
Ma S L, Xu Y, 2023. MPDIoU: A loss for efficient and accurate bounding box regression[J]. arXiv:
Ruan Shunling, Li Shaobo, Lu Caiwu, et al, 2021. Negative obstacle detection on open-pit mining area roads based on multi-scale feature fusion[J]. Journal of Coal Science and Engineering, 46(Supp.2): 1170-1179.
Ryu S K, Kim T, Kim Y R, 2015. Feature-based pothole detection in two dimensional images[J]. Transportation Research Record: Journal of the Transportation Research Board, 2528(1): 9-17.
Vaswani A, Shazeer N, Parmar N, et al, 2017. Attention is all you need[J]. Advances in Neural Information Processing Systems, (30): 955-964.
Wang A, Chen H, Lin Z J, et al, 2024. RepViT: Revisiting mobile CNN from ViT perspective[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE.
Wang D Y, Liu Z, Gu X Y, et al, 2022. Automatic detection of pothole distress in asphalt pavement using improved convolutional neural networks[J]. Remote Sensing, 14(16): 3892.
Wang Pei, Guo Jianhui, Li Lunbo, et al, 2017. Negative obstacle detection algorithm based on single-line lidar and vision fusion[J]. Computer Engineering, 34(7): 303-308.
Zhao Y A, Lü W Y, Xu S L, et al, 2024. DETRs beat YOLOs on real-time object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE.
Zheng Z H, Wang P, Liu W, et al, 2020. Distance-IoU loss: Faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, California: AAAI Press.
Zhu X Z, Su W J, Lu L W, et al, 2020. Deformable DETR: Deformable transformers for end-to-end object detection[J]. arXiv: