Machine vision-based environmental perception is one of the key technologies in the field of intelligent transportation. Traditional deep learning algorithms can typically meet the detection needs of individual targets in simple scenarios, but they cannot address the intelligent perception requirements of complex traffic environments. To improve the intelligent perception capability of vehicles in such environments, this paper proposes an improved YOLOv8 object detection network model that integrates an attention mechanism, an optimizer, and deformable convolutional layers to achieve multi-target detection in complex urban traffic environments. To verify the effectiveness of the algorithm, comparative experiments were conducted using YOLOv4, YOLOv8, and the improved YOLOv8 algorithm on sample images from complex traffic environments. The results show that, compared with YOLOv4 and YOLOv8, the improved YOLOv8 algorithm increased the average accuracy by 40.76% and 16.92%, respectively. The detection accuracy and real-time performance of the improved YOLOv8 algorithm meet practical application requirements, and, combined with multi-sensor information fusion, it can realize intelligent perception in complex urban traffic environments.
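YOLO (you only look once) is an object detection and classification algorithm based on a deep learning regression approach. Since Redmon et al. [7] proposed the original model, YOLO has been iterated continuously with steadily improving performance, and it is widely used in computer vision tasks. YOLOv4 substantially improved detection speed and accuracy by optimizing the loss function [8-9], the backbone network, and the neck network [10-11]. YOLOv5 further improved training speed and accuracy by automatically adapting to the training dataset [12]. YOLO has now reached its eighth generation (YOLOv8), which introduces further optimizations and new features that improve performance and flexibility; its structure is shown in Fig. 1.
1.1 Network structure
YOLOv4 uses CSPDarknet53 as its backbone network, introduces the SPP (spatial pyramid pooling) module in the neck, and optimizes the loss function and box-filtering strategy used during detection [13]. The YOLOv8 network structure includes the C2F (cross stage partial with two fusion) feature fusion module and the SPPF (spatial pyramid pooling-fast) module, which further improve feature extraction and fusion. The neck likewise uses C2F and CBS (Conv+BN+SiLU) modules to optimize the fusion of feature maps from different layers, while the detection head uses a decoupled head to perform classification and regression separately [14].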
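2) Introducing the Adam optimizer. An optimizer is the tool that guides a neural network in updating its parameters. After the loss function has been computed, the optimizer is used during backpropagation to update the network parameters and search for the model parameters that minimize the loss. Optimization methods commonly used in convolutional neural networks include stochastic gradient descent (SGD) [19], SGD with momentum [20-21], AdaGrad (adaptive gradient) [22], RMSProp (root mean square propagation), and Adam (adaptive moment estimation) [23]. In this experiment, Adam replaces SGD, whose iterations exhibit a degree of randomness and oscillation that lowers accuracy. Adam combines Momentum and RMSProp: it takes the first-order moment from Momentum and the second-order moment from RMSProp in order to accumulate gradients, accelerate convergence, and damp fluctuations. On this basis it adds two correction terms, which allows the parameters to be updated automatically [24]. In addition to an exponentially decaying average of the squared gradients, Adam also computes an exponentially decaying average of the gradients themselves. The formulas are as follows: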
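As a rough illustration of the update described above, the following sketch implements one Adam step in plain Python. It is not the paper's implementation: the function names `adam_step` and `minimize_quadratic`, the toy objective f(x) = (x - 3)^2, and the hyperparameter values (the commonly quoted defaults for `beta1`, `beta2`, and `eps`) are all assumptions made for this example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    theta, grad, m, v are lists of equal length; t is the 1-based step index.
    Hyperparameter defaults are assumed values, not taken from the paper.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # First moment: exponentially decaying average of gradients (Momentum part).
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # Second moment: exponentially decaying average of squared gradients (RMSProp part).
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # The two correction terms compensate for the zero initialisation of m and v.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Parameter update scaled by the corrected moments.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy usage: minimise f(x) = (x - 3)^2 starting from x = 0."""
    theta, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0)]  # analytic gradient of the toy objective
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta[0]
```

In this sketch the very first step has magnitude close to `lr` regardless of the gradient scale, because the corrected first moment divided by the square root of the corrected second moment is approximately the sign of the gradient.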
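In object detection, intersection over union (IoU) is commonly used to measure how well a predicted box matches the ground truth box (GTB). It is defined as the ratio of the intersection to the union of the target's ground-truth bounding box and the predicted box [26], and it takes values in [0, 1]. To judge whether a prediction is correct, a detection task sets an IoU threshold, and only predictions above the threshold are treated as valid. IoU is computed as shown in Eq. (8). In this paper the IoU threshold is set to 0.5: when the IoU exceeds 0.5 the predicted box is considered valid and is retained; otherwise it is considered invalid and is discarded.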
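The IoU definition and the 0.5 threshold rule can be sketched as follows. This is a minimal example that assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the box format and the function names `iou` and `is_valid_prediction` are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_valid_prediction(pred_box, gt_box, threshold=0.5):
    """Keep a predicted box only if its IoU with the ground truth exceeds the threshold."""
    return iou(pred_box, gt_box) > threshold
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 region, giving an IoU of 1/7, which falls below the 0.5 threshold and would be discarded.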
Mean average precision (mAP), frames per second (FPS), and the precision-recall (P-R) curve are adopted as evaluation metrics for the model's detection accuracy and real-time performance; they are computed as follows [27].
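One common way to compute these metrics is sketched below, under stated assumptions: average precision (AP) is taken as the area under a monotonically smoothed P-R curve for one class, mAP is the mean of the per-class AP values, and FPS is the number of processed frames divided by elapsed time. The paper's exact formulas are not reproduced here, and the function names and input format are hypothetical.

```python
def average_precision(is_tp, num_gt):
    """Area under the P-R curve for one class.

    is_tp:  list of booleans, one per detection, sorted by descending confidence;
            True marks a true positive (e.g. IoU above the chosen threshold).
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt <= 0 or not is_tp:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)        # recall = TP / (TP + FN)
        precisions.append(tp / (tp + fp))  # precision = TP / (TP + FP)
    # Make precision non-increasing from right to left (interpolated P-R curve).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between successive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0


def frames_per_second(num_frames, elapsed_seconds):
    """FPS: frames processed per second of wall-clock time."""
    return num_frames / elapsed_seconds
```

With three detections ranked as hit, miss, hit against two ground-truth objects, the sketch yields an AP of 5/6; different interpolation schemes (for example, 11-point sampling) would give slightly different values.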
Gao De-zhi, Duan Jian-min, Zheng Bang-gui, et al. Application status of intelligent vehicle environmental sensing sensor[J]. Modern Electronic Technology, 2008(19): 151-156.
[3]
Wang K, Gou C, Zheng N, et al. Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives[J]. Artificial Intelligence Review, 2017, 48(3): 299-329.
[4]
Aufrère R, Gowdy J, Mertz C, et al. Perception for collision avoidance and autonomous driving[J]. Mechatronics, 2003, 13(10): 1149-1161.
[5]
Montemerlo M, Becker J, Bhat S, et al. Junior: the Stanford entry in the urban challenge[J]. Journal of Field Robotics, 2008, 25(9): 569-597.
[6]
Leonard J, How J, Teller S, et al. A perception-driven autonomous urban vehicle[J]. Journal of Field Robotics, 2008, 25(10): 727-774.
Xie Zhi-ping, Lei Li-ping. Development and research status of intelligent networked automotive environment awareness technology[J]. Journal of Chengdu Technological University, 2016, 19(4): 87-92.
[9]
Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, 2016: 779-788.
[10]
Asmaa B, Khalid Z. Optimizing CNN-BiGRU performance: mish activation and comparative analysis[J]. International Journal of Computer Networks & Communications, 2024, 16(3): 69-87.
[11]
Wang Y F, Hua C C, Ding W L, et al. Real-time detection of flame and smoke using an improved YOLOv4 network[J]. Signal, Image and Video Processing, 2022, 16(4): 1-8.
Guo Zhen-yu, Gao Guo-fei. Research on detection algorithm of mixed traffic between people and vehicles at complex intersections based on YOLO v4[J]. Information Technology and Informatization, 2021(2): 236-240.
[18]
Łysakowski M, Żywanowski K, Banaszczyk A, et al. Real-time onboard object detection for augmented reality: enhancing head-mounted display with YOLOv8[C]//IEEE International Conference on Edge Computing and Communications. Chicago, 2023: 364-371.
[19]
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010.
Li Hong, Zou Jun-ying, Tan Qian-cheng, et al. Multi-attention fusion network for medical image segmentation[J]. Journal of Computer Applications, 2022, 42(12): 3891-3899.
Vasanthi P, Mohan L. A reliable anchor regenerative-based transformer model for x-small and dense objects recognition[J]. Neural Networks, 2023, 165: 809-829.
[24]
Battiti R. First- and second-order methods for learning: between steepest descent and Newton's method[J]. Neural Computation, 2014, 4(2): 141-166.
[25]
Sutskever I, Martens J, Dahl G, et al. On the importance of initialization and momentum in deep learning[C]//International Conference on Machine Learning (ICML). Atlanta, 2013: 1139-1147.
[26]
Qian N. On the momentum term in gradient descent learning algorithms[J]. Neural Networks, 1999, 12(1): 145-151.
[27]
Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[28]
Mohamed R, Amany M S. A modified Adam algorithm for deep neural network optimization[J]. Neural Computing and Applications, 2023, 35(23): 17095-17112.
[29]
Sarker I H. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions[J]. SN Computer Science, 2021, 2(6): 420-420.
[30]
Dai J F, Qi H Z, Xiong Y W, et al. Deformable convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, 2017: 764-773.