To address the low accuracy, missed detections, and false detections that affect small target detection in UAV imagery, an improved YOLOv8-based model, YOLOv8-CAS, is proposed. A lightweight C2f-CAS feature extraction module is constructed to strengthen the capture of small target details while reducing computational complexity. The SimAM attention mechanism is introduced to select features adaptively, which sharpens the model's focus on key regions and improves small target detection accuracy. The DyHead detection head is incorporated; it uses a dynamic routing mechanism to fuse multi-scale features effectively and improves small target localization accuracy. Finally, a P2 small target detection layer is added, which optimizes multi-scale feature fusion by making better use of shallow high-resolution feature maps. Experimental results on the VisDrone dataset show that, compared with the original model, the improved model gains 10.1, 8.1, 10.2, and 6.7 percentage points in precision, recall, mAP@50, and mAP@50:95, respectively. These results show that YOLOv8-CAS substantially improves small target detection and is suitable for UAV target detection in complex scenarios.
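In object detection, two-stage models typified by Faster R-CNN[4] generate candidate boxes with a region proposal network and then refine them through classification and regression. Their two-stage pipeline, however, is too complex to meet the real-time detection requirements of UAVs[5]. Single-stage models, by contrast, use an end-to-end architecture to strike an effective balance between detection efficiency and accuracy. Among them, the YOLO (you only look once) series has become the mainstream choice in industry through continuous iteration. From the multi-scale prediction mechanism introduced in YOLOv3[6], through the optimized feature pyramid structure of YOLOv5[7], to the C2f module that YOLOv8 adopts to strengthen gradient propagation[8], the series has steadily improved model performance. In UAV small target detection, however, YOLOv8 still has significant limitations. First, its deep feature extraction module (C2f) captures small target details poorly, so fine features are hard to preserve. Second, conventional attention mechanisms such as SE[9] and CBAM[10] have relatively high computational complexity and become an efficiency bottleneck when lightweight models are deployed. Third, the model makes insufficient use of shallow high-resolution features, so small targets are prone to feature degradation during downsampling.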
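The feature degradation caused by downsampling can be illustrated with simple arithmetic. The sketch below is illustrative only: it assumes the commonly used strides of 8, 16, and 32 for the P3 to P5 detection levels and a stride of 4 for a P2 level, together with a hypothetical 640-pixel input and a hypothetical 16-pixel target. None of these values are taken from the experiments reported here.

```python
# Assumed strides per detection level (typical YOLOv8-style layout, not from the paper).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def grid_size(input_px: int, stride: int) -> int:
    """Side length of the feature map produced at a given stride."""
    return input_px // stride


def footprint(object_px: int, stride: int) -> float:
    """Side length, in feature-map cells, that a square object occupies."""
    return object_px / stride


if __name__ == "__main__":
    input_px = 640   # hypothetical input resolution
    object_px = 16   # hypothetical small target, 16 x 16 pixels
    for level, stride in STRIDES.items():
        cells = footprint(object_px, stride)
        size = grid_size(input_px, stride)
        print(f"{level}: {size}x{size} grid, target spans {cells:g} cells per side")
```

Under these assumptions the target covers only half a cell per side at P5 but four cells per side at P2, which is the intuition behind using shallower, higher-resolution feature maps for small targets.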
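Precision is the proportion of samples predicted as positive that are actually positive; recall reflects the model's ability to identify all true positive samples. mAP (mean average precision) is the mean of the average precision (AP) over all classes, and mAP@50 is the mAP computed with the intersection over union (IoU) threshold set to 0.5. The mathematical expressions for these metrics are given in Equations (10), (11), and (12).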
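As a minimal sketch of how these metrics are commonly computed, the code below uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and mAP as the mean of per-class AP values. The box format `(x1, y1, x2, y2)`, the greedy matching rule, and all function names are assumptions made for illustration; this is not the evaluation code used in the experiments.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching; preds are assumed sorted by descending confidence."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, iou_thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


def mean_average_precision(ap_per_class: List[float]) -> float:
    """mAP as the mean of per-class AP values (AP computation itself is omitted)."""
    return sum(ap_per_class) / len(ap_per_class)
```

With `iou_thr=0.5` the matching corresponds to the threshold used for mAP@50; computing AP itself additionally requires integrating the precision-recall curve over confidence thresholds, which is omitted here.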
ZEGGADA A, MELGANI F, BAZI Y. A deep learning approach to UAV image multilabeling[J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14(5): 694-698.
[3]
SUN K, LI D, SONG Y. Improved multi-scale small target detection by UAV[J]. Multimedia Tools and Applications, 2025, 84(22): 24789-24803.
[4]
REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[5]
CHENG B, WEI Y, SHI H, et al. Revisiting RCNN: on awakening the classification power of Faster RCNN[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018(v1): 453-468.
[6]
ZHAO L, LI S. Object detection algorithm based on improved YOLOv3[J]. Electronics, 2020, 9(3): 537.
[7]
KIM J H, KIM N, PARK Y W, et al. Object detection and classification based on YOLO-V5 with improved maritime dataset[J]. Journal of Marine Science and Engineering, 2022, 10(3): 377.
[8]
WANG G, CHEN Y, AN P, et al. UAV-YOLOv8: a small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios[J]. Sensors, 2023, 23(16): 7190.
[9]
HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[10]
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
ZHANG T, LI L, ZHOU Y, et al. CAS-ViT: convolutional additive self-attention vision transformers for efficient mobile applications[J]. arXiv preprint arXiv:2024.
[17]
YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]//International Conference on Machine Learning. PMLR, 2021: 11863-11874.
[18]
DAI X, CHEN Y, XIAO B, et al. Dynamic head: unifying object detection heads with attentions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 7373-7382.
[19]
ZHU P, WEN L, DU D, et al. Detection and tracking meet drones challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(11): 7380-7399.