To address the low detection accuracy of existing methods under complex wood texture conditions, this paper proposes an improved YOLOv8s-based approach for wood defect detection. First, an efficient multi-scale attention (EMA) mechanism is embedded into the backbone network to enhance the model’s contextual perception capability in complex texture scenarios. Second, the neck network is redesigned as a re-parameterized generalized feature pyramid network (RepGFPN) to strengthen cross-scale feature fusion. Third, the loss function is replaced with SCYLLA-IoU (SIoU) to improve bounding box regression precision. Finally, the inverted residual mobile block (iRMB) is integrated into the C2f module, improving the model’s ability to capture fine-grained defects. Experimental results demonstrate that the proposed method outperforms the baseline by 5.09% in precision, 3.13% in recall, 3.72% in mAP@0.5, and 2.63% in mAP@0.5:0.95, while achieving a real-time inference speed of 120 frames per second. These findings indicate that the proposed enhancements significantly improve the model’s robustness and generalization capability, leading to superior and more stable performance in complex wood defect detection tasks.
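In recent years, with the rapid development of deep learning, object detection methods based on convolutional neural networks have shown strong feature representation and pattern recognition capabilities. They have demonstrated clear advantages in fields such as industrial defect detection and offer new approaches to defect detection problems. Sun Liping et al. [2] proposed a forestry pest detection method based on an improved YOLOv5, but detection accuracy improved by less than 1 percentage point. Xiao Weiying et al. [3] proposed a lightweight YOLOv5-based algorithm for counting pine trees; although detection speed increased, accuracy dropped by 3.26 percentage points. Liu Kangkang et al. [4] combined different machine learning methods to study tree species classification in hyperspectral imagery, but the final accuracy remained low, making the method unsuitable for high-precision tasks.
Xie et al. [5] studied defect detection and segmentation on computed tomography (CT) images for three typical wood defects: knots, decay, and dents. Their work evaluated five mainstream convolutional neural network models and found that YOLOv8-seg offered the most balanced overall performance, although its segmentation accuracy still fell short of U-Net, leaving room for improvement. Liu et al. [6] proposed a new object detection model, SGM-YOLO (YOLO-based defect detection model for wood lumber), to address the limited recognition capability and low efficiency of traditional wood defect detection methods. The model introduces a new backbone, SL-Backbone (SE, LSKA-SPPF), and a GVE neck module (group shuffle convolution (GSConv) and VoV-GSCSP modules, EMA). Experiments showed that SGM-YOLO reached a mean recognition accuracy of 77.4%, 3.8 percentage points above the original YOLOv8, but its suitability for high-precision scenarios remains limited. To handle the diversity and complex distribution of wood defect types, Xi et al. [7] proposed SiM-YOLO (a wood surface defect detection method based on the improved YOLOv8). The model introduces the fine-grained convolution structure SPD-Conv, which consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, builds a multi-scale feature fusion module based on SiAFF-PANet, and adds a multi-attention detection head. Experiments showed that SiM-YOLO improved detection accuracy by 4.3% over YOLOv8, but false detections and missed detections still occur against complex backgrounds. An et al. [8] targeted automated production lines, integrating CondConv, Wise-IoU, and BiFormer into YOLOv8; the resulting model improved mAP@0.5 and mAP@0.5:0.95 by 3.5% and 5.8%, respectively, but its robustness against complex textured backgrounds remains insufficient. Meng et al. [9] proposed SGN-YOLO (semi-global network YOLO) for wood defects, which are typically small and complex in shape. The method adds a lightweight semi-global network (SGN) module to the backbone to strengthen contextual awareness and incorporates extended efficient layer aggregation networks (E-ELAN). It also adopts the efficient intersection over union (EIoU) loss to mitigate slow convergence and inaccurate bounding box regression. The model reached an mAP of 86.4%, 3.1% above the baseline, but it still risks missing small defects.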
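To explain the basis of the model’s decisions and verify that the attention mechanism improves performance, gradient-weighted class activation mapping (Grad-CAM) [23] is used to visualize the wood defect detection results, as shown in Fig. 6. The baseline model’s attention is scattered, and some high-response regions fail to cover the key defect locations, indicating that the baseline attends insufficiently to defect regions. After the EMA attention mechanism is introduced, the heat-map response becomes more concentrated. For live knots, the model effectively extracts the key semantic information of the knot, improving its representation of local structure. For dead knots, EMA strengthens the response in the core region of the knot, indicating that it is good at capturing high-contrast defect information. For cracks, EMA preserves the longitudinal structure of the crack, which helps capture elongated defects. For pith defects, EMA effectively perceives the center of the wood’s radial structure, improving detection of heterogeneous regions. These results indicate that EMA benefits from global receptive-field modeling and is better suited to complex-texture detection tasks that require global modeling.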
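The Grad-CAM computation behind such heat maps can be illustrated with a minimal sketch. It assumes the feature maps of one layer and the gradients of a single detection score with respect to them are already available; which layer and which score were used for Fig. 6 is not stated here, so the function name `grad_cam`, the nested-list layout, and the toy values are all hypothetical. Each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only positive evidence.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]


def grad_cam(activations: FeatureMaps, gradients: FeatureMaps) -> List[List[float]]:
    """Return a Grad-CAM map scaled to [0, 1] for one target score."""
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of each channel's gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Normalize for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy input (hypothetical values): two channels of 2x2 features.
activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In the toy input the second channel has negative gradients, so it suppresses the top row of the map, and only the bottom row stays active after the ReLU. In practice the two inputs would typically be captured from the detector with forward and backward hooks, and the resulting map would be upsampled to the image size before overlaying.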
DING A N, HE C G, DUO H Q, et al. Research review of wood defect recognition based on digital images[J]. Chinese Journal of Wood Science and Technology, 2022, 36(1): 9-16, 28.
LIU K K, ZHONG H, LIN W S. Tree species classification in UAV hyperspectral images based on different machine learning algorithms[J]. Forest Engineering, 2024, 40(4): 98-108.
XIE G Q, WANG L H, WILLIAMS R A, et al. Segmentation of wood CT images for internal defects detection based on CNN: a comparative study[J]. Computers and Electronics in Agriculture, 2024, 224: 109244.
LIU L P, ZHANG Q Y, PENG W Q, et al. SGM-YOLO: YOLO-based defect detection model for wood lumber[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2024, 38(15): 2455012.
XI H L, WANG R J, LIANG F L, et al. SiM-YOLO: A wood surface defect detection method based on the improved YOLOv8[J]. Coatings, 2024, 14(8): 1001.
AN H, LIANG Z H, QIN M M, et al. Wood defect detection based on the CWB-YOLOv8 algorithm[J]. Journal of Wood Science, 2024, 70(1): 26.
MENG W, YUAN Y L. SGN-YOLO: Detecting wood defects with improved YOLOv5 based on semi-global network[J]. Sensors, 2023, 23(21): 8705.
GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. June 23-28, 2014. Columbus, OH, USA. IEEE, 2014: 580-587.
GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). December 7-13, 2015. Santiago, Chile. IEEE, 2015: 1440-1448.
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision (ICCV). October 22-29, 2017. Venice, Italy. IEEE, 2017: 2961-2969.
OUYANG D L, HE S, ZHANG G Z, et al. Efficient multi-scale attention module with cross-spatial learning[C]//ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). June 4-10, 2023. Rhodes Island, Greece. IEEE, 2023: 1-5.
GEVORGYAN Z. SIoU loss: More powerful learning for bounding box regression[J]. arXiv preprint arXiv:2022.
ZHANG J N, LI X T, LI J, et al. Rethinking mobile block for efficient attention-based models[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). October 1-6, 2023. Paris, France. IEEE, 2023: 1389-1400.
HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. June 18-23, 2018. Salt Lake City, UT, USA. IEEE, 2018: 7132-7141.
WANG Q L, WU B G, ZHU P F, et al. ECA-net: Efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 13-19, 2020. Seattle, WA, USA. IEEE, 2020: 11534-11542.
HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 20-25, 2021. Nashville, TN, USA. IEEE, 2021: 13713-13722.
SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]//2017 IEEE International Conference on Computer Vision. October 22-29, 2017. Venice, Italy. IEEE, 2017: 618-626.
LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 21-26, 2017. Honolulu, HI, USA. IEEE, 2017: 2117-2125.
LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. June 18-23, 2018. Salt Lake City, UT, USA. IEEE, 2018: 8759-8768.
TAN M X, PANG R M, LE Q V. EfficientDet: Scalable and efficient object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 13-19, 2020. Seattle, WA, USA. IEEE, 2020: 10781-10790.
CHEN Y F, ZHANG C Y, CHEN B, et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases[J]. Computers in Biology and Medicine, 2024, 170: 107917.
ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: Faster and better learning for bounding box regression[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000.
ZHANG Y F, REN W Q, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
LI J F, WEN Y, HE L H. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 17-24, 2023. Vancouver, BC, Canada. IEEE, 2023: 6153-6162.