Perceiving surrounding objects is an important means for autonomous vehicles to ensure traffic safety. Deep-learning object detection models are widely adopted, but they require a large amount of annotated data for training. This paper proposes an active vision object detection model that uses a Gaussian mixture distribution to estimate the uncertainty of unlabeled images, reducing the dependence of model training on labeled data. First, a mixture density network is adopted as the detection head; taking the image features extracted by the deep neural network as input, it estimates the probability distributions of the classification and location of the predicted boxes. Second, the classification scores of the predicted boxes are mapped into probability space, and the classification uncertainty of each object is calculated by edge uncertainty; the location variance of the predicted boxes is used to measure the location uncertainty of the object. Finally, the most uncertain samples are selected for labeling. Results on the VOC dataset show that the proposed model outperforms other typical active learning sampling strategies. Using only 54% of the annotated data, the proposed model reaches 98.8% of the performance of fully supervised YOLOX, saving nearly 45% of the annotation volume.
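Uncertainty measures the amount of information contained in an unlabeled sample; the higher a sample's uncertainty, the more effectively it improves the model. At present, most active object detection (AOD) algorithms consider only the classification uncertainty of an image. For example, Ref. [4] uses the SSD network as the base classifier for active learning, then builds a committee of classifiers and treats the samples whose class outputs disagree most as information-rich samples. Gal et al. [5] estimate a prior distribution over the weights of an image classification network, sample the weights with dropout and predict the variation ratio, treat images with a high variation ratio as images with high uncertainty, and then compute an informativeness score for the unlabeled images; the main drawback of this method is that it ignores the similarity among the selected samples, which leads to redundant sample information. Elhamifar et al. [6] consider both the similarity of the selected samples in feature space and their informativeness scores, and implement active learning within a convex optimization framework. Paul et al. [7] reduce the information redundancy of the sample data by introducing coresets. Zhang Xin-sheng et al. [8] use a Bootstrap-based active learning method to select unlabeled samples and train a classifier. Lakshminarayanan et al. [9] attempt to apply active learning to the regression task of object detection. Kao et al. [10] rank unlabeled images by localization tightness and localization stability: the former measures how tightly a bounding box fits, and the latter evaluates the stability of the bounding box between the original image and a noisy version of it. Among these active learning methods, some consider only classification uncertainty in active vision object detection [5-8,11], while others focus solely on regression uncertainty [9]. Although Ref. [10] estimates uncertainty from the intersection over union between the predicted boxes (RPN proposals) and the ground-truth boxes, it does not estimate the probability distribution of the intersection over union.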
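Although active learning methods can reduce the cost of data annotation, they perform poorly on high-dimensional data. Deep learning and active learning are both important branches of machine learning, and combining their strengths calls for deep active learning methods. However, finding unlabeled samples that are both uncertain and representative is not easy [2]. To address the shortcomings of the methods above, this paper builds on classification uncertainty and proposes a deep active vision object detection model that incorporates localization uncertainty sampling, named Gaussian YOLOX. The main contributions are: ① the network structure is modified so that the output layer of the neural network predicts probability distributions; ② the measure of sample uncertainty is improved, so that in addition to classification uncertainty, the variance of the box location distribution is used to measure location uncertainty; ③ a localization uncertainty loss is added to the loss function of the object detection model, and stochastic gradient descent is used to optimize the probability distribution of the box locations; ④ a single detection network and a single forward pass are used to estimate image uncertainty. Compared with the query-by-committee (QBC) active learning method [4], the proposed method significantly reduces computational cost and model complexity.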
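The sampling step that underlies this kind of method can be sketched in a few lines. The following is a minimal illustration, assuming each unlabeled image already has a scalar uncertainty score from a single forward pass; the function names `select_for_labeling` and `active_learning_round`, the `budget` parameter, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_for_labeling(uncertainty: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` image ids with the highest uncertainty scores."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Sort by descending score; ties are broken by image id (an assumption,
    # chosen only to make the result deterministic).
    ranked = sorted(uncertainty.items(), key=lambda kv: (-kv[1], kv[0]))
    return [image_id for image_id, _ in ranked[:budget]]


def active_learning_round(
    labeled: Sequence[str],
    unlabeled: Sequence[str],
    score_fn: Callable[[str], float],
    budget: int,
) -> Tuple[List[str], List[str]]:
    """Move the most uncertain unlabeled images into the labeled pool."""
    # score_fn stands in for one forward pass of the detector per image.
    scores = {image_id: score_fn(image_id) for image_id in unlabeled}
    picked = select_for_labeling(scores, budget)
    picked_set = set(picked)
    remaining = [image_id for image_id in unlabeled if image_id not in picked_set]
    return list(labeled) + picked, remaining
```

In a full pipeline the detector would be retrained on the enlarged labeled pool after each round, and the scores would be recomputed before the next selection.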
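Compared with the deep-learning-based YOLOv3, YOLOv4, and YOLOv5, YOLOX [14] no longer uses anchors to search for object boxes, which both reduces computation and alleviates the imbalance between positive and negative samples. This paper modifies the object detection model on the basis of YOLOX; the network structure is shown in Fig. 2. A mixture density network (MDN) [15] replaces the YOLOX detection head, and the network output consists of the parameters of a Gaussian mixture model, namely the means, variances, and weights, which are used to estimate the uncertainty of unlabeled images. The MDN has three branches: a localization branch, a classification branch, and a confidence branch. Each branch outputs, through its convolution kernels, K groups of Gaussian mixture parameters (k = 1, 2, …, K): the estimated weight of each Gaussian component, the estimated mean (for the localization branch, the center and width/height parameters of the predicted box), and the estimated variance of each Gaussian component. In addition, the backbone feature extraction network obtains feature maps with receptive fields of different sizes, and the feature pyramid fusion network fuses these feature maps.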
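To make the link between the mixture parameters and location uncertainty concrete, the sketch below computes the variance of a Gaussian mixture for each box coordinate using the law of total variance. It is an illustration under assumptions, not the paper's implementation: the function names, the use of a softmax to normalize the weights, and the sharing of one set of K weights across the four coordinates (center x, center y, width, height) are all hypothetical choices.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw mixture logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_variance(
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Variance of a 1-D Gaussian mixture (law of total variance)."""
    mean = sum(w * m for w, m in zip(weights, means))
    # Weighted average of the component variances.
    within = sum(w * v for w, v in zip(weights, variances))
    # Spread of the component means around the mixture mean.
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return within + between


def box_location_uncertainty(
    weight_logits: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """Per-coordinate mixture variance for one predicted box.

    `means` and `variances` hold one length-K sequence per coordinate;
    sharing `weight_logits` across coordinates is an assumption.
    """
    weights = softmax(weight_logits)
    return [mixture_variance(weights, m, v) for m, v in zip(means, variances)]
```

For example, `box_location_uncertainty([0.0, 0.0], [[0.0, 2.0]] * 4, [[1.0, 1.0]] * 4)` evaluates two equally weighted components per coordinate. How the four per-coordinate values are aggregated into one image-level score (maximum, mean, or otherwise) is left open here.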
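The output of the localization branch of the MDN is fitted with a Gaussian mixture distribution, with the intersection over union (IoU) serving as the mean parameter of the Gaussian mixture distribution. Correspondingly, the other two outputs of the MDN regression branch serve as the weight and variance parameters of the Gaussian mixture distribution. Because the negative log-likelihood loss is suited to probabilistic modeling and multi-class classification problems, this paper uses it to design the regression loss function, which takes the following form: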
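As a hedged illustration of what a negative log-likelihood loss over a Gaussian mixture looks like in general, the sketch below evaluates $-\log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)$ for a scalar target $x$. The symbols $\pi_k$, $\mu_k$, $\sigma_k^2$, the function names, and the `eps` stabilizer are assumptions introduced for this example; the paper's own parameterization (including its use of IoU) may differ from this generic form.

```python
import math
from typing import Sequence


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_nll(
    x: float,
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Negative log-likelihood of x under a 1-D Gaussian mixture."""
    # log(pi_k) + log N(x | mu_k, sigma_k^2) for every component;
    # eps guards against log(0) when a weight collapses to zero.
    log_terms = [
        math.log(w + eps) + gaussian_log_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the sum stable when the terms are very negative.
    peak = max(log_terms)
    log_likelihood = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_likelihood
```

Minimizing this quantity with stochastic gradient descent pushes probability mass toward the observed target: a target close to a heavily weighted component mean gives a lower loss than one far from every component.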
Qu You, Li Wen-hui. Single-stage rotated object detection network based on anchor transformation[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(1): 162-173.
Wu Wei-ning, Liu Yang, Guo Mao-zu, et al. Advances in active learning algorithms based on sampling strategy[J]. Journal of Computer Research and Development, 2012, 49(6): 1162-1173.
Elhamifar E, Sapiro G, Yang A, et al. A convex optimization framework for active learning[C]∥Proceedings of the IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2013: 209-216.
Paul R, Feldman D, Rus D, et al. Visual precis generation using coresets[C]∥Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Piscataway, NJ: IEEE, 2014: 1304-1311.
Zhang Xin-sheng, Gao Xin-bo, Wang Ying, et al. New method for microcalcification clusters detection using active learning in the mammogram[J]. Journal of Xidian University, 2008(5): 871-877.
Lakshminarayanan B, Pritzel A, Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2017: 6405-6416.
Kao C C, Lee T Y, Sen P, et al. Localization-aware active learning for object detection[C]∥Asian Conference on Computer Vision. Heidelberg, Germany: Springer, 2018: 506-522.
Settles B. Active learning literature survey[R]. Madison: University of Wisconsin-Madison, 2010.
Jia Lyv, Fu Qu-han. A joint algorithm by combined improved active learning and self-training[J]. Journal of Beijing Normal University (Natural Science), 2022, 58(1): 25-32.
Xu Yan. Research on image annotation method based on active learning[D]. Jinzhou: School of Information, Liaoning University of Technology, 2014.
Ge Z, Liu S T, Wang F, et al. YOLOX: Exceeding YOLO series in 2021[J/OL]. [2021-07-18].
Bishop C M. Mixture density networks[EB/OL]. [2023-03-01].
Choi S, Lee K, Lim S. Uncertainty-aware learning from demonstration using mixture density networks with sampling-free variance modeling[C]∥IEEE International Conference on Robotics and Automation. Piscataway, NJ: IEEE, 2018: 6915-6922.
Choi J, Elezi I, Lee H J. Active learning for deep object detection via probabilistic modeling[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2021: 10264-10273.
Everingham M, Van Gool L, Williams C K I. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
Lin T Y, Maire M, Belongie S. Microsoft COCO: common objects in context[C]∥Proceedings of the European Conference on Computer Vision. Heidelberg, Germany: Springer, 2014: 740-755.
Yuan T N, Wan F, Fu M Y, et al. Multiple instance active learning for object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 5326-5335.
Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]∥Proceedings of the European Conference on Computer Vision. Heidelberg, Germany: Springer, 2016: 21-37.
Yang Wen-zhu, Tian Xiao-xiao, Wang Si-le, et al. Recent advances in active learning algorithms[J]. Journal of Hebei University (Natural Science Edition), 2017, 37(2): 216-224.
Settles B. Active Learning[M]. California: Morgan & Claypool Publishers, 2012.
Sener O, Savarese S. Active learning for convolutional neural networks: A core-set approach[C]∥Proceedings of International Conference on Learning Representation. Vancouver, Canada: ICLR Press, 2018: 1-13.
Van der Maaten L, Hinton G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.