Existing classification loss functions in the field of voiceprint recognition neither enforce sufficient separability between categories nor pay attention to the quality of voiceprint data. To address these problems, this paper proposes a new classification loss function, DV-Softmax. First, the working principle of existing margin-based loss functions in the voiceprint field is reviewed. Second, mining-based loss functions from the field of object detection are introduced, and the concept of the fuzzy sample is proposed on that basis. Then, the MV-Softmax loss function from the field of face recognition is introduced, and fuzzy samples are incorporated so that the loss adaptively emphasizes the differences between samples and guides feature learning. Finally, voiceprint recognition experiments are conducted on the VoxCeleb1 and SITW datasets. The results show that, compared with existing margin-based loss functions, DV-Softmax reduces the equal error rate by 8% and 5.4%, respectively, verifying that DV-Softmax effectively improves the separability between categories, accounts for the quality of voiceprint sample data, and performs well in the field of voiceprint recognition.
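The mis-classified-vector re-weighting that DV-Softmax builds on (MV-Softmax, reference [10]) can be sketched as follows. This is a minimal NumPy illustration of an MV-Softmax-style loss for a single sample, not the paper's exact DV-Softmax formulation; the function name and the values of the scale `s`, additive margin `m`, and re-weighting factor `t` are illustrative assumptions.

```python
import numpy as np

def mv_softmax_loss(cosines, label, s=30.0, m=0.2, t=1.2):
    """Sketch of an MV-Softmax-style loss for one sample.

    cosines : (C,) cosine similarities between the embedding and each class weight
    label   : ground-truth class index
    s, m, t : scale, additive margin, and hard-negative re-weighting factor
    """
    target = cosines[label] - m                 # additive margin on the target class
    logits = cosines.astype(float).copy()
    # "Mis-classified vectors": negative classes whose cosine violates the margin.
    hard = (logits > target) & (np.arange(len(logits)) != label)
    # Re-weight hard negatives upward so they contribute a larger penalty.
    logits[hard] = t * logits[hard] + t - 1.0
    logits[label] = target
    logits *= s
    logits -= logits.max()                      # numerical stability for exp
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])                    # cross-entropy on the adjusted logits
```

With this weighting, a sample whose nearest negative class violates the margin yields a larger loss than an easily separated sample, which is what drives the emphasis on hard (fuzzy) samples during feature learning.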
[1] Ranjan R, Castillo C D, Chellappa R. L2-constrained softmax loss for discriminative face verification[J]. arXiv preprint arXiv:1703.09507, 2017.
[2] Liu W, Wen Y, Yu Z, et al. SphereFace: deep hypersphere embedding for face recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 212-220.
[3] Wang F, Cheng J, Liu W, et al. Additive margin softmax for face verification[J]. IEEE Signal Processing Letters, 2018, 25(7): 926-930.
[4] Deng J, Guo J, Xue N, et al. ArcFace: additive angular margin loss for deep face recognition[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4690-4699.
[5] Thienpondt J, Desplanques B, Demuynck K. Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization[J]. arXiv preprint arXiv:2007.07689, 2020.
[6] Li X, Wang W, Wu L J, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.
[7] Ma C, Sun H, Zhu J, et al. Normalized maximal margin loss for open-set image classification[J]. IEEE Access, 2021, 9: 54276-54285.
[8] Lee J, Wang Y, Cho S. Angular margin-mining softmax loss for face recognition[J]. IEEE Access, 2022, 10: 43071-43080.
[9] Boutros F, Damer N, Kirchbuchner F, et al. ElasticFace: elastic margin loss for deep face recognition[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1578-1587.
[10] Wang X, Zhang S, Wang S, et al. Mis-classified vector guided softmax loss for face recognition[C]∥Proceedings of the AAAI Conference on Artificial Intelligence, New York, USA, 2020, 34(7): 12241-12248.
[11] Nagrani A, Chung J S, Zisserman A, et al. VoxCeleb: a large-scale speaker identification dataset[J]. arXiv preprint arXiv:1706.08612, 2017.
[12] McLaren M, Ferrer L, Castan D, et al. The speakers in the wild (SITW) speaker recognition database[C]∥Proceedings of Interspeech, San Francisco, USA, 2016: 818-822.
[13] Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 761-769.
[14] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2980-2988.
[15] Desplanques B, Thienpondt J, Demuynck K, et al. ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification[C]∥Proceedings of Interspeech, Shanghai, China, 2020: 3830-3834.
[16] Shen H, Yang Y, Sun G, et al. Improving fairness in speaker verification via group-adapted fusion network[C]∥ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022: 7077-7081.