Research on Fault Diagnosis of High-speed Train Axlebox Bearing Based on ITR-Net Multi-source Domain Transfer Learning

DENG Feiyue; DONG Shaofei; GU Xiaohui

doi:10.12454/j.jsuese.202400113

Advanced Engineering Sciences (工程科学与技术), 2026, Vol. 58, Issue (01): 324-333.

Intelligent Interdisciplinary Science and Engineering




Abstract

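Scarce fault data and sample labels for axlebox bearings of high-speed trains in actual operation severely limit improvements in axlebox bearing fault diagnosis. To address this, this paper proposes ITR-Net (inception transformer and ResNet), a multi-source domain deep transfer learning method that fuses IFormer (inception transformer) with a residual network (ResNet), for fault diagnosis of high-speed train axlebox bearings. The method takes labeled data under several operating conditions as multiple source domains. First, the continuous wavelet transform is used to obtain time-frequency spectrograms of the bearings' one-dimensional vibration signals as the model input. In ITR-Net, an IFormer network and ResNets are built as the common feature extractor and the specific feature extractors, respectively, to fully learn the feature information of the multi-source and target-domain data. Meanwhile, the multi-kernel maximum mean discrepancy (MK-MMD), local maximum mean discrepancy (LMMD), and mean squared error (MSE) losses are embedded at different nodes of the transfer model, forming a new multi-source adaptive transfer strategy that effectively reduces the feature distribution discrepancy among the source domains and between the source and target domains, and strengthens multi-domain alignment. Finally, the method is verified experimentally on six bearing fault transfer learning tasks under different loads and rotational speeds. The results show that the method can be effectively applied to transfer-learning fault diagnosis of bearings under different operating conditions, that multi-source transfer achieves significantly higher diagnostic accuracy than single-source transfer, and that the method outperforms the existing deep adaptation network (DAN), joint adaptation network (JAN), correlation alignment (CORAL) loss network, domain adversarial neural network (DANN), and multiple feature spaces adaptation network (MFSAN). The results provide a new approach for applying transfer learning to axlebox bearing fault diagnosis.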

Extended Abstract

Objective Efficiently assessing the health status of axlebox bearings in high-speed trains is crucial for reliable train operation. Current deep learning-based bearing fault diagnosis faces two significant challenges: it requires many labeled real fault samples, and the training and test sets must be independent and identically distributed. Transfer learning relaxes these constraints for intelligent bearing fault diagnosis by using transferable knowledge learned from existing labeled datasets to accomplish tasks on different but related unlabeled datasets. However, current transfer learning models based on a single source domain underuse the available labeled data, lose transfer diagnosis accuracy, and risk negative transfer when the dataset distributions differ significantly. This study proposes ITR-Net (Inception Transformer and ResNet), a multi-source domain deep transfer learning method that integrates IFormer (Inception Transformer) and ResNet for fault diagnosis of high-speed train axlebox bearings.

Methods The method uses supervised labeled data under various operating conditions as multiple source domains. It first converts the one-dimensional vibration signals of the bearings into time-frequency spectrograms, used as the model input, with a continuous wavelet transform based on the Morlet wavelet. The network framework consists of three parts: the common feature extractor, the specific feature extractors, and the specific classifiers. The common feature extractor adopts the IFormer network, which uses classical convolutional neural network (CNN) operations, namely depth-wise separable convolution (DWConv) and max pooling, to capture local information of the input data, and the multi-head self-attention (MSA) mechanism of the Transformer to capture global information, so that IFormer mines more comprehensive feature information. The common feature extractor extracts domain-invariant features across the different source domains and the target domain. The specific feature extractors adopt the classical CNN ResNet, which efficiently extracts feature information from its input while avoiding the vanishing or exploding gradients that can occur as network depth increases. The specific classifiers output the classification results for the different source domains and the target domain, so that subsequent metrics can measure the distance between the different predicted label outputs. In the transfer strategy, the study optimizes the multi-kernel maximum mean discrepancy (MK-MMD) after the common feature extractor to align the overall distributions of the source and target domains; the local maximum mean discrepancy (LMMD) after the specific feature extractors so that the model extracts fine-grained, class-level information from the input features; the cross-entropy loss (CELoss) after the specific classifiers to improve classification accuracy on the source domains; and the mean squared error (MSE) loss after the specific classifiers to reduce the differences between the target-domain labels predicted by the different classifiers.

Results and Discussions Six multi-source domain transfer tasks were built from the datasets of the Integrated High-Speed Train Bearing Experiment Station and the Integrated Power Transmission Fault Diagnosis Experiment Station to demonstrate the effectiveness of the proposed method. Comparing multi-source and single-source transfer showed that multi-source transfer was significantly better. Compared with popular transfer learning methods, namely deep adaptation network (DAN), joint adaptation network (JAN), correlation alignment (CORAL), domain adversarial neural network (DANN), and multiple feature spaces adaptation network (MFSAN), ITR-Net achieved an average transfer accuracy of 96.66% over the six tasks, versus 87.24%, 88.30%, 92.45%, 94.11%, and 93.35%, respectively, for the comparison methods, demonstrating the superiority of the proposed method. t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the clustering of the target-domain features extracted in the six transfer tasks. The target-domain features of the proposed method formed more distinct clusters by bearing fault type, and the unsupervised target-domain features of the same fault type clustered better overall, supporting the method's effectiveness. In the ablation experiments, the average transfer accuracies using MK-MMD, LMMD, and MSE alone were 92.30%, 93.19%, and 93.18%, respectively; using MK-MMD and LMMD together raised the average to 95.26%; and the complete loss function gave the highest average accuracy, 98.63%. The ablation results show that the adaptive transfer strategy built from MK-MMD, LMMD, and MSE further improves domain feature alignment among the multiple source domains and between each source domain and the target domain, yielding the best transfer learning performance.

Conclusions The proposed method fully utilizes the information in multiple source domains, and transfer from multiple source domains effectively improves fault diagnosis performance on the target domain. By applying the MK-MMD, LMMD, CELoss, and MSE losses at different stages of the network to build the transfer strategy, the distributions of the source and target domains can be aligned, and the ablation experiments confirmed the contribution of each loss to the transfer performance of the network. The results provide a new approach for applying transfer learning to axlebox bearing fault diagnosis.



Key words

axlebox bearing / transfer learning / fault diagnosis / domain adaptation / feature learning

Cite this article

DENG Feiyue, DONG Shaofei, GU Xiaohui. Research on fault diagnosis of high-speed train axlebox bearing based on ITR-Net multi-source domain transfer learning[J]. Advanced Engineering Sciences, 2026, 58(1): 324-333. DOI: 10.12454/j.jsuese.202400113

Author information

DENG Feiyue¹, DONG Shaofei¹, GU Xiaohui² (corresponding author, E-mail: guxh@stdu.edu.cn)

¹ School of Mechanical Engineering, Shijiazhuang Tiedao University, Shijiazhuang 050043, China

² State Key Laboratory of Mechanical Behavior in Traffic Engineering Structure and System Safety, Shijiazhuang Tiedao University, Shijiazhuang 050043, China

DENG Feiyue (1985-), male, associate professor, Ph.D. Research interests: equipment fault diagnosis and condition detection. E-mail: dengfy@stdu.edu.cn


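Axlebox bearings are core rotating components of the running gear of high-speed trains, and their health is critical to the safety, stability, and ride comfort of the train. Railway maintenance departments therefore enforce a strict axlebox bearing inspection and maintenance regime, with the result that fault samples from axlebox bearings in actual train operation are extremely scarce^[1‒2]. This scarcity of fault samples and labels limits further improvement in axlebox bearing fault diagnosis, so more effective diagnosis methods are urgently needed.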

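To address the scarcity of fault samples and labels, transfer learning (TL) can learn domain-invariant knowledge from domains rich in samples and labels, improving the generalization ability of the network model and thereby solving fault diagnosis problems in which samples and labels are lacking^[3‒4]. Domain adaptation (DA), which effectively reduces the discrepancy between the source and target domains, is the main route to TL and comprises four categories of methods: instance-based, feature-based, model-based, and relation-based^[5]. Combined with the deep feature extraction capability of deep learning, deep TL can be applied effectively to equipment fault diagnosis when data and sample labels are missing. Lv et al.^[6] proposed a TL method combining a convolutional neural network (CNN) with multi-kernel dynamic distribution adaptation, which adjusts feature weights according to the data distribution of the target domain; Wang et al.^[7] proposed a hierarchical deep domain adaptation method based on the correlation alignment (CORAL) loss, which adapts to the data and features of different domains; Chen Zhuyun et al.^[8] built an enhanced transfer CNN that applies both a classification loss and a classifier discrimination loss to adaptively match target-domain samples with source-domain samples. Wang et al.^[9] proposed a deep TL method integrating expert knowledge with domain adaptation, reducing the inter-domain discrepancy through adversarial training; Xiang et al.^[10] developed a deep adversarial TL method based on the Wasserstein distance, which learns a shared feature representation by minimizing the discrepancy between the source and target domains through adversarial training. DA-based TL research follows two main strategies: one inserts a metric function into the neural network to capture cross-domain differences in feature distribution, such as the KL divergence^[11], maximum mean discrepancy (MMD)^[12], and joint maximum mean discrepancy (JMMD)^[13]; the other uses adversarial training, with a feature generator and a domain discriminator^[14‒15], to transfer domain-invariant knowledge. Although these studies achieved good results, they all assume a single source domain and cannot use data from multiple source domains simultaneously.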

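In real industrial scenarios, equipment fault data often come from multiple source domains with different distributions. Transferring from several source domains at once therefore provides richer diagnostic knowledge. Moreover, multi-source TL can mitigate the negative transfer that single-source TL may cause and generally transfers better. Zhao et al.^[16] proposed a CNN-based multi-source TL framework comprising feature extraction, domain adaptation, and fault classification; Rezaeianjouybari et al.^[17] proposed a multi-source DA framework based on feature-level and task-specific distribution alignment, obtaining good transfer results on several datasets; Lyu Chenghui et al.^[18] proposed a multi-source deep TL method based on a CNN and the multi-kernel maximum mean discrepancy (MK-MMD) distance metric, which learns shared invariant representations of multi-source data more comprehensively. Most of these studies use a conventional CNN as the backbone. Although such a backbone can extract some features of multi-source data, its limited capacity makes it hard to represent the features of all domains accurately; these studies also overlook the classification mismatch between the multiple source domains and the target domain, so recognition accuracy remains low for equipment fault diagnosis under many complex operating conditions.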

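Based on the above analysis, this paper proposes ITR-Net (inception transformer and ResNet), a multi-source domain deep transfer learning method that fuses IFormer (inception transformer) with a residual network (ResNet), for fault diagnosis of high-speed train axlebox bearings, and verifies its effectiveness experimentally. The main innovations of the method are as follows.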

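1) In ITR-Net, the IFormer network, owing to its strong ability to capture long-range dependencies, serves as the common feature extractor, and ResNet, owing to its good local feature extraction ability, serves as the specific feature extractor. Combining the strengths of the two networks improves the accuracy of multi-source feature extraction, reduces the discrepancy between the invariant representations of different domains, and reduces classification errors on the target domain.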

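2) To address the feature distribution discrepancy between the multiple source domains and the target domain, the MK-MMD, local maximum mean discrepancy (LMMD), and mean squared error (MSE) losses are applied at different nodes of the transfer network, forming a new multi-source adaptive transfer strategy that effectively reduces the domain shift between the unlabeled target domain and each labeled source domain.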

1 Domain Adaptation Strategies

1.1 Unsupervised Domain Adaptation

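Unsupervised domain adaptation (UDA), a branch of TL, bridges the distribution gap between domains by learning domain-invariant features and is widely used in fault diagnosis^[19‒20]. Consider a source-domain dataset $D_s=\{(x_i^s,y_i^s)\}_{i=1}^{n_s}$ and a target-domain dataset $D_t=\{x_j^t\}_{j=1}^{n_t}$. $D_s$ contains $n_s$ labeled samples $x_i^s$, where $y_i^s\in\mathbb{R}^C$ is the label of $x_i^s$ and $C$ is the number of classes; $D_t$ contains $n_t$ unlabeled samples $x_j^t$. With no target labels available, UDA uses the source dataset to train a deep neural network that extracts features common to the source and target domains, thereby improving prediction accuracy on the target domain^[21].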

1.2 MK-MMD

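MMD aligns the marginal probability distributions of the source and target domains by measuring the difference between their means in a reproducing kernel Hilbert space (RKHS)^[22]. Given datasets $D_s$ and $D_t$ with distributions $p$ and $q$, a kernel function maps the data nonlinearly into the RKHS (expressed through inner products), the MMD between them is computed there, and minimizing the MMD improves the model's generalization on the target domain. Denoting the MMD by $d_{MMD}^2(p,q)$, its empirical estimate is:

$$d_{MMD}^2(p,q)=\frac{1}{n_s^2}\sum_{i,j=1}^{n_s}k\left(x_i^s,x_j^s\right)+\frac{1}{n_t^2}\sum_{i,j=1}^{n_t}k\left(x_i^t,x_j^t\right)-\frac{2}{n_s n_t}\sum_{i=1}^{n_s}\sum_{j=1}^{n_t}k\left(x_i^s,x_j^t\right)\tag{1}$$

where $k(\cdot,\cdot)$ is the kernel function of the RKHS.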
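Eq. (1) can be evaluated directly from kernel matrices. The NumPy sketch below is an illustration rather than the authors' code; it assumes a single Gaussian kernel with a hypothetical bandwidth `sigma` and uses the biased estimate exactly as written in Eq. (1).

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) for row-sample arrays a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    """Biased empirical estimate of d_MMD^2(p, q), term by term as in Eq. (1)."""
    ns, nt = len(xs), len(xt)
    k_ss = gaussian_kernel(xs, xs, sigma).sum() / ns ** 2
    k_tt = gaussian_kernel(xt, xt, sigma).sum() / nt ** 2
    k_st = 2.0 * gaussian_kernel(xs, xt, sigma).sum() / (ns * nt)
    return k_ss + k_tt - k_st

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))    # samples from p
same = rng.normal(0.0, 1.0, size=(200, 2))      # more samples from p
shifted = rng.normal(2.0, 1.0, size=(200, 2))   # samples from a shifted q
```

Because all three terms use full kernel matrices, this estimate is the squared RKHS distance between the two empirical mean embeddings, so it is non-negative and exactly zero when both sample sets coincide; `mmd2(source, shifted)` is much larger than `mmd2(source, same)`.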

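In MMD the kernel is fixed in advance: a single Gaussian or linear kernel can be used, but there is no way to know whether it is the optimal kernel. MK-MMD extends MMD by combining several kernels into one overall kernel instead of relying on a single kernel. MK-MMD is the RKHS distance between the mean embeddings of distributions $p$ and $q$, denoted $d^2(p,q)$^[23]:

$$d^2(p,q)=\left\|\mathbb{E}_p\left[\phi(x_i^s)\right]-\mathbb{E}_q\left[\phi(x_j^t)\right]\right\|^2\tag{2}$$

where $\mathbb{E}[\cdot]$ is the mean of the embedded samples, $\phi(\cdot)$ is the nonlinear feature map into the RKHS, and $\|\cdot\|$ denotes the 2-norm in the RKHS. For a characteristic kernel such as the Gaussian kernel, $d^2(p,q)=0$ if and only if $p=q$.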

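MK-MMD builds a composite RKHS kernel by taking a weighted sum of $M$ different kernels:

$$k_M=\sum_{m=1}^{M}\beta_m k_m\tag{3}$$

where $k_M$ is the composite kernel, $M$ is the number of kernels, and $\beta_m$ is the weight of the $m$-th kernel. The weights are constrained ($\beta_m\ge 0$, $\sum_{m=1}^{M}\beta_m=1$) so that the resulting multi-kernel is characteristic. The choice of kernels strongly affects how well the mean-embedding discrepancy between $p$ and $q$ is reduced; defining the kernel as a linear combination of several kernels allows an optimal kernel to be sought within that family.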

The empirical estimate of MK-MMD is^[24]:

$$d^2(p,q)=\frac{1}{n_s^2}\sum_{i=1}^{n_s}\sum_{j=1}^{n_s}k_M\left(x_i^s,x_j^s\right)+\frac{1}{n_t^2}\sum_{i=1}^{n_t}\sum_{j=1}^{n_t}k_M\left(x_i^t,x_j^t\right)-\frac{2}{n_s n_t}\sum_{i=1}^{n_s}\sum_{j=1}^{n_t}k_M\left(x_i^s,x_j^t\right)\tag{4}$$
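A minimal sketch of Eqs. (3) and (4), assuming Gaussian base kernels with a hypothetical set of bandwidths `sigmas` and fixed equal weights $\beta_m=1/M$. DAN-style methods may instead optimize the weights; that step is not shown here.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a (n, d) and b (m, d)."""
    d = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)

def multi_kernel(a, b, sigmas, betas):
    """Composite kernel k_M = sum_m beta_m k_m of Eq. (3), with Gaussian k_m."""
    d = sq_dists(a, b)
    return sum(beta * np.exp(-d / (2.0 * s ** 2)) for s, beta in zip(sigmas, betas))

def mk_mmd2(xs, xt, sigmas=(0.5, 1.0, 2.0, 4.0, 8.0), betas=None):
    """Empirical MK-MMD of Eq. (4); betas default to equal weights summing to 1."""
    if betas is None:
        betas = np.full(len(sigmas), 1.0 / len(sigmas))
    betas = np.asarray(betas, dtype=float)
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("beta_m must be non-negative and sum to 1")
    ns, nt = len(xs), len(xt)
    return (multi_kernel(xs, xs, sigmas, betas).sum() / ns ** 2
            + multi_kernel(xt, xt, sigmas, betas).sum() / nt ** 2
            - 2.0 * multi_kernel(xs, xt, sigmas, betas).sum() / (ns * nt))

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(100, 3))   # hypothetical source features
xt = rng.normal(1.0, 1.0, size=(100, 3))   # hypothetical target features
value = mk_mmd2(xs, xt)
```

Because Eq. (4) is linear in the kernel, the equal-weight MK-MMD is simply the average of the single-kernel MMD values, which is a convenient sanity check.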

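A multi-layer MK-MMD is constructed; the MK-MMD loss, denoted $L_{mk\text{-}mmd}$, is:

$$L_{mk\text{-}mmd}=\sum_{I=1}^{Q}d^2\left(D_s^I,D_t^I\right)\tag{5}$$

where $d^2(D_s^I,D_t^I)$ is the MK-MMD loss at the $I$-th fully connected (FC) layer, $I\in\{1,2,\cdots,Q\}$, and $Q$ is the number of FC layers.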
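The multi-layer loss of Eq. (5) sums the MK-MMD over the outputs of the $Q$ FC layers. The sketch below assumes $Q=3$ (matching FC1 to FC3 in Section 2.2), hypothetical layer widths, ReLU activations, random weights, and an equal-weight Gaussian MK-MMD; `fc_stack` and `multi_layer_mk_mmd` are illustrative names.

```python
import numpy as np

def mk_mmd2(xs, xt, sigmas=(1.0, 2.0, 4.0)):
    """Equal-weight multi-Gaussian-kernel MMD estimate (Eq. (4) with beta_m = 1/M)."""
    def k(a, b):
        d = np.maximum(np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T, 0)
        return np.mean([np.exp(-d / (2 * s ** 2)) for s in sigmas], axis=0)
    # .mean() over an (n, m) kernel matrix equals its sum divided by n * m
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

def fc_stack(x, weights):
    """Pass x through FC layers with ReLU and return every layer's output."""
    outputs = []
    for w in weights:
        x = np.maximum(x @ w, 0.0)
        outputs.append(x)
    return outputs

def multi_layer_mk_mmd(xs, xt, weights):
    """L_mk-mmd of Eq. (5): sum of per-layer MK-MMD over the Q FC layers."""
    pairs = zip(fc_stack(xs, weights), fc_stack(xt, weights))
    return sum(mk_mmd2(hs, ht) for hs, ht in pairs)

rng = np.random.default_rng(1)
dims = [64, 32, 16, 8]   # hypothetical widths: input -> FC1 -> FC2 -> FC3 (Q = 3)
weights = [rng.normal(0, 1 / np.sqrt(i), size=(i, o)) for i, o in zip(dims[:-1], dims[1:])]
src = rng.normal(0.0, 1.0, size=(32, 64))   # hypothetical source features
tgt = rng.normal(0.5, 1.0, size=(32, 64))   # hypothetical target features
loss = multi_layer_mk_mmd(src, tgt, weights)
```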

1.3 LMMD

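Although MK-MMD uses a multi-kernel MMD metric, it still cannot fully exploit the source-domain feature information when facing large amounts of multi-source data, and its effectiveness varies across transfer scenarios. Compared with MK-MMD, LMMD measures the difference between the conditional probability distributions of the source and target domains and better captures differences in the local structure of the data. In practice, LMMD usually only needs the neighborhood relations between samples rather than the distances between all samples, which makes it more efficient on large datasets. It is computed by first taking the MMD between same-class data of the source and target domains and then averaging the squared values over the classes. The LMMD loss, denoted $L_{lmmd}$, is:

$$L_{lmmd}=\frac{1}{C}\sum_{c=1}^{C}\left\|\sum_{x_i^s\in D_s}w_i^{sc}\phi\left(x_i^s\right)-\sum_{x_j^t\in D_t}w_j^{tc}\phi\left(x_j^t\right)\right\|^2\tag{6}$$

where $c\in\{1,2,\cdots,C\}$ is the class label, and $w_i^{sc}$ and $w_j^{tc}$ are the weights with which $x_i^s$ and $x_j^t$ belong to class $c$. Both $\sum_{i=1}^{n_s}w_i^{sc}$ and $\sum_{j=1}^{n_t}w_j^{tc}$ equal 1, so $\sum_{x_i^s\in D_s}w_i^{sc}\phi(x_i^s)$ and $\sum_{x_j^t\in D_t}w_j^{tc}\phi(x_j^t)$ are the weighted means of class $c$ in the source and target domains, respectively. The weight $w_i^{sc}$ of sample $x_i^s$ is computed as:

$$w_i^{sc}=\frac{y_{ic}}{\sum_{(x_j,y_j)\in D}y_{jc}}\tag{7}$$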

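where $y_{ic}$ and $y_{jc}$ are the $c$-th elements of the label vectors $y_i$ and $y_j$, and $D$ is the whole sample set of the domain. The weight $w_i^{sc}$ of source sample $x_i^s$ for class $c$ is computed from the true labels; target samples have no labels, so the weight $w_j^{tc}$ of each target sample $x_j^t$ for class $c$ is computed from the classifier's Softmax output.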
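The weights of Eq. (7) are column-normalized class memberships. This sketch assumes that $D$ is the current mini-batch of the corresponding domain, uses one-hot labels for the source and softmax outputs for the target, and, as an assumption not stated in the text, leaves classes that are absent from a batch with zero weight.

```python
import numpy as np

def class_weights(probs):
    """Normalize each class column so that sum_i w_i^c = 1, as in Eq. (7).

    probs: (n, C) one-hot source labels or softmax target predictions.
    Columns that sum to zero (class absent from the batch) keep zero weights.
    """
    probs = np.asarray(probs, dtype=float)
    col_sums = probs.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0   # avoid 0/0 for classes absent from the batch
    return probs / col_sums

def softmax(logits):
    """Row-wise softmax, standing in for the classifier's Softmax output."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

labels = np.array([0, 0, 1, 2, 2, 2])        # hypothetical source labels, C = 3
w_s = class_weights(np.eye(3)[labels])        # w_i^{sc} from true labels
target_logits = np.random.default_rng(2).normal(size=(5, 3))   # hypothetical classifier outputs
w_t = class_weights(softmax(target_logits))   # w_j^{tc} from predicted probabilities
```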

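Let $z_i^s$ and $z_j^t$ denote the feature outputs of each source and target sample produced by the feature extractor. Mapping $z_i^s$ and $z_j^t$ into the RKHS with a kernel function, the LMMD loss $L_{lmmd}$ between the source and target domains becomes:

$$L_{lmmd}=\frac{1}{C}\sum_{c=1}^{C}\left[\sum_{i=1}^{n_s}\sum_{j=1}^{n_s}w_i^{sc}w_j^{sc}k\left(z_i^{sl},z_j^{sl}\right)+\sum_{i=1}^{n_t}\sum_{j=1}^{n_t}w_i^{tc}w_j^{tc}k\left(z_i^{tl},z_j^{tl}\right)-2\sum_{i=1}^{n_s}\sum_{j=1}^{n_t}w_i^{sc}w_j^{tc}k\left(z_i^{sl},z_j^{tl}\right)\right]\tag{8}$$

where the superscript $l$ denotes the output of the $l$-th layer, $l\in\{1,2,\cdots,F\}$, and $F$ is the total number of layers.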
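Eq. (8) can be written per class as $a^\top K_{ss}a+b^\top K_{tt}b-2a^\top K_{st}b$, where $a$ and $b$ are the class-$c$ weight columns of the source and target. The sketch below assumes a Gaussian kernel by default (any kernel function can be passed in) and uses random stand-in features and weights.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and b."""
    d = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d, 0.0) / (2.0 * sigma ** 2))

def lmmd(zs, zt, ws, wt, kernel=gaussian_kernel):
    """L_lmmd of Eq. (8).

    zs: (n_s, d) source features, zt: (n_t, d) target features from layer l.
    ws: (n_s, C) source class weights, wt: (n_t, C) target class weights (Eq. (7)).
    """
    k_ss, k_tt, k_st = kernel(zs, zs), kernel(zt, zt), kernel(zs, zt)
    num_classes = ws.shape[1]
    total = 0.0
    for c in range(num_classes):
        a, b = ws[:, c], wt[:, c]
        # sum_ij a_i a_j k_ss + sum_ij b_i b_j k_tt - 2 sum_ij a_i b_j k_st
        total += a @ k_ss @ a + b @ k_tt @ b - 2.0 * a @ k_st @ b
    return total / num_classes

rng = np.random.default_rng(3)
zs, zt = rng.normal(size=(6, 4)), rng.normal(size=(5, 4))   # hypothetical features
ws = np.eye(3)[[0, 0, 1, 2, 2, 2]]
ws = ws / ws.sum(axis=0)                    # source weights from labels
wt = rng.dirichlet(np.ones(3), size=5)      # stand-in for target softmax outputs
wt = wt / wt.sum(axis=0)                    # target weights
loss = lmmd(zs, zt, ws, wt)
```

With a linear kernel the same function reproduces Eq. (6) with $\phi$ taken as the identity map, which is a quick way to confirm that Eqs. (6) and (8) agree.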

2 Multi-Source Domain Deep Transfer Learning Framework

2.1 Feature Extraction Module

2.1.1 IFormer-Based Common Feature Extractor

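The IFormer block consists of two residual structures in series. The core of the first residual structure is the inception token mixer (ITM), which deeply fuses Transformer and CNN operations^[25]; its structure is shown in Fig. 1. ITM combines the advantages of CNNs and Transformers and captures both low-frequency and high-frequency information. The input features are first split along the channel dimension, and the split features are then fed into a high-frequency mixer and a low-frequency mixer. Let the input feature map be $X\in\mathbb{R}^{s\times s\times u}$, where $s$ is the side length of the feature map and $u$ is the number of channels. $X$ is split along the channel dimension into high-frequency features $X_h\in\mathbb{R}^{s\times s\times u_h}$ and low-frequency features $X_l\in\mathbb{R}^{s\times s\times u_l}$, where $u_h$ and $u_l$ are the numbers of high-frequency and low-frequency channels and $u=u_h+u_l$. $X_h$ and $X_l$ are fed into the high-frequency mixer and the low-frequency mixer, respectively.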

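High-frequency mixer: $X_h$ is divided into $X_{h1}\in\mathbb{R}^{s\times s\times u_{h1}}$ and $X_{h2}\in\mathbb{R}^{s\times s\times u_{h2}}$, with $u_{h1}=u_{h2}=u_h/2$. $X_{h1}$ is sent through a branch of a max-pooling layer (Maxpool) and a linear layer (Linear), and $X_{h2}$ through a branch of a linear layer and a depth-wise separable convolution layer (DwConv). The outputs of the two branches are:

$$Z_{h1}=\mathrm{FC}\left(\mathrm{Maxpool}\left(X_{h1}\right)\right),\qquad Z_{h2}=\mathrm{DwConv}\left(\mathrm{FC}\left(X_{h2}\right)\right)\tag{9}$$

where $Z_{h1}$ and $Z_{h2}$ are the output features of the high-frequency mixer and $\mathrm{FC}(\cdot)$ denotes the output of the linear layer.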
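A NumPy sketch of the high-frequency mixer in Eq. (9) on an $s\times s\times u_h$ feature map. It assumes 3×3 max pooling and a 3×3 depth-wise kernel, both with stride 1 and padding 1 so that the spatial size is preserved, and treats the linear layer as a per-pixel channel projection; the sizes and random weights are hypothetical.

```python
import numpy as np

def max_pool3x3(x):
    """3x3 max pooling with stride 1 and padding 1; x has shape (s, s, c)."""
    s1, s2, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(s1):
        for j in range(s2):
            out[i, j] = padded[i:i + 3, j:j + 3].max(axis=(0, 1))
    return out

def depthwise_conv3x3(x, kernels):
    """Depth-wise 3x3 convolution (one kernel per channel), zero padding 1.

    kernels has shape (3, 3, c); implemented as cross-correlation, as in DL frameworks.
    """
    s1, s2, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + s1, dj:dj + s2] * kernels[di, dj]
    return out

def high_freq_mixer(x_h, w1, w2, dw_kernels):
    """Eq. (9): split X_h into halves X_h1, X_h2 along channels and mix each."""
    half = x_h.shape[-1] // 2
    x_h1, x_h2 = x_h[..., :half], x_h[..., half:]
    z_h1 = max_pool3x3(x_h1) @ w1                     # FC(Maxpool(X_h1))
    z_h2 = depthwise_conv3x3(x_h2 @ w2, dw_kernels)   # DwConv(FC(X_h2))
    return z_h1, z_h2

rng = np.random.default_rng(4)
x_h = rng.normal(size=(8, 8, 16))   # s = 8, u_h = 16 (hypothetical sizes)
w1 = rng.normal(size=(8, 8))        # linear layer of the Maxpool branch
w2 = rng.normal(size=(8, 8))        # linear layer of the DwConv branch
dw = rng.normal(size=(3, 3, 8))     # one 3x3 kernel per channel
z_h1, z_h2 = high_freq_mixer(x_h, w1, w2, dw)
```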

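Low-frequency mixer: average pooling (Avgpool) reduces the size of $X_l$ before the attention mechanism (Attention), and an upsampling (Upsample) operation restores the original size after it. This structure effectively reduces the computational cost and lets the attention mechanism focus on extracting global information. The output of this branch is:

$$Z_l=\mathrm{Upsample}\left(\mathrm{MSA}\left(\mathrm{Avgpool}\left(X_l\right)\right)\right)\tag{10}$$

where $Z_l$ is the output feature of the low-frequency mixer and $\mathrm{MSA}(\cdot)$ is the output of the multi-head self-attention mechanism.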
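The low-frequency mixer of Eq. (10) can be sketched in the same way. The pooling factor (2), the number of heads (2), nearest-neighbor upsampling, and the random projection matrices are assumptions for illustration.

```python
import numpy as np

def avg_pool(x, k):
    """k x k average pooling with stride k; x has shape (s, s, c) with s divisible by k."""
    s1, s2, c = x.shape
    return x.reshape(s1 // k, k, s2 // k, k, c).mean(axis=(1, 3))

def upsample_nearest(x, k):
    """Nearest-neighbor upsampling by an integer factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def msa(tokens, wq, wk, wv, wo, heads):
    """Multi-head self-attention over tokens of shape (n, d)."""
    n, d = tokens.shape
    dh = d // heads
    def split(m):  # (n, d) -> (heads, n, dh)
        return m.reshape(n, heads, dh).transpose(1, 0, 2)
    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, n, n)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ wo

def low_freq_mixer(x_l, weights, pool=2, heads=2):
    """Eq. (10): Z_l = Upsample(MSA(Avgpool(X_l)))."""
    pooled = avg_pool(x_l, pool)
    p1, p2, c = pooled.shape
    attended = msa(pooled.reshape(p1 * p2, c), *weights, heads=heads)
    return upsample_nearest(attended.reshape(p1, p2, c), pool)

rng = np.random.default_rng(7)
x_l = rng.normal(size=(8, 8, 8))   # s = 8, u_l = 8 (hypothetical)
weights = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(4)]   # W_q, W_k, W_v, W_o
z_l = low_freq_mixer(x_l, weights)
```

Attention is computed over $(s/2)^2$ pooled tokens instead of $s^2$ tokens, which is where the cost saving described above comes from.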

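Fusion layer (Fusion): the outputs $Z_{h1}$, $Z_{h2}$, and $Z_l$ are concatenated along the channel dimension and fused to obtain the output $Z_c$:

$$Z_c=\mathrm{Fusion}\left(\mathrm{Concat}\left(Z_{h1},Z_{h2},Z_l\right)\right)\tag{11}$$

where $\mathrm{Concat}(\cdot)$ is the concatenation function.

After the first residual structure of the IFormer block, the output is:

$$Z=X+\mathrm{ITM}(X)\tag{12}$$

where $Z$ is the output feature of the first residual structure of the IFormer block.

The second residual structure of the IFormer block consists of layer normalization (LayerNorm) and a feed-forward network (FFN); the final block output $H$ is:

$$H=Z+\mathrm{FFN}\left(\mathrm{LayerNorm}(Z)\right)\tag{13}$$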
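Eqs. (11) to (13) compose the two residual structures of an IFormer block. In this sketch the layer normalization has no learnable affine parameters, the FFN uses a GELU activation with a 4× hidden width, and `stand_in_itm` is a hypothetical ITM that only splits and fuses channels (the mixers are omitted), so that the residual wiring can be checked in isolation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the channel axis, without learnable scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def ffn(x, w1, w2):
    """Two-layer feed-forward network applied per pixel."""
    return gelu(x @ w1) @ w2

def fusion(z_h1, z_h2, z_l, w_fuse):
    """Eq. (11): concatenate along channels, then fuse with a per-pixel linear map."""
    return np.concatenate([z_h1, z_h2, z_l], axis=-1) @ w_fuse

def iformer_block(x, itm, w1, w2):
    z = x + itm(x)                          # Eq. (12): first residual structure
    return z + ffn(layer_norm(z), w1, w2)   # Eq. (13): second residual structure

rng = np.random.default_rng(8)
u = 16
x = rng.normal(size=(8, 8, u))
w_fuse = rng.normal(size=(u, u)) / np.sqrt(u)

def stand_in_itm(feat):
    """Hypothetical ITM stand-in: split channels 4/4/8 and fuse them, no mixers."""
    return fusion(feat[..., :4], feat[..., 4:8], feat[..., 8:], w_fuse)

w1 = rng.normal(size=(u, 4 * u)) / np.sqrt(u)
w2 = rng.normal(size=(4 * u, u)) / np.sqrt(4 * u)
h = iformer_block(x, stand_in_itm, w1, w2)
```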

2.1.2 ResNet-Based Specific Feature Extractor

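CNNs have strong local feature extraction ability, so a CNN is used as the specific feature extractor. ResNet is a representative CNN-based model^[26]. It uses residual connections that let some layers skip the connection to the next layer; these skip connections weaken the strong coupling between successive layers and effectively alleviate the vanishing and exploding gradient problems of deep CNNs. The ResNet structure and network parameters used in this paper are shown in Fig. 2: a convolutional layer (Conv) with a 7×7 kernel, 64 channels, and a stride of 2, followed by a Maxpool layer and then four residual modules in sequence. To further improve convergence speed and accuracy, batch normalization (BN) and ReLU activation layers are added between the convolutional layers of the residual modules. After the common feature extractor has extracted the features shared by the multiple source domains and the target domain, ResNets serve as the specific feature extractors that learn the specific feature representations of each source domain together with the target domain.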
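The spatial downsampling performed by the ResNet stem can be checked with the standard output-size formula $\lfloor (n+2p-k)/s\rfloor+1$. The padding values and the 3×3, stride-2 max pool are assumptions borrowed from the common ResNet configuration, and the 224-pixel input side is hypothetical; Fig. 2 may use different values.

```python
def conv_out(n, kernel, stride, padding):
    """Output side length of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

side = 224                                                          # hypothetical input side length
after_conv = conv_out(side, kernel=7, stride=2, padding=3)         # 7x7 conv, 64 channels, stride 2
after_pool = conv_out(after_conv, kernel=3, stride=2, padding=1)   # assumed 3x3 max pool, stride 2
print(after_conv, after_pool)                                       # 112 56
```

Under these assumptions the stem alone reduces each side by a factor of 4 before the four residual modules.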

2.2 Multi-Source Domain Transfer Learning Model Framework

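The proposed ITR-Net framework is shown in Fig. 3. Its input comprises two source-domain datasets and one target-domain dataset, and the model consists of three main parts, described below.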

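Part 1 is the common feature extractor built from IFormer blocks. A patch partition layer splits the input two-dimensional image into multiple non-overlapping patches, and a linear embedding layer projects them to an arbitrary dimension. The source-domain and target-domain datasets each pass through the common feature extractor to obtain their shared features, and the MK-MMD loss between each source domain and the target domain is computed. This part uses multi-layer adaptation: three FC layers (FC1, FC2, and FC3) follow the common feature extractor, and the MK-MMD loss between the features of each source-domain dataset and the target-domain dataset is computed after every FC layer, which effectively reduces the feature discrepancy between the source and target domains. The MK-MMD losses $L_{mk\text{-}mmd1}$ and $L_{mk\text{-}mmd2}$ between source domain 1 ($D_{s1}$), source domain 2 ($D_{s2}$), and the target domain are:

$$L_{mk\text{-}mmd1}=\sum_{I=1}^{3}d^2\left(D_{s1}^I,D_t^I\right),\qquad L_{mk\text{-}mmd2}=\sum_{I=1}^{3}d^2\left(D_{s2}^I,D_t^I\right)\tag{14}$$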
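The patch partition and linear embedding step of Part 1 can be sketched as follows. The 64×64×3 image, 4×4 patches, and 32-dimensional embedding are hypothetical choices, and the sketch follows the description in this section rather than any particular IFormer implementation.

```python
import numpy as np

def patch_partition(img, p):
    """Split an (h, w, c) image into non-overlapping p x p patches, one flattened patch per row."""
    h, w, c = img.shape
    if h % p or w % p:
        raise ValueError("image sides must be divisible by the patch size")
    patches = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)

def linear_embedding(patches, w, b):
    """Project every flattened patch to the embedding dimension."""
    return patches @ w + b

rng = np.random.default_rng(5)
img = rng.random((64, 64, 3))        # hypothetical time-frequency image
patches = patch_partition(img, 4)    # (256, 48): 16 x 16 patches of 4 x 4 x 3 values
tokens = linear_embedding(patches, rng.normal(size=(48, 32)), np.zeros(32))   # (256, 32)
```

Patches are ordered row by row, so patch 1 is the block immediately to the right of patch 0.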

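Part 2 is the specific feature extractor built from ResNets. It consists of two ResNets with the same structure but separate parameters. The ResNets further extract the feature information of each source domain and the target domain; each is followed by one FC layer (FC4), after which the LMMD loss between each source domain and the target domain is computed. Capturing the fine-grained feature information of the source- and target-domain data in this way further reduces the discrepancy between the domain feature distributions. The LMMD losses $L_{lmmd1}$ and $L_{lmmd2}$ between source domains 1 and 2 and the target domain are:

$$L_{lmmd1}=d_{\mathrm{LMMD}}^2\left(D_{s1},D_t\right),\qquad L_{lmmd2}=d_{\mathrm{LMMD}}^2\left(D_{s2},D_t\right)\tag{15}$$

where $d_{\mathrm{LMMD}}^2(\cdot,\cdot)$ denotes the LMMD of Eq. (8) evaluated on the FC4 features.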

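Part 3 consists of the domain-specific classifiers C1 and C2. C1 and C2 should output the same class labels for the same target domain, but because each of the two source domains is aligned with the target domain separately, the discrepancy between the two source domains can make the classification results on the same target domain differ. The MSE loss between the target labels predicted by the specific classifiers (denoted $L_{mse}$) is therefore computed to reduce the discrepancy between the classifiers:

$$L_{mse}=\frac{1}{n_t}\sum_{i=1}^{n_t}\left(\hat{y}_i^{t1}-\hat{y}_i^{t2}\right)^2\tag{16}$$

where $\hat{y}_i^{t1}$ and $\hat{y}_i^{t2}$ are the target-domain predicted labels output by C1 and C2, respectively.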
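Eq. (16) averages the squared difference between the two classifiers' target predictions. This sketch assumes the predictions are softmax probability vectors and reads the square as the squared Euclidean norm over the $C$ classes; the logits are random stand-ins and `classifier_discrepancy` is an illustrative name.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classifier_discrepancy(p1, p2):
    """L_mse of Eq. (16), reading (y1 - y2)^2 as the squared Euclidean norm per sample."""
    return np.mean(np.sum((p1 - p2) ** 2, axis=1))

rng = np.random.default_rng(6)
logits_c1 = rng.normal(size=(10, 3))                          # hypothetical C1 outputs, 10 target samples
logits_c2 = logits_c1 + rng.normal(scale=0.5, size=(10, 3))   # hypothetical, slightly different C2 outputs
l_mse = classifier_discrepancy(softmax(logits_c1), softmax(logits_c2))
```

If `torch.nn.MSELoss` with its default `reduction='mean'` is used instead, the result is additionally divided by $C$, since that reduction averages over all $n_t\times C$ elements.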

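In addition, the cross-entropy losses (CELoss) $L_{cls1}$ and $L_{cls2}$ of the classifiers' predictions on the two source domains are computed to improve the source-domain classification accuracy.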

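The overall losses $L_1$ and $L_2$ between source domain 1, source domain 2, and the target domain are, respectively:

$$L_1=L_{cls1}+\lambda\left(L_{mk\text{-}mmd1}+L_{lmmd1}\right)\tag{17}$$

$$L_2=L_{cls2}+\lambda\left(L_{mk\text{-}mmd2}+L_{lmmd2}\right)\tag{18}$$

In Eqs. (17) and (18), $\lambda$ is a trade-off parameter, $\lambda=\dfrac{2}{1+e^{-10f}}-1$, with $f\in[0,1]$.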
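The trade-off schedule rises smoothly from 0 toward 1, so the adaptation terms are phased in gradually. The sketch assumes, as in the DANN schedule, that $f$ is the fraction of training completed; the 100-epoch loop only mirrors the iteration count given in Section 3.1.

```python
import math

def trade_off(f):
    """lambda = 2 / (1 + exp(-10 f)) - 1 for training progress f in [0, 1]."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return 2.0 / (1.0 + math.exp(-10.0 * f)) - 1.0

epochs = 100
# Assumed progress measure: f = epoch / (epochs - 1), so f runs from 0 to 1.
schedule = [trade_off(epoch / (epochs - 1)) for epoch in range(epochs)]
```

Note that $2/(1+e^{-10f})-1=\tanh(5f)$, so $\lambda$ already reaches about 0.987 at $f=0.5$.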

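$L_{cls}$ reduces the difference between the predicted and true labels of the source domains; $L_{mk\text{-}mmd}$ selects the optimal kernel to learn the domain-invariant features shared by the source and target domains; $L_{lmmd}$ reduces the discrepancy between same-class subdomains of different domains and learns the fault features specific to each source domain and the target domain; and $L_{mse}$ reduces the discrepancy between the different classifiers.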

Combining the losses placed at the different nodes of the proposed multi-source TL model, the overall loss $L$ is:

$$L=L_1+L_2+L_{mse}\tag{19}$$
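Eqs. (17) to (19) combine the per-source terms. The sketch below takes the individual loss values as plain numbers (in a PyTorch training loop they would be tensors and the same expression would be back-propagated); `overall_loss` is an illustrative name and the example values are hypothetical.

```python
def overall_loss(cls, mk_mmd, lmmd, l_mse, lam):
    """Eqs. (17)-(19).

    cls, mk_mmd, lmmd: per-source-domain values (L_cls_k, L_mk-mmd_k, L_lmmd_k), k = 1, 2.
    l_mse: classifier discrepancy L_mse; lam: trade-off parameter lambda.
    """
    per_source = [c + lam * (m + l) for c, m, l in zip(cls, mk_mmd, lmmd)]   # L_1, L_2
    return sum(per_source) + l_mse                                           # Eq. (19)

# Hypothetical values: L_1 = 0.4 + 0.5 * 0.15, L_2 = 0.6 + 0.5 * 0.25, L_mse = 0.02
total = overall_loss(cls=(0.4, 0.6), mk_mmd=(0.1, 0.2), lmmd=(0.05, 0.05), l_mse=0.02, lam=0.5)
```

Here $L_1=0.475$, $L_2=0.725$, and the total is $1.22$.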

3 Experimental Analysis

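The continuous wavelet transform (CWT) is used to convert the one-dimensional bearing signals into two-dimensional time-frequency images, which are then fed into the ITR-Net framework. The Morlet wavelet resembles the impulsive waveform produced by bearing faults and offers good time-frequency resolution and transient detection capability, so it is chosen as the basis function of the CWT.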
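A Morlet CWT of one 1 024-point sample can be sketched with plain NumPy convolution. The centre frequency $\omega_0=6$, the ±4-scale truncation of the wavelet, the 1 to 20 kHz analysis grid, and the synthetic 5 kHz tone are all assumptions for illustration; the paper does not state how its scales were chosen or how the images were resized for the network.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet wavelet psi(t); the tiny admissibility correction term is omitted."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_morlet(x, fs, freqs, w0=6.0):
    """|CWT| of a real 1-D signal x at the given analysis frequencies (Hz).

    Returns an array of shape (len(freqs), len(x)), i.e. a time-frequency image.
    """
    dt = 1.0 / fs
    image = np.empty((len(freqs), len(x)))
    for row, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)                # scale whose centre frequency is f
        half = int(np.ceil(4.0 * scale / dt))          # keep +/- 4 scales of wavelet support
        half = min(half, (len(x) - 1) // 2)            # kernel no longer than the signal
        t = np.arange(-half, half + 1) * dt / scale
        psi = morlet(t, w0) / np.sqrt(scale)           # 1/sqrt(a) normalization
        # W(a, b) = sum_n x[n] conj(psi((n - b) dt / a)) dt, written as a convolution
        coef = np.convolve(x, np.conj(psi[::-1]), mode="same") * dt
        image[row] = np.abs(coef)
    return image

fs = 51_200                               # sampling frequency stated in Section 3.2.1
n = 1_024                                 # data points per sample
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 5_000 * t)         # synthetic 5 kHz test tone (hypothetical)
freqs = np.arange(1_000, 20_001, 1_000)   # hypothetical analysis grid, 1 to 20 kHz
image = cwt_morlet(x, fs, freqs)
```

Each row of `image` corresponds to one analysis frequency, and the energy of the test tone concentrates in the 5 kHz row.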

3.1 Experimental Settings

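The model is implemented in Python 3.8 with the PyTorch deep learning framework, using PyCharm as the development environment. Experiments are run on Windows 11 with an Intel® Core™ i5 CPU and a GeForce RTX 3060 (6 GB) GPU. The hyperparameters are as follows: an initial learning rate of 0.001, a dropout rate of 0.5, 100 optimization iterations, and stochastic gradient descent (SGD) as the optimizer. All source-domain data are used for training, and the target-domain data are split into training and test sets at a ratio of 7:3.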
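The settings above can be collected in a configuration object, and the 7:3 split of the target domain can be performed on sample indices. The dictionary keys, the random (unstratified) split, the seed, and the 300-sample target size are assumptions; the text does not say how the split was drawn.

```python
import numpy as np

# Hyperparameters stated in Section 3.1; the dictionary layout itself is illustrative.
config = {"lr": 0.001, "dropout": 0.5, "iterations": 100, "optimizer": "SGD"}

def split_target(n_samples, train_ratio=0.7, seed=0):
    """Randomly split target-domain sample indices into training and test parts."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_target(300)   # hypothetical target-domain size
```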

3.2 Experiment 1: High-Speed Train Axlebox Bearing Experiment

3.2.1 Dataset Description

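The axlebox bearing experiment was carried out on the comprehensive high-speed train bearing test rig of the State Key Laboratory of Mechanical Behavior in Traffic Engineering Structure and System Safety; Fig. 4 shows the rig. One end of the rig carries the test bearing and the other end a support bearing connected to the motor. A hydraulic loading device applies axial and radial loads to the bearing to simulate the real loading state of an axlebox bearing in service. The test bearing is a Schaeffler FAG double-row tapered roller bearing (Germany); an accelerometer is mounted on the end cover of the test bearing, and the sampling frequency is 51.2 kHz. The bearing health states fall into three classes: inner-race fault, outer-race fault, and normal. Fig. 5 shows the faulty bearings: notches 5 mm long, 1 mm wide, and 0.7 mm deep were machined on the inner- and outer-race surfaces by wire cutting.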

3.2.2 Transfer Task Setup

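In the tests, the rotational speed was set to 1 500 r/min, 1 800 r/min, and 2 100 r/min. At these three speeds the hydraulic loading was static load, dynamic load, and no load, respectively, so the three operating conditions differ in both speed and load. Under each condition 300 samples were collected, each containing 1 024 data points, and converted to images by the Morlet wavelet transform. The samples under the three conditions (A, B, and C) form the datasets for the transfer tasks. Three transfer tasks are set: AB→C, AC→B, and BC→A. Taking AB→C as an example, the source domains are the datasets of conditions A and B and the target domain is the dataset of condition C; the other two tasks are defined analogously. Corresponding single-source transfer tasks that use a single source dataset (e.g., A→C and B→C) are also set.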
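The task construction can be expressed programmatically. This sketch enumerates the leave-one-out multi-source tasks and their single-source counterparts, and cuts a long record into 300 non-overlapping 1 024-point samples; the non-overlapping segmentation and the helper names are assumptions, since the text does not describe how samples were extracted.

```python
from itertools import combinations

import numpy as np

def multi_source_tasks(conditions):
    """Tasks that use every condition but one as sources, e.g. 'ABC' -> AB->C, AC->B, BC->A."""
    tasks = []
    for sources in combinations(conditions, len(conditions) - 1):
        target = next(c for c in conditions if c not in sources)
        tasks.append(("".join(sources), target))
    return tasks

def single_source_tasks(task):
    """Single-source counterparts of a multi-source task, e.g. ('AB', 'C') -> A->C, B->C."""
    sources, target = task
    return [(s, target) for s in sources]

def segment(signal, n_samples=300, length=1024):
    """Cut a long vibration record into non-overlapping samples of `length` points."""
    needed = n_samples * length
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} points, got {len(signal)}")
    return np.asarray(signal[:needed]).reshape(n_samples, length)

tasks = multi_source_tasks("ABC")
```

The same helper yields DE→F, DF→E, and EF→D for the gearbox conditions in Section 3.3.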

3.2.3 Analysis of Results

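First, single-source and multi-source transfer are compared within the proposed ITR-Net framework. Fig. 6 shows the multi-source and single-source transfer results of ITR-Net on the axlebox bearing data. As Fig. 6 shows, in all three transfer tasks the multi-source accuracy is clearly higher than the corresponding single-source accuracy. Over the three tasks, the average single-source accuracy is 94.93% and the multi-source average is 98.77%, an improvement of 3.84 percentage points. This indicates that multi-source transfer learns the invariant feature representations shared by the different source domains and the target domain more accurately and comprehensively, yielding more accurate transfer results.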

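Next, to further verify the superiority of the proposed method, it is compared with five existing transfer methods: deep adaptation network (DAN)^[27], joint adaptation network (JAN)^[13], CORAL^[7], domain adversarial neural network (DANN)^[28], and multiple feature spaces adaptation network (MFSAN)^[29]. Table 1 compares the accuracy of the different transfer methods on the axlebox bearing data. As Table 1 shows, multi-source transfer accuracy is generally higher than single-source transfer accuracy, and the proposed method achieves the highest accuracy and the best transfer results in all three tasks. This confirms the superiority of the method and its effectiveness for transfer diagnosis of different fault types of high-speed train axlebox bearings under different speed and load conditions.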

3.3 Experiment 2: Gearbox Bearing Experiment

3.3.1 Dataset Description

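To further verify the proposed method, gearbox bearing fault experiments were conducted on the comprehensive power transmission fault diagnosis (DDS) test rig shown in Fig. 7. The rig mainly consists of a motor, a planetary gearbox, a fixed-axis gearbox, sensors, and a magnetic powder brake; a radial load can be applied from the top of the gearbox to the shaft carrying the test bearing. The test bearing is an SKF 61800 with three health states: normal, inner-race fault, and outer-race fault. The outer-race pitting is about 0.2 mm in diameter and 0.05 mm deep; the inner-race pitting is about 0.1 mm in diameter and 0.05 mm deep. During the experiment, an accelerometer was mounted on the end cover of the test bearing, and the sampling frequency was 51.2 kHz.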

3.3.2 Transfer Task Setup

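In the tests, the motor speed was set to 1 200 r/min, 2 100 r/min, and 2 400 r/min, with loads of 0, 300, and 600 N, respectively, applied from above the gearbox at the three speeds. Under each condition 300 samples were collected, each containing 1 024 data points, and converted to images by the Morlet wavelet transform. Using the samples under the three conditions (D, E, and F), three multi-source transfer tasks are set: DE→F, DF→E, and EF→D. Corresponding single-source transfer tasks that use a single source dataset are also set.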

3.3.3 Analysis of Results

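Single-source and multi-source transfer tasks are compared using the proposed method. Fig. 8 shows the multi-source and single-source transfer results of ITR-Net on the gearbox bearing data. As Fig. 8 shows, in each corresponding task the multi-source accuracy is clearly higher than the single-source accuracy. Over the three tasks, the average multi-source accuracy is 94.55% versus 85.32% for single-source transfer, an improvement of 9.23 percentage points.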

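The proposed method is also compared experimentally with the five existing transfer learning methods. Table 2 compares the accuracy of the different transfer methods on the gearbox bearing data. The proposed method achieves the highest transfer accuracy in all three tasks, which further confirms its effectiveness on gearbox bearing data under different operating conditions.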

3.4 Ablation Study

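To further analyze how the three loss functions of the transfer strategy, MK-MMD, LMMD, and MSE, affect the transfer results, ablation experiments were conducted on the three transfer tasks of the high-speed train axlebox bearing data, applying the transfer losses selectively. Note that CELoss, which reduces the discrepancy between the predicted and true labels of the source domains, is included in every ablation experiment. Table 3 lists the ablation results. As Table 3 shows: in most tasks, LMMD alone has a greater effect on the transfer results of the model than MK-MMD alone or MSE alone; combining MK-MMD and LMMD gives better transfer performance than any single loss; and using MK-MMD, LMMD, and MSE together to build the adaptive transfer strategy further improves domain feature alignment among the multiple source domains and between each source domain and the target domain, giving the best transfer learning results.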

3.5 Visualization of Transfer Results

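To further analyze how the unlabeled data features cluster after transfer learning with the proposed method, the t-distributed stochastic neighbor embedding (t-SNE) algorithm^[30] is used to visualize the target-domain features extracted in the six transfer tasks of the two experiments. Figs. 9 and 10 show the visualization results for the axlebox bearing data and the gearbox bearing data, respectively. As Figs. 9 and 10 show, for each transfer task the target-domain features of the proposed method form fairly distinct clusters according to bearing fault type; the unsupervised target-domain features of the same fault type cluster well overall, with only a few points confused, which further confirms the effectiveness of the proposed method.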

4 Conclusions

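1) This paper proposes ITR-Net, a multi-source domain deep transfer learning method whose common and specific feature extractors learn the domain-invariant feature representations of the multiple source domains and the target domain more fully. A new multi-source adaptive transfer strategy built on the MK-MMD, LMMD, and MSE losses further improves the domain feature alignment between the unsupervised target domain and the supervised source domains.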

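2) Transfer experiments on axlebox bearings and gearbox bearings under three different speed and load conditions show that the multi-source transfer accuracy of the proposed method for bearing fault recognition is significantly higher than that of single-source transfer. The proposed method also outperforms five existing transfer methods, DAN, DANN, CORAL, MFSAN, and JAN, which provides a useful reference for engineering applications of unsupervised fault diagnosis of high-speed train axlebox bearings.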

References

[1] Liu Zechao, Yang Shaopu, Liu Yongqiang, et al. Adaptive correlated Kurtogram and its applications in wheelset-bearing system fault diagnosis[J]. Mechanical Systems and Signal Processing, 2021, 154: 107511. doi:10.1016/j.ymssp.2020.107511

[2] Deng Feiyue, Wang Hongli, Gao Ruiyang, et al. Vibration characteristics analysis of the inner race fault of axlebox bearing under wheel-rail excitation[J]. Journal of Hebei University (Natural Science Edition), 2023, 43(6): 561‒570. (in Chinese)

[3] Qian Quan, Qin Yi, Wang Yi, et al. A new deep transfer learning network based on convolutional auto-encoder for mechanical fault diagnosis[J]. Measurement, 2021, 178: 109352. doi:10.1016/j.measurement.2021.109352

[4] Li Weihua, Huang Ruyi, Li Jipu, et al. A perspective survey on deep transfer learning for fault diagnosis in industrial scenarios: Theories, applications and challenges[J]. Mechanical Systems and Signal Processing, 2022, 167: 108487. doi:10.1016/j.ymssp.2021.108487

[5] Hakim M, Omran A A B, Ahmed A N, et al. A systematic review of rolling bearing fault diagnoses based on deep learning and transfer learning: Taxonomy, overview, application, open challenges, weaknesses and recommendations[J]. Ain Shams Engineering Journal, 2023, 14(4): 101945. doi:10.1016/j.asej.2022.101945

[6] Lv Mingzhu, Liu Shixun, Su Xiaoming, et al. Deep transfer network with multi-kernel dynamic distribution adaptation for cross-machine fault diagnosis[J]. IEEE Access, 2021, 9: 16392‒16409. doi:10.1109/access.2021.3053075

[7] Wang Xiaoxia, He Haibo, Li Lusi. A hierarchical deep domain adaptation approach for fault diagnosis of power plant thermal system[J]. IEEE Transactions on Industrial Informatics, 2019, 15(9): 5139‒5148. doi:10.1109/tii.2019.2899118

[8] Chen Zhuyun, Zhong Qi, Huang Ruyi, et al. Intelligent fault diagnosis for machinery based on enhanced transfer convolutional neural network[J]. Journal of Mechanical Engineering, 2021, 57(21): 96‒105. (in Chinese)

[9] Wang Qin, Taal C, Fink O. Integrating expert knowledge with domain adaptation for unsupervised fault diagnosis[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 71: 3500312. doi:10.1109/tim.2021.3127654

[10] Xiang Gang, Tian Kun. Spacecraft intelligent fault diagnosis under variable working conditions via Wasserstein distance-based deep adversarial transfer learning[J]. International Journal of Aerospace Engineering, 2021, 2021: 6099818. doi:10.1155/2021/6099818

[11] Tzeng E, Hoffman J, Zhang Ning, et al. Deep domain confusion: Maximizing for domain invariance[EB/OL]. (2014‒12‒10)[2024‒02‒10].

[12] Borgwardt K M, Gretton A, Rasch M J, et al. Integrating structured biological data by Kernel Maximum Mean Discrepancy[J]. Bioinformatics, 2006, 22(14): e49‒e57. doi:10.1093/bioinformatics/btl242

[13] Long Mingsheng, Zhu Han, Wang Jianmin, et al. Deep transfer learning with joint adaptation networks[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: Journal of Machine Learning Research, 2017: 2208‒2217.

[14] Sicilia A, Zhao Xingchen, Hwang S J. Domain adversarial neural networks for domain generalization: When it works and how to improve[J]. Machine Learning, 2023, 112(7): 2685‒2721. doi:10.1007/s10994-023-06324-x

[15] Mao Wentao, Liu Yamin, Ding Ling, et al. A new structured domain adversarial neural network for transfer fault diagnosis of rolling bearings under different working conditions[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 70: 3509013. doi:10.1109/tim.2020.3038596

[16] Zhao Jing, Yang Shaopu, Li Qiang, et al. Reply to Comment on 'A novel transfer learning bearing fault diagnosis method based on multiple-source domain adaptation'[J]. Measurement Science and Technology, 2022, 33(9): 098001. doi:10.1088/1361-6501/ac6d48

[17] Rezaeianjouybari B, Shang Yi. A novel deep multi-source domain adaptation framework for bearing fault diagnosis based on feature-level and task-specific distribution alignment[J]. Measurement, 2021, 178: 109359. doi:10.1016/j.measurement.2021.109359

[18] Lyu Chenghui, Cheng Jinjun, Hu Yangguang, et al. Online fault diagnosing of Rudders based on multi-source domain deep transfer learning[J]. Journal of Ordnance Equipment Engineering, 2022, 43(9): 60‒67. (in Chinese)

[19] Liu Xiaofeng, Yoo C, Xing Fangxu, et al. Deep unsupervised domain adaptation: A review of recent advances and perspectives[J]. APSIPA Transactions on Signal and Information Processing, 2022, 11(1): e25. doi:10.1561/116.00000192

[20] Xu Youzhong, Han Tianyu, Shi Xi, et al. Unsupervised domain adaptation fault diagnosis method using weight-based mask network[C]//Proceedings of the 2023 Global Reliability and Prognostics and Health Management Conference (PHM‒Hangzhou). Hangzhou: IEEE, 2023: 1‒7. doi:10.1109/phm-hangzhou58797.2023.10482568

[21] Zhu Yongchun, Zhuang Fuzhen, Wang Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(4): 1713‒1722. doi:10.1109/tnnls.2020.2988928

[22] Li Yibin, Song Yan, Jia Lei, et al. Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning[J]. IEEE Transactions on Industrial Informatics, 2020, 17(4): 2833‒2841. doi:10.1109/tii.2020.3008010

[23] Gretton A, Sejdinovic D, Strathmann H, et al. Optimal kernel choice for large-scale two-sample tests[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2012: 1205‒1213.

[24] Che Changchang, Wang Huawei, Ni Xiaomei, et al. Domain adaptive deep belief network for rolling bearing fault diagnosis[J]. Computers & Industrial Engineering, 2020, 143: 106427. doi:10.1016/j.cie.2020.106427

[25] Si Chenyang, Yu Weihao, Zhou Pan, et al. Inception transformer[J]. Advances in Neural Information Processing Systems, 2022, 35: 23495‒23509.

[26] Liang Pengfei, Wang Wenhui, Yuan Xiaoming, et al. Intelligent fault diagnosis of rolling bearing based on wavelet transform and improved ResNet under noisy labels and environment[J]. Engineering Applications of Artificial Intelligence, 2022, 115: 105269. doi:10.1016/j.engappai.2022.105269

[27] Long Mingsheng, Cao Yue, Wang Jianmin, et al. Learning transferable features with deep adaptation networks[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille: Journal of Machine Learning Research, 2015: 97‒105.

[28] Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks[M]//Domain Adaptation in Computer Vision Applications. Cham: Springer International Publishing, 2017: 189‒209. doi:10.1007/978-3-319-58347-1_10

[29] Zhu Yongchun, Zhuang Fuzhen, Wang Deqing. Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 5989‒5996. doi:10.1609/aaai.v33i01.33015989