Deep Transfer Network with Multi-Space Dynamic Distribution Adaptation for Bearing Fault Diagnosis
Abstract
1. Introduction
- Most feature representations in joint distribution adaptation (JDA) methods fail to capture enough information about the fault types. They often ignore fine-grained information such as the size and location of the fault. Although some methods do take fine-grained information into account, most of them converge slowly and with difficulty [14,15], because their two-stage adversarial training process is complicated;
- Since the joint distribution cannot be measured directly, most JDA methods approximate the joint distribution distance by the sum of the marginal distribution distance and the conditional distribution distance. Because of this approximation, accurate feature representation is critical in cross-domain scenarios. On the one hand, when the bearing signals are collected under widely varying working conditions, the overall feature representations that describe the marginal distribution are more important. On the other hand, when the signals are collected under similar conditions, the feature representations of the conditional distribution specific to each fault class are more critical. However, existing JDA methods have rarely investigated feature representation under various working conditions.
- A marginal feature extraction module and a multi-space conditional feature extraction module are proposed to obtain a strong feature representation of fault information. Together, these modules map features at multiple scales;
- A multi-kernel maximum mean discrepancy (MK-MMD) and a local maximum mean discrepancy (LMMD) are introduced as metrics to align the marginal distribution and the conditional distribution, respectively. By optimizing both objectives jointly, the distribution discrepancies within the extracted features can be reduced;
- An adaptive coefficient is designed to dynamically measure the relative proportion in which the marginal and conditional feature representations are aligned. It reweights the fault feature representations using two domain discriminators, improving the generalization performance in complex cross-domain scenarios.
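As a rough illustration of the second and third contributions, the sketch below computes a biased multi-kernel MMD estimate between two small feature sets and then combines a marginal and a conditional discrepancy with an adaptive factor. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the Gaussian bandwidths, all function names, and the factor formula (borrowed from the dynamic distribution adaptation literature listed in the references rather than taken from this paper) are illustrative.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths):
    """Equal-weight average of Gaussian kernels (assumed kernel family)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)

def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared MK-MMD between two sample sets."""
    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths) for x in a for y in b)
        return total / (len(a) * len(b))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def adaptive_factor(marginal_dist, conditional_dists):
    """Hypothetical adaptive coefficient: the conditional share of the
    total discrepancy (assumed formula, not confirmed by the paper)."""
    total = marginal_dist + sum(conditional_dists)
    if total <= 0.0:
        return 0.5  # no measurable discrepancy: weight both terms equally
    return 1.0 - marginal_dist / total

def dynamic_loss(marginal_loss, conditional_loss, factor):
    """Weighted sum of the marginal and conditional alignment terms."""
    return (1.0 - factor) * marginal_loss + factor * conditional_loss

# Toy 2-D features: one target set close to the source, one far away.
source = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
near = [[0.1, 0.1], [0.2, 0.1], [0.0, 0.2]]
far = [[3.0, 3.1], [3.2, 3.0], [3.1, 3.2]]
print(mk_mmd2(source, near), mk_mmd2(source, far))
```

With this weighting, a large marginal discrepancy pushes the factor toward 0 and so emphasizes marginal alignment, which matches the intuition that widely varying working conditions call for aligning the overall distribution first.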
2. Research Methods
2.1. Proposed Framework
2.1.1. Feature Extraction Module
2.1.2. Dynamic Distribution Adaptation Module
2.2. Training Process
3. Experimental Verification
3.1. Experimental Dataset Description
3.2. Training Details
3.3. Compared Methods Description
3.3.1. Domain Adaptation Based on Statistical Distance Metrics
3.3.2. Domain Adaptation Based on Adversarial Learning
3.4. Results and Analysis of Comparative Experiments
3.4.1. Classification Accuracy
- Comparison with Resnet:
- Comparison with MRAN:
- Comparison with DDAN:
- Comparison with JAN:
- Comparison with CDAN:
3.4.2. Accuracy Curves
3.4.3. Feature Visualization
- The proposed DCMSDA model obtained small intra-class distances and large inter-class distances, which suggests that our method has a strong fault classification capability. Specifically, as can be seen in Figure 7a, the features learned by the DDAN method were somewhat jumbled and poorly clustered. Moreover, the category boundaries were not distinct, which makes it more difficult for the fault classifier to separate these features [31]. In Figure 7b,c, the JAN and CDAN methods incorrectly clustered the Source_IR12 fault and the Target_OR0 fault together, whereas the proposed method separated them successfully. In Figure 7d, the proposed method clustered faults of the same category more tightly and produced clearer category boundaries;
- The proposed DCMSDA model extracted representative domain-invariant features and exhibited excellent generalization performance, because the source-domain and target-domain features of the same fault category lay closest to each other. Specifically, as can be seen in Figure 7a–c, none of the three compared methods brought the source-domain and target-domain features of the OR0, OR2 and IR12 faults together, whereas the proposed method did, as shown in Figure 7d.
3.4.4. Confusion Matrix
3.4.5. Receiver Operating Characteristic (ROC) Curves and Area under Curve (AUC) Values
3.5. Results and Analysis of Ablation Experiments
- Comparing cases 1 and 2 with the proposed model shows that both feature extraction modules contribute informative features of the vibration signals. The marginal feature extraction module extracts marginal features and captures the general fault information. The multi-space conditional feature extraction module includes convolution kernels of different depths and sizes, which extract richer conditional features and capture information on the fault categories, thereby leading to a more accurate result;
- Comparing cases 3 and 4 with the proposed model shows the benefit of measuring the distribution discrepancies with two different metrics, each used where it is strongest, which guides the feature extraction modules to extract more diagnostic knowledge. MK-MMD focuses on the global distribution and is suitable for aligning marginal features. LMMD is concerned with the relationship between the two sub-domains within the same category and is suitable for aligning conditional features;
- Comparing case 5 with the proposed model shows that the adaptive coefficient dynamically measures the relative importance of the marginal and conditional distribution alignments, thereby helping the model adapt to complex cross-domain scenarios;
- Comparing case 6 with the proposed model shows that domain adaptation aligns the distributions of the two domains, which significantly improves robustness in cross-condition diagnosis tasks;
- Among the cases that retain domain adaptation, cases 3 and 5 showed the largest reductions in accuracy compared with the proposed model (39.26% and 50.46% versus 57.98%). The results indicate that the strategy of adopting two metrics and the adaptive coefficient contributed the most to improving diagnostic accuracy.
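For reference, the two metrics contrasted in the ablation cases can be written in their commonly used forms. These are the standard definitions from the MK-MMD and LMMD literature listed in the references, restated here as an assumption about notation rather than as the exact equations of this paper: $\phi$ is the kernel feature map into the reproducing kernel Hilbert space $\mathcal{H}$, $C$ is the number of fault classes, and $w_i^{sc}$ and $w_j^{tc}$ are class-membership weights of source and target samples (target weights are typically derived from predicted class probabilities, since target labels are unavailable).

```latex
% Global (marginal) discrepancy measured by MMD with kernel feature map \phi
d_{\mathcal{H}}^{2}(P_s, P_t) =
  \left\| \mathbb{E}_{x^s \sim P_s}\!\left[\phi(x^s)\right]
        - \mathbb{E}_{x^t \sim P_t}\!\left[\phi(x^t)\right] \right\|_{\mathcal{H}}^{2}

% Local (conditional) discrepancy: class-weighted MMD averaged over C classes
\hat{d}_{\mathcal{H}}(P_s, P_t) =
  \frac{1}{C} \sum_{c=1}^{C}
  \left\| \sum_{i=1}^{n_s} w_i^{sc}\, \phi(x_i^s)
        - \sum_{j=1}^{n_t} w_j^{tc}\, \phi(x_j^t) \right\|_{\mathcal{H}}^{2},
  \qquad \sum_{i=1}^{n_s} w_i^{sc} = \sum_{j=1}^{n_t} w_j^{tc} = 1
```

The first expression compares the two domains as a whole, while the second compares them class by class, which is why the former suits marginal features and the latter suits conditional features.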
4. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lu, W.; Liang, B.; Cheng, Y.; Meng, D.; Yang, J.; Zhang, T. Deep Model Based Domain Adaptation for Fault Diagnosis. IEEE Trans. Ind. Electron. 2016, 64, 2296–2305.
- Luo, Z.; Wang, J.; Tang, R.; Wang, D. Research on vibration performance of the nonlinear combined support-flexible rotor system. Nonlinear Dyn. 2019, 98, 113–128.
- Jia, S.; Wang, J.; Zhang, X.; Han, B. A Weighted Subdomain Adaptation Network for Partial Transfer Fault Diagnosis of Rotating Machinery. Entropy 2021, 23, 424.
- Dong, Z.; Ji, X.; Zhou, G.; Gao, M.; Qi, D. Multimodal Neuromorphic Sensory-Processing System with Memristor Circuits for Smart Home Applications. IEEE Trans. Ind. Appl. 2022.
- Zhao, Z.; Zhang, Q.; Yu, X.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Applications of Unsupervised Deep Transfer Learning to Intelligent Fault Diagnosis: A Survey and Comparative Study. IEEE Trans. Instrum. Meas. 2021, 70, 3525828.
- Zhiyi, H.; Haidong, S.; Xiang, Z.; Yu, Y.; He, Z. An intelligent fault diagnosis method for rotor-bearing system using small labeled infrared thermal images and enhanced CNN transferred from CAE. Adv. Eng. Inform. 2020, 46, 101150.
- Wang, Y.; Han, M.; Liu, W. Rolling bearing fault diagnosis method based on stacked denoising autoencoder and convolutional neural network. In Proceedings of the 2019 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering (QR2MSE), Zhangjiajie, China, 6–9 August 2019; pp. 833–838.
- Shao, H.; Jiang, H.; Zhang, H.; Liang, T. Electric Locomotive Bearing Fault Diagnosis Using a Novel Convolutional Deep Belief Network. IEEE Trans. Ind. Electron. 2017, 65, 2727–2736.
- Yan, R.; Shen, F.; Sun, C.; Chen, X. Knowledge Transfer for Rotary Machine Fault Diagnosis. IEEE Sens. J. 2019, 20, 8374–8393.
- Othman, E.; Bazi, Y.; Melgani, F.; Alhichri, H.; Alajlan, N.; Zuair, M. Domain Adaptation Network for Cross-Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4441–4456.
- An, J.; Ai, P. Deep Domain Adaptation Model for Bearing Fault Diagnosis with Riemann Metric Correlation Alignment. Math. Probl. Eng. 2020, 2020, 4302184.
- Zhu, J.; Chen, N.; Shen, C. A New Deep Transfer Learning Method for Bearing Fault Diagnosis Under Different Working Conditions. IEEE Sens. J. 2019, 20, 8394–8402.
- Wan, L.; Li, Y.; Chen, K.; Gong, K.; Li, C. A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis. Measurement 2022, 191, 110752.
- Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep subdomain adaptation network for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1713–1722.
- Hong, Y.; Hwang, U.; Yoo, J.; Yoon, S. How generative adversarial networks and their variants work: An overview. ACM Comput. Surv. (CSUR) 2019, 52, 1–43.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328.
- Ji, X.; Dong, Z.; Lai, C.S.; Qi, D. A Brain-Inspired In-Memory Computing System for Neuronal Communication via Memristive Circuits. IEEE Commun. Mag. 2022, 60, 100–106.
- Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; Smola, A. A kernel method for the two-sample-problem. Adv. Neural Inf. Process. Syst. 2006, 19, 513–520.
- Long, M.; Wang, J.; Cao, Y.; Sun, J.; Yu, P.S. Deep Learning of Transferable Representation for Scalable Domain Adaptation. IEEE Trans. Knowl. Data Eng. 2016, 28, 2027–2040.
- Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the PHM Society European Conference, Bilbao, Spain, 5–8 July 2016; Volume 3.
- Karpat, F.; Kalay, O.C.; Dirik, A.E.; Doğan, O.; Korcuklu, B.; Yüce, C. Convolutional neural networks based rolling bearing fault classification under variable operating conditions. In Proceedings of the 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Kocaeli, Turkey, 25–27 August 2021; pp. 1–6.
- Dong, Z.; Qi, D.; He, Y.; Xu, Z.; Hu, X.; Duan, S. Easily Cascaded Memristor-CMOS Hybrid Circuit for High-Efficiency Boolean Logic Implementation. Int. J. Bifurc. Chaos 2018, 28, 1850149.
- Sun, B.; Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 443–450.
- Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.-P.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006, 22, e49–e57.
- Sejdinovic, D.; Sriperumbudur, B.; Gretton, A.; Fukumizu, K. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 2013, 41, 2263–2291.
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning, PMLR 2017, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. J. Mach. Learn. Res. 2016, 17, 2030–2096.
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. Adv. Neural Inf. Process. Syst. 2018, 31, 1640–1650.
- Zhu, Y.; Zhuang, F.; Wang, J.; Chen, J.; Shi, Z.; Wu, W.; He, Q. Multi-representation adaptation network for cross-domain image classification. Neural Netw. 2019, 119, 214–221.
- Wang, J.; Chen, Y.; Feng, W.; Yu, H.; Huang, M.; Yang, Q. Transfer learning with dynamic distribution adaptation. ACM Trans. Intell. Syst. Technol. (TIST) 2020, 11, 1–25.
- Naseer, S.; Hussain, W.; Khan, Y.D.; Rasool, N. Optimization of serine phosphorylation prediction in proteins by comparing human engineered features and deep representations. Anal. Biochem. 2020, 615, 114069.
Layer | Kernel Size | Kernel Number | Strides | Output Shape |
---|---|---|---|---|
CNN Block1 | (1, 1) | 64 | 1 | (32, 64) |
Average pooling | (7, 1) | 64 | 1 | (26, 64) |
CNN Block2 | (1, 1) | 48 | 1 | (32, 48) |
CNN Block3 | (5, 1) | 64 | 1 | (32, 64) |
Average pooling | (7, 1) | 64 | 1 | (26, 64) |
CNN Block4 | (1, 1) | 64 | 1 | (32, 64) |
CNN Block5 | (3, 1) | 96 | 1 | (32, 96) |
CNN Block6 | (3, 1) | 96 | 1 | (32, 96) |
Average pooling | (7, 1) | 96 | 1 | (26, 96) |
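The output lengths in the table follow from the standard length formula for one-dimensional convolution and pooling layers. The snippet below is a minimal sketch, assuming that each branch receives an input of length 32 and that the convolution blocks use "same" padding; the padding is inferred from the unchanged output length and is not stated in the table, and the function name is illustrative.

```python
def output_length(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling layer."""
    return (length + 2 * padding - kernel) // stride + 1

# Average pooling with a (7, 1) window, stride 1, no padding: 32 -> 26.
pooled = output_length(32, kernel=7)

# Convolutions keep length 32 when padding = (kernel - 1) // 2 (assumed).
conv_lengths = {
    k: output_length(32, kernel=k, padding=(k - 1) // 2) for k in (1, 3, 5)
}

print(pooled, conv_lengths)
```

Under these assumptions the pooling rows reproduce the listed length of 26, and every convolution block keeps the length at 32 regardless of kernel size.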
Layer | Output Shape |
---|---|
F1: Fully connected marginal domain discriminator layer | (Batch_size, 1024) |
F2: Fully connected marginal domain discriminator layer | (Batch_size, 1024) |
F3: Fully connected marginal domain discriminator layer with one sigmoid | (Batch_size, 1) |
F4: Fully connected conditional domain discriminator layer | (Batch_size, 1024) |
F5: Fully connected conditional domain discriminator layer | (Batch_size, 1024) |
F6: Fully connected conditional domain discriminator layer with one sigmoid | (Batch_size, 1) |
Task Code | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Load torque (Nm) | 0.7 | 0.7 | 0.1 | 0.7 |
Radial force (N) | 1000 | 1000 | 1000 | 400 |
Speed (rpm) | 1500 | 900 | 1500 | 1500 |
Parameter | Value | Parameter | Value |
---|---|---|---|
Epochs | 300 | Sample length | 1024 |
Batch size | 64 | Marginal feature dimension | 512 |
Weight decay | 0.00001 | Fused conditional feature dimension | 256 |
Learning rate | 0.001 | - | - |
Method | Transfer Module | Adaptive Coefficient |
---|---|---|
Resnet | No transfer | No |
DAN | MK-MMD | No |
JAN | JMMD | No |
DANN | Adversarial | No |
CDAN | Condition-adversarial | No |
MRAN | Multi-space | No |
DDAN | MK-MMD and LMMD | Yes |
Proposed | MK-MMD, LMMD and Multi-space | Yes |
Transfer Task | Resnet | DAN | JAN | DANN | CDAN | MRAN | DDAN | Proposed |
---|---|---|---|---|---|---|---|---|
0→1 | 24.27 | 53.13 | 62.92 | 68.93 | 68.18 | 60.25 | 58.16 | 71.35 |
0→2 | 92.83 | 94.37 | 94.03 | 93.23 | 94.92 | 93.54 | 93.56 | 95.51 |
0→3 | 52.12 | 78.14 | 83.31 | 80.12 | 85.02 | 81.87 | 82.48 | 85.08 |
1→0 | 41.13 | 57.65 | 56.50 | 60.12 | 59.83 | 56.56 | 63.32 | 73.63 |
1→2 | 45.28 | 65.57 | 69.89 | 67.34 | 66.75 | 68.89 | 67.18 | 69.33 |
1→3 | 22.74 | 37.10 | 38.24 | 43.40 | 45.22 | 37.25 | 39.61 | 46.49 |
2→0 | 90.96 | 92.03 | 94.47 | 92.60 | 95.05 | 90.33 | 90.63 | 96.33 |
2→1 | 30.35 | 57.79 | 65.34 | 68.08 | 68.31 | 64.56 | 57.91 | 68.66 |
2→3 | 59.01 | 83.99 | 88.39 | 89.82 | 88.93 | 87.23 | 86.84 | 90.21 |
3→0 | 52.07 | 81.18 | 83.68 | 81.81 | 83.01 | 81.37 | 79.66 | 83.06 |
3→1 | 34.52 | 47.00 | 44.66 | 50.09 | 43.05 | 47.87 | 51.29 | 56.76 |
3→2 | 58.42 | 85.33 | 87.10 | 86.54 | 87.19 | 85.33 | 85.43 | 88.77 |
Average | 50.31 | 69.44 | 72.38 | 73.51 | 73.79 | 71.25 | 71.34 | 77.10 |
Case | Transfer Module | Adaptive Coefficient | Test Accuracy (%) |
---|---|---|---|
Case 1 | MK-MMD, LMMD and marginal feature extraction module | Yes | 55.21 |
Case 2 | MK-MMD, LMMD and conditional feature extraction module | Yes | 52.76 |
Case 3 | MK-MMD and two feature extraction modules | Yes | 39.26 |
Case 4 | LMMD and two feature extraction modules | Yes | 56.44 |
Case 5 | MK-MMD, LMMD and two feature extraction modules | No | 50.46 |
Case 6 | Two feature extraction modules | No | 30.54 |
Proposed | MK-MMD, LMMD and two feature extraction modules | Yes | 57.98 |
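The accuracy drop of each ablation case relative to the full model can be read off the table directly. The short sketch below ranks the cases by that drop; the variable names are illustrative, and the numbers are copied from the table above.

```python
# Test accuracy (%) of the full model and of each ablation case.
proposed = 57.98
ablation = {
    "Case 1": 55.21,  # marginal feature extraction module only
    "Case 2": 52.76,  # conditional feature extraction module only
    "Case 3": 39.26,  # MK-MMD only
    "Case 4": 56.44,  # LMMD only
    "Case 5": 50.46,  # no adaptive coefficient
    "Case 6": 30.54,  # no domain adaptation
}

# Drop in percentage points relative to the proposed model.
drops = {case: round(proposed - acc, 2) for case, acc in ablation.items()}

# Largest drop first.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for case, drop in ranked:
    print(f"{case}: -{drop:.2f} points")
```

Case 6, which removes domain adaptation entirely, loses the most; among the cases that keep domain adaptation, cases 3 and 5 lose the most, in line with the analysis of the ablation results.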
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zheng, X.; Gu, Z.; Liu, C.; Jiang, J.; He, Z.; Gao, M. Deep Transfer Network with Multi-Space Dynamic Distribution Adaptation for Bearing Fault Diagnosis. Entropy 2022, 24, 1122. https://doi.org/10.3390/e24081122