Semi-Supervised Bearing Fault Diagnosis and Classification Using Variational Autoencoder-Based Deep Generative Models
IEEE SENSORS JOURNAL, VOL. 21, NO. 5, MARCH 1, 2021
Authorized licensed use limited to: Bibliothèque ÉTS. Downloaded on August 23,2023 at 22:46:08 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: SEMI-SUPERVISED BEARING FAULT DIAGNOSIS AND CLASSIFICATION 6477
expensive [13]–[18], and it often requires human knowledge/expertise on the system states [12]. Therefore, the bearing dataset, especially the faulty data, is usually not labeled in real industrial applications [14], [19]. Even if attempts are made to label these unlabeled samples, the accuracy of these labels cannot be guaranteed, since they are also subject to the confirmation biases of the engineers interpreting the data [17]. Therefore, both label scarcity and label accuracy issues pose challenges to the mainstream supervised learning approaches for bearing fault diagnosis.

A promising approach to overcome these challenges is to apply semi-supervised learning algorithms that can leverage the limited labeled data and the massive unlabeled data simultaneously [12]–[19]. Specifically, semi-supervised learning considers the classification problem when only a small part of the data has labels, and so far only a few semi-supervised learning paradigms have been applied to bearing fault diagnosis. For instance, the support vector data description method in [19] uses cyclic spectral coherent domain indicators to construct a feature space and fit a hypersphere, which then calculates the Euclidean distance in order to distinguish the faulty data from the healthy ones. In addition, both [15] and [16] use graph-based methods to construct graphs connecting similar samples in the dataset, so class labels can be propagated from labeled nodes to unlabeled nodes through the graph. However, these methods are very sensitive to their graph structure and need to analyze the graph's Laplacian matrix, which limits their scope. Instead of a graph-based method, [12] uses an α-shape to capture the data structure; the α-shape is mainly used to perform surface estimation and to reduce the effort required for parameter tuning.

Moreover, the semi-supervised deep ladder network is applied in [13] to identify the failure of the primary parallel shaft helical gear in an induction motor system. The ladder network is implemented by modeling hierarchical latent variables to integrate supervised and unsupervised learning strategies. However, the unsupervised components of the ladder network may not contribute to a semi-supervised task if the raw data do not show obvious clustering on the 2-D manifold, which is usually the case for vibration signals. Although GANs have also been used for semi-supervised learning in [14], [17], [18], it is reported in [21] that good generators and good semi-supervised classifiers cannot be obtained simultaneously. Additionally, the well-known difficulty of training GANs has further limited their application to semi-supervised learning tasks in practice [10].

The motivation of the proposed research is both broad and specific, as we strive to tackle the label scarcity and label accuracy issues in bearing fault diagnosis by leveraging both labeled and unlabeled data. Specifically, we adopt a deep generative model grounded in Bayesian theory and use scalable variational inference in a semi-supervised setting. Although some existing work using variational autoencoders (VAE) for bearing fault diagnosis can be found in [22]–[24], these studies only use the discriminative features in the latent space for dimension reduction, and then use these features to train other external classifiers. In this work, however, we take an integrated approach and train the VAE model itself as a classifier by also exploiting its generative capabilities.

We summarize the detailed technical contributions of this work as follows:
1) Semi-supervised deep generative model implementation: This paper applies two semi-supervised VAE-based deep generative models to leverage properties of both the labeled and unlabeled data for bearing fault diagnosis. To mitigate the "KL vanishing" problem in VAE models and further promote the accuracy and robustness of the semi-supervised classifier, this study also adapts the KL cost annealing techniques [25], [26] on top of the original models presented in [27].
2) Strong performance mitigating the label scarcity issue: This work utilizes the CWRU dataset to create test scenarios where only a small subset of data for each fault category has labels, which corresponds to the label scarcity issue discussed in [12]–[19] for real-world applications. The results show that the M2 model can greatly outperform the baseline unsupervised and supervised learning algorithms. Additionally, the VAE-based semi-supervised generative M2 model also compares favorably against four state-of-the-art semi-supervised learning methods.
3) Solid performance mitigating the label accuracy issue: This study also uses the IMS dataset with naturally evolved bearing defects to create test scenarios with the label accuracy issue discussed in [17]. The results demonstrate that incorrect labeling will inevitably reduce the classifier performance of supervised learning algorithms, while adopting semi-supervised deep generative models can be an effective way to mitigate the label accuracy issue. This conclusion is supported by the consistent dominance of the proposed model over CNN when a large amount of healthy data were mislabeled as faulty.

The rest of the paper is organized as follows. In Section II, we introduce the background knowledge of VAE. Next, in Section III, we present the architecture of two VAE-based deep generative models in the semi-supervised setting, with detailed discussions on leveraging a dataset that includes both labeled and unlabeled data. In Section IV, two comparative studies of the proposed models against other popular machine learning and deep learning algorithms are performed using both the University of Cincinnati's Center for Intelligent Maintenance Systems (IMS) dataset [28] and the Case Western Reserve University (CWRU) bearing dataset [29]. Section V concludes the paper by highlighting its technical contributions.

II. BACKGROUND OF VARIATIONAL AUTOENCODERS

The variational inference technique is often used in the training and prediction process, which is effective for solving the posterior of the distribution obtained from neural networks [20]. As demonstrated in Fig. 1, the VAE's architecture specifies a joint distribution pθ(x, z) = pθ(x|z)p(z) over observations x and latent variables z, which are usually sampled from a prior density p(z) subject to a multivariate
Gaussian distribution N(0, I). These latent variables are related to the observed variables x through the likelihood pθ(x|z), which can be regarded as a probabilistic decoder, or generator, that decodes z into a distribution over the observation x. A neural network parameterized by θ is typically used to model the decoder.

After specifying the decoding process, it is necessary to perform inference, i.e., to calculate the posterior pθ(z|x) of the latent variables z given the observations x. In addition, we also seek to optimize the model parameters θ with respect to pθ(x), which is obtained by marginalizing out the latent variables z in the likelihood function pθ(x, z). Since the Gaussian prior p(z) is not conjugate to the likelihood, the true posterior pθ(z|x) is analytically intractable. Therefore, variational inference is used to approximate the posterior with qφ(z|x), whose variational parameters φ are optimized to minimize the Kullback-Leibler (KL) divergence between the approximated posterior and the true posterior. This posterior approximation qφ(z|x) can also be viewed as an encoder with distribution N(z|μφ(x), diag(σφ²(x))), where μφ(x) and σφ(x) are also produced by neural networks.

By definition, the KL divergence measures the dissimilarity between two distributions, expressed as the expectation of the log of the first distribution minus the log of the second distribution. Thus, after applying Bayes' theorem, the KL divergence of the approximated posterior qφ(z|x) with respect to the true posterior pθ(z|x) is

\begin{align}
D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)
&= \mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log q_\phi(z|x) - \log p_\theta(z|x)\right] \nonumber\\
&= \mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log q_\phi(z|x) - \log p(z) - \log p_\theta(x|z)\right] + \log p_\theta(x) \tag{1}
\end{align}

After moving log pθ(x) to the left-hand side of Eqn. (1), it can be written as the sum of a term known as the evidence lower bound (ELBO) and the KL divergence, which satisfies D_KL(qφ(z|x) ‖ pθ(z|x)) ≥ 0:

\begin{align}
\log p_\theta(x)
&= -\,\mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log q_\phi(z|x) - \log p(z) - \log p_\theta(x|z)\right] + D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right) \nonumber\\
&= \underbrace{\mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log p(z) + \log p_\theta(x|z) - \log q_\phi(z|x)\right]}_{\text{Evidence Lower Bound (ELBO)}} + \underbrace{D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)}_{\geq\,0} \tag{2}
\end{align}

Specifically, based on Jensen's inequality, the optimal qφ(z|x) that maximizes the ELBO is pθ(z|x), which also makes the KL divergence term equal to zero. Therefore, maximizing Eqn. (2) with respect to θ and the variational parameters φ is analogous to minimizing the KL divergence, and this optimization can be performed using stochastic gradient descent.

III. SEMI-SUPERVISED DEEP GENERATIVE MODELS BASED ON VARIATIONAL AUTOENCODERS

This section presents two semi-supervised deep generative models based on VAE [27]. When only a small subset of the training data has labels, both models can exploit the VAE's generative power to enhance the classifier's performance. By learning a good variational approximation of the posterior, the VAE's encoder can embed the input data x as a set of low-dimensional latent features z. The approximated posterior qφ(z|x) is formed by a nonlinear transformation, which can be modeled as a deep neural network f(z; x, φ) with variational parameters φ. Similarly, the VAE's generator takes a set of latent variables z and reproduces the observations x using pθ(x|z), which can also be modeled as a deep neural network g(x; z, θ) parameterized by θ.

A. Latent-Feature Discriminative M1 Model

The M1 model [27] trains the VAE-based encoder and decoder in an unsupervised manner. The trained encoder will provide an embedding of input data x in the latent space, which is defined by the latent variables z. In most cases, the dimension of z is much smaller than that of x, and these low-dimensional features can often increase the accuracy of supervised learning models.

As shown in Fig. 2, after training the M1 model, the actual classification task will be carried out in an external classifier, such as a support vector machine (SVM), polynomial regression, etc. Specifically, the VAE encoder will only process the labeled data xl to determine their corresponding latent variables zl, which are then combined with their corresponding labels yl to train this external classifier. The M1 model is considered a semi-supervised method, since it leverages all available data to train the VAE-based encoder and decoder in an unsupervised
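The ELBO in Eqn. (2) becomes concrete once qφ(z|x) is the diagonal Gaussian encoder N(z|μφ(x), diag(σφ²(x))) and p(z) = N(0, I): the KL regularization term then has a well-known closed form [20], and sampling z can be made differentiable with the reparameterization trick. The sketch below illustrates these two ingredients in plain NumPy; the function names are ours, not taken from the paper's implementation.

```python
import numpy as np

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """Closed-form KL divergence D_KL( N(mu, diag(sigma^2)) || N(0, I) ).

    This is the KL term of the VAE objective when the approximate
    posterior q_phi(z|x) is a diagonal Gaussian and the prior p(z)
    is standard normal:
        0.5 * sum( mu^2 + sigma^2 - 1 - log sigma^2 )
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var)

def reparameterize(mu, log_var, rng):
    """Sample z ~ q_phi(z|x) via z = mu + sigma * eps with eps ~ N(0, I),
    which keeps the sampling step differentiable w.r.t. (mu, log_var)."""
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu) + np.exp(0.5 * np.asarray(log_var)) * eps
```

When q equals the prior (μ = 0, σ = 1), the KL term vanishes, which is exactly the "KL vanishing" failure mode discussed later for the M1 implementation.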
manner, and thereafter it also takes the labeled data (zl, yl) to train an external classifier in a supervised fashion. Compared with purely supervised learning methods that can only be trained using the small subset of data with labels, the M1 model usually yields more accurate classification. This is because the VAE structure is also able to learn from the vast majority of unlabeled data, which enables the extraction of more representative latent features to train its subsequent classifier.

B. Semi-Supervised Generative M2 Model

As briefly mentioned earlier, the major limitation of the M1 model is the disjoint nature of its training process, as it needs to train the VAE network first and thereafter the external classifier. Specifically, the initial training phase of the M1 model's VAE-based encoder and decoder is a purely unsupervised process that does not involve any of the scarce labels yl, and it is completely separated from the subsequent classifier training phase that actually takes yl. To address this issue, another semi-supervised deep generative model, referred to as the M2 model, is also proposed in [27]. The M2 model can handle two situations at the same time: one where the data have labels, and the other where these labels are not available. Therefore, there are also two ways to construct the approximated posterior q and its variational objective.

1) Variational Objective With Unlabeled Data: When labels are not available, two separate posteriors qφ(y|x) and qφ(z|x) will be approximated during the VAE training stage, where z denotes the latent variables as in the M1 model, while y is the unobserved label yu. The posterior approximation qφ(y|x) will be used to construct the classifier as our inference model [27]. Given the observations x, the two approximated posteriors of the corresponding class labels y and latent variables z can be defined as

\begin{align}
q_\phi(y|x) &= \mathrm{Cat}\!\left(y\,|\,\pi_\phi(x)\right) \nonumber\\
q_\phi(z|x) &= N\!\left(z\,|\,\mu_\phi(x),\, \mathrm{diag}(\sigma_\phi^2(x))\right) \tag{3}
\end{align}

where Cat(y|πφ(x)) is the categorical (multinomial) distribution, and πφ(x) can be modeled by a neural network parameterized by φ. Combining the above two posteriors, a joint posterior approximation can be defined as

\begin{equation}
q_\phi(y, z|x) = q_\phi(z|x)\,q_\phi(y|x) \tag{4}
\end{equation}

Therefore, the revised ELBO_U that determines the variational objective of the unlabeled data can be written as Eqn. (5), where L(x, y) is derived from the original ELBO in Eqn. (2).

2) Variational Objective With Labeled Data: Since the goal of semi-supervised learning is to train a classifier using a limited amount of labeled data and the vast majority of unlabeled data, it is beneficial to also include the scarce labels in the training process of this deep generative M2 model. Similarly, Eqn. (6) shows the revised ELBO_L that determines the variational objective for the labeled data.

3) Combined Objective for the M2 Model: In Eqn. (5), the distribution qφ(y|x), which is used to construct the discriminative classifier, is only included in the variational objective of the unlabeled data. This is still an undesirable feature, since the labeled data would not be involved in learning this distribution or the variational parameters φ. Therefore, an additional loss term should be superimposed on the combined model objective, such that both the labeled and unlabeled data can contribute to the training process. Hence, the final objective of the semi-supervised deep generative M2 model is

\begin{equation}
J^{\alpha} = \sum_{x\sim \tilde{p}_u} U(x) + \sum_{(x,y)\sim \tilde{p}_l} \left[\,L(x, y) - \alpha \cdot \log q_\phi(y|x)\,\right] \tag{7}
\end{equation}

in which the hyper-parameter α controls the relative weight between generative and discriminative learning. A rule of thumb is to set α = 0.1 · N in all experiments, where N is the number of labeled data samples.

With this combined objective function, we can integrate a large number of x as a mini-batch to enhance the stability of training the two neural networks used as the encoder and decoder. Finally, stochastic gradient descent is run to update the model parameters θ and the variational parameters φ. The structure of the M2 model is presented in Fig. 3.

C. Model Implementations

1) M1 Model Implementation: The M1 model constructs its encoder qφ(z|x) and decoder pθ(x|z) using two deep neural networks f(z; x, φ) and g(x; z, θ), respectively. The encoder has 2 convolutional layers and 1 fully connected layer with ReLU activation, aided by batch normalization and dropout layers. The decoder consists of 1 fully connected layer followed by 3 transposed convolutional layers, where the first 2 layers use ReLU activation and the last layer uses linear activation.

Due to the "KL vanishing" problem, it is often difficult to achieve a good balance between the likelihood and the KL divergence, as the KL loss can be undesirably reduced to zero, though it is expected to remain a small value. To overcome this problem, the implementation of the M1 model uses "KL cost annealing", or the "β-VAE" [25], which introduces a weight factor β for the KL divergence. The revised ELBO function for the β-VAE is

\begin{equation}
\mathrm{ELBO} = \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right]}_{\text{Reconstruction}} - \;\beta \cdot \underbrace{D_{KL}\!\left(q_\phi(z|x)\,\|\,p(z)\right)}_{\text{KL Regularization}} \tag{8}
\end{equation}

During training, β is gradually increased from 0 to 1. When β < 1, the latent variables z are trained with an emphasis on capturing useful features for reconstructing the observations x. When β = 1, the z learned in earlier epochs can be taken as a good initialization, which enables more informative latent features to be used by the decoder [26].

After training the M1 model so that it balances its reconstruction and generation capabilities, the latent variables z will be used as discriminative features for the external classifier. This paper uses an SVM classifier, though any preferred classifier can also be used.
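The KL cost annealing described above can be as simple as a scalar schedule for β in Eqn. (8). A minimal sketch follows; the linear ramp shape and the warm-up length are illustrative assumptions, since the exact schedule is not specified here.

```python
def kl_annealing_beta(step, warmup_steps=10_000):
    """Linear KL cost annealing: beta ramps from 0 to 1 over
    `warmup_steps` training steps and then stays at 1, so early
    training emphasizes reconstruction before the KL regularization
    term is fully weighted. The linear shape is an illustrative choice."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)

def annealed_elbo(expected_log_likelihood, kl_divergence, step, warmup_steps=10_000):
    """Revised ELBO of Eqn. (8): reconstruction term minus the
    beta-weighted KL regularization term."""
    return expected_log_likelihood - kl_annealing_beta(step, warmup_steps) * kl_divergence
```

At step 0 the objective is pure reconstruction, which is what keeps the KL loss from collapsing to zero before z has learned useful features.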
Fig. 3. Illustration of the semi-supervised generative M2 model.

The M1 model will perform discriminative feature extraction and reduce the dimensionality of the input data, which is expected to increase the performance of the external classifier. In this study, the input data has a dimension of 1,024, which is reduced to 128 in the latent space.

2) M2 Model Implementation: The deep generative M2 model uses the same structure for qφ(z|x) as the M1 model, while the decoder pθ(x|y, z) also has the same settings as M1's pθ(x|z). In addition, the classifier qφ(y|x) consists of 2 convolutional layers and 2 max-pooling layers with dropout and ReLU activation, followed by a final Softmax layer.

Two independent neural networks with the same structure, but different input/output specifications and loss functions, are used for the labeled and the unlabeled data. For labeled data, both xl and y are taken as input to minimize the labeled (x, y) ∼ p̃l part of Eqn. (7), and the output will be the reconstructed xl* and y*. For unlabeled data, xu is the only input, and it is used to reconstruct xu. Other hyper-parameters of the M2 model are selected empirically. We use a batch size of 200 for training, and the latent variable z has a dimension of 128. For the optimizer, we use RMSprop with an initial learning rate of 10⁻⁴.

3) M1 vs. M2 Model: Comparing the M1 and M2 models, the significance of the M1 model lies in its simple and clear network structure, which is easy to implement and saves training time. As shown in Fig. 2, the M1 model is a straightforward implementation of VAE that only includes an encoder and a decoder trained in an unsupervised manner; the learned latent features and labels (zl, yl) of the labeled data are subsequently used to train an external classifier.

On the other hand, the M2 model deals with both labeled and unlabeled data by using two identical encoder networks.

IV. EXPERIMENTAL RESULTS USING THE CWRU DATASET

In this section, we use the CWRU dataset to verify the effectiveness of the two VAE-based semi-supervised deep generative models for bearing fault diagnosis. The developed diagnostic framework will be described in detail, and the performance of the classifier will first be compared with three baseline supervised/unsupervised algorithms, including principal component analysis (PCA), the autoencoder (AE), and the convolutional neural network (CNN). Then, we also compare the proposed methods against several state-of-the-art semi-supervised learning algorithms: low density separation (LDS) [30], the safe semi-supervised support vector machine (S4VM) [31], SemiBoost [32], and semi-supervised smooth alpha layering (S3AL) [12].

A. CWRU Dataset

The CWRU dataset contains vibration signals collected from the drive-end bearing and fan-end bearing in a 2 hp induction motor dyno setup [29]. Single-point defects are manually created on the bearing inner race (IR), outer race (OR), and rolling elements by electro-discharge machining. Defect diameters of 7 mil, 14 mil, 21 mil, 28 mil, and 40 mil are used to represent different levels of fault severity. Two accelerometers mounted on the drive-end and fan-end of the motor housing are used to collect vibration data at a motor load from 0 to 3 hp and a motor speed from 1,720 to 1,797 rpm, at a sampling frequency of 12 kHz or 48 kHz.

The purpose of the proposed bearing fault diagnostic model is to reveal the location and severity of bearing defects, and vibration data collected for the same failure type but at different speeds and load conditions will be considered as having the same class label. Based on this standard, 10 classes are specified according to the size and location of the bearing defect, and TABLE I lists all 10 classes featured in this study.
\begin{align}
\mathrm{ELBO}_U &= \mathbb{E}_{q_\phi(y,z|x)}\!\left[\log p_\theta(x|y,z) + \log p_\theta(y) + \log p_\theta(z) - \log q_\phi(y,z|x)\right] \nonumber\\
&= \mathbb{E}_{q_\phi(y|x)}\!\left[-L(x,y) - \log q_\phi(y|x)\right] \nonumber\\
&= \sum_{y} q_\phi(y|x)\left(-L(x,y)\right) + H\!\left(q_\phi(y|x)\right) = -\,U(x) \tag{5}
\end{align}

\begin{align}
\mathrm{ELBO}_L &= \mathbb{E}_{q_\phi(z|x,y)}\!\left[\log p_\theta(x|y,z) + \log p_\theta(y) + \log p_\theta(z) - \log q_\phi(z|x,y)\right] \nonumber\\
&= \mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|y,z) + \log p_\theta(y) + \log p_\theta(z) - \log q_\phi(z|x)\right] = -\,L(x,y) \tag{6}
\end{align}
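Eqns. (5)–(7) translate directly into a loss computation: for unlabeled data, the labeled loss L(x, y) is averaged under the classifier distribution qφ(y|x) and the classifier's entropy is subtracted, and the final objective combines the labeled and unlabeled terms with the α-weighted classification loss. A NumPy sketch under the sign convention of Eqn. (5), where L and U denote losses (negative ELBOs); the function names are illustrative, not from the paper's code:

```python
import numpy as np

def unlabeled_loss_U(q_y, L_per_class):
    """U(x) from Eqn. (5): -U(x) = sum_y q_phi(y|x) * (-L(x, y)) + H(q_phi(y|x)),
    where q_y is the classifier distribution q_phi(y|x) and L_per_class[y]
    is the labeled loss L(x, y) evaluated for every possible label y."""
    q_y = np.asarray(q_y, dtype=float)
    L_per_class = np.asarray(L_per_class, dtype=float)
    entropy = -np.sum(q_y * np.log(np.where(q_y > 0, q_y, 1.0)))
    return np.sum(q_y * L_per_class) - entropy

def combined_objective_J_alpha(U_unlabeled, L_labeled, log_q_true_label, alpha):
    """Final M2 objective of Eqn. (7): the sum of unlabeled losses U(x) plus,
    for each labeled pair, L(x, y) minus alpha * log q_phi(y|x)."""
    labeled = sum(L - alpha * lq for L, lq in zip(L_labeled, log_q_true_label))
    return sum(U_unlabeled) + labeled
```

Note how the entropy term in U(x) rewards an uncertain classifier on unlabeled data, while the α term in the combined objective is what forces qφ(y|x) to also learn from the labeled pairs.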
Fig. 4. Comparison of the original (top row) and the reconstructed (bottom row) bearing vibration signals after training the VAE M1 model.
TABLE I
CLASS LABELS SELECTED FROM THE CWRU DATASET

B. Data Preprocessing

The diagnosis process starts with data segmentation, which divides the collected vibration signal into multiple segments of equal length. For the CWRU dataset, the number of data samples of the drive-end vibration signal for each kind of bearing failure is approximately 120,000 at three different speeds (i.e., 1,730 rpm, 1,750 rpm, and 1,772 rpm). Data collected at these speeds constitute the complete data for each class, which will later be segmented using a fixed window size of 1,024 samples (an 85.3 ms time span at a sampling rate of 12 kHz) and a sliding rate of 0.2. Finally, the numbers of training and test data segments are 12,900 and 900, respectively. All of the test set data are labeled. Although the percentage of test data appears small at first glance, only a maximum of 2,150 training data segments will have labels in the later experiments, so the percentage of test data over labeled training data is around 30%.

After the initial data import and segmentation stage, these data segments are still arranged in the order of their class labels, or fault types. Therefore, data shuffling needs to be carried out to ensure that both the training and test sets represent the overall distribution of the CWRU dataset, which further enhances model generalization and makes it less prone to overfitting. Classical standardization is also applied to the training and test sets to ensure the vibration data have zero mean and unit variance, which is achieved by subtracting the mean of the original data and then dividing the result by its standard deviation.

C. Experimental Results

After training the VAE-based M1 model, the reconstructed bearing vibration signal should be very similar to the actual vibration signal; their comparison is demonstrated in Fig. 4. Although a perfect reconstruction may impact the VAE's generative capabilities and reduce its versatility, a reasonably close reconstruction with a small error indicates that the VAE has achieved a balance between reconstruction and generation, which is critical for leveraging the generative features of the algorithm.

The network structures of the VAE-based deep generative M1 and M2 models have been discussed in detail in Section III.C. In addition to implementing these models for bearing fault diagnosis, other popular unsupervised learning schemes such as PCA and the autoencoder, as well as the supervised CNN, are also trained to serve as baselines. Their parameters are either selected to be consistent with the M1 and M2 models, or obtained through parameter tuning. For example, we use the same optimizer settings as the VAE model (RMSprop with an initial learning rate of 10⁻⁴) to train both the CNN and autoencoder benchmarks. More details are provided as follows:
1) PCA+SVM: the PCA+SVM benchmark is trained using low-dimensional features extracted from the labeled data segments (each consisting of 1,024 data samples) using PCA. The dimension of the feature space is 128, which is consistent with the M1 and M2 models' latent space
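The segmentation and standardization steps described above can be sketched as follows. We interpret the "sliding rate of 0.2" as the window advancing by 20% of its length per step, which is an assumption on our part, and we compute the standardization statistics from the training set, a common choice the text does not spell out.

```python
import numpy as np

def segment_signal(signal, window=1024, sliding_rate=0.2):
    """Divide a 1-D vibration signal into overlapping, fixed-length
    segments; the window advances by sliding_rate * window samples
    (assumed interpretation of the paper's 'sliding rate')."""
    hop = max(1, int(window * sliding_rate))
    n_segments = (len(signal) - window) // hop + 1
    return np.stack([signal[i * hop : i * hop + window] for i in range(n_segments)])

def standardize(train, test):
    """Zero-mean / unit-variance scaling: subtract the mean and divide
    by the standard deviation, with statistics taken from the training set."""
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma
```

After segmentation, the segments should still be shuffled before the train/test split, as the text notes, since they initially arrive ordered by fault type.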
TABLE II
E XPERIMENTAL R ESULTS OF VAE-B ASED S EMI -S UPERVISED C LASSIFICATION ON CWRU B EARING D ATASET W ITH L IMITED L ABELS
TABLE III
C OMPARISON OF D IFFERENT S EMI -S UPERVISED L EARNING A LGORITHMS W ITH D IFFERENT L ABELED D ATA P ERCENTAGE ν
TABLE V
B EARING FAULT S CENARIOS AND T HEIR D EGRADATION S TARTING P OINTS IN THE IMS D ATASET [34]
TABLE VI
E XPERIMENTAL R ESULTS OF S EMI -S UPERVISED C LASSIFICATION ON IMS B EARING D ATASET W ITH L IMITED L ABELS
15 of them are chosen after their degradation starting points, as determined by the auto-encoder-correlation (AEC) algorithm. For instance, data files 1,832 to 2,042 are selected for the "Subset 1 Bearing 3" scenario, since its estimated degradation starting point is 2,027. On the other hand, healthy data are picked from the first 110 files of "Subset 1 Bearing 3". Each fault scenario has 210 consecutive vibration data files, of which the last 10 files serve as the test set, and the first 200 files constitute the entire training set. Each file contains 20,480 data points, which can be divided into 20 data segments. Therefore, each fault scenario has 4,000 data segments, and the 4 categories (healthy, rolling elements, outer race, inner race) together have 16,000 data segments.

In order to simulate the challenges related to accurate data labeling in practical applications, labels are assigned starting from the last of the 4,000 training data segments for each fault scenario and proceeding backward. For these final data segments, we have the highest confidence in the accuracy of their labels. Then, by labeling more of the preceding files, but with lower confidence, more case studies can be performed. The purpose is to investigate whether incorrect labeling will negatively impact the accuracy of the supervised learning benchmark (CNN), and to assess whether semi-supervised deep generative models can still improve the accuracy of the bearing fault classifier by leaving these data segments unlabeled.

C. Experimental Results

A total of 10 rounds of independent semi-supervised experiments were performed using the IMS dataset, and 10 case studies were conducted by labeling the last 40, 100, 200, 400, 800, 1,000, 2,000, 4,000, and 8,000 data segments of the training set, which account for 0.25%, 0.63%, 1.25%, 2.5%, 5%, 6.25%, 12.5%, 25%, and 50% of the training data, respectively.

TABLE VI presents the classification results after the 10 rounds of experiments. The performance of the M1 model is better than that of PCA, but almost the same as that of the vanilla autoencoder. This shows that the VAE's discriminative feature space has no obvious advantage over the vanilla autoencoder's encoded space. Nevertheless, the performance of the M1 model is also superior to that of the supervised learning algorithm CNN. By incorporating the vast majority of unlabeled data in the training process, the improvement is approximately 5% to 15% as the number of labeled data segments varies from N = 40 to N = 1,000, and the standard deviation is also much smaller.

Similar to the comparison results shown in TABLE II, the classification performance of the VAE-based M2 model is superior to the other four algorithms, showing the advantage of integrating the training process of the VAE model and its built-in classifier. A critical observation can be drawn when the number of labeled training data segments increases from N = 4,000 to N = 8,000: the accuracy of the supervised algorithm CNN is reduced by more than 6%, while the loss for the semi-supervised VAE M2 model is 4%. The performance of the unsupervised learning algorithms, which do not use labels in their training process, remains intact. This can be largely attributed to the many healthy data samples that have been incorrectly labeled as faulty, which creates a dilemma: the classifier's accuracy is impaired by using either insufficient data or more data with inaccurate labels. Specifically, the best attainable accuracy for the three baseline algorithms is 87.74% when N = 4,000 and 84% when N = 8,000, while the VAE-based M2 model achieves an average of 92.01% and 88.11%, respectively.

In summary, the experimental results obtained using the IMS dataset consistently support the previous findings on the CWRU dataset; that is, taking advantage of the large amount of unlabeled data can effectively enhance the classifier's performance using semi-supervised VAE-based deep generative models, especially the M2 model. In addition, the results also imply that inaccurate labeling can reduce the accuracy of supervised learning algorithms. Therefore, in diagnosing naturally evolved bearing faults in real-world applications, it is desirable to leverage semi-supervised learning methods, which only require a small set of data that we can label with confidence while leaving the majority of the data unlabeled.

VI. CONCLUSION

This paper implemented two semi-supervised deep generative models based on VAE for bearing fault diagnosis with limited labels. The results on the CWRU dataset show that the M2 model can greatly outperform the baseline supervised and unsupervised learning algorithms, and this advantage can be up to 27% when only 2.3% of the training data have labels. Additionally, the VAE-based M2 model also compares favorably against four state-of-the-art semi-supervised learning methods.

The CWRU dataset only contains vibration data from manually initiated bearing defects, which is inconsistent with real-world scenarios where these defects evolve naturally over time. Therefore, we also used the IMS
Therefore, we also used the IMS dataset to verify the performance of the two VAE-based semi-supervised deep generative models. The results demonstrate that incorrect labeling will reduce the classifier performance of mainstream supervised learning algorithms, while adopting semi-supervised deep generative models and keeping data with label uncertainties unlabeled can be an effective way to mitigate this issue.

REFERENCES

[1] M. Burgess. (May 2020). What Is the Internet of Things? Wired Explains. [Online]. Available: https://www.wired.co.uk/article/internet-of-things-what-is-explained-iot
[2] J. Krakauer. (Apr. 2020). Data in Action: IoT and the Smart Bearing. [Online]. Available: https://blogs.oracle.com/bigdata/data-in-action-iot-and-the-smart-bearing
[3] J. Harmouche, C. Delpha, and D. Diallo, “Improved fault diagnosis of ball bearings based on the global spectrum of vibration signals,” IEEE Trans. Energy Convers., vol. 30, no. 1, pp. 376–383, Mar. 2015.
[4] F. Immovilli, A. Bellini, R. Rubini, and C. Tassoni, “Diagnosis of bearing faults in induction machines by vibration or current signals: A critical comparison,” IEEE Trans. Ind. Appl., vol. 46, no. 4, pp. 1350–1359, Jul. 2010.
[5] M. Kang, J. Kim, and J.-M. Kim, “An FPGA-based multicore system for real-time bearing fault diagnosis using ultrasampling rate AE signals,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2319–2329, Apr. 2015.
[6] A.-B. Ming, W. Zhang, Z.-Y. Qin, and F.-L. Chu, “Dual-impulse response model for the acoustic emission produced by a spall and the size evaluation in rolling element bearings,” IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6606–6615, Oct. 2015.
[7] M. Blodt, P. Granjon, B. Raison, and G. Rostaing, “Models for bearing damage detection in induction motors using stator current monitoring,” IEEE Trans. Ind. Electron., vol. 55, no. 4, pp. 1813–1822, Apr. 2008.
[8] S. Zhang et al., “Model-based analysis and quantification of bearing faults in induction machines,” IEEE Trans. Ind. Appl., vol. 56, no. 3, pp. 2158–2170, May 2020.
[9] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, “Deep learning and its applications to machine health monitoring,” Mech. Syst. Signal Process., vol. 115, pp. 213–237, Jan. 2019.
[10] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep learning algorithms for bearing fault diagnostics—A comprehensive review,” IEEE Access, vol. 8, pp. 29857–29881, 2020.
[11] S. R. Saufi, Z. Ahmad, M. S. Leong, and M. H. Lim, “Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: A review,” IEEE Access, vol. 7, pp. 122644–122662, 2019.
[12] R. Razavi-Far, E. Hallaji, M. Farajzadeh-Zanjani, and M. Saif, “A semi-supervised diagnostic framework based on the surface estimation of faulty distributions,” IEEE Trans. Ind. Informat., vol. 15, no. 3, pp. 1277–1286, Mar. 2019.
[13] R. Razavi-Far et al., “Information fusion and semi-supervised deep learning scheme for diagnosing gear faults in induction machine systems,” IEEE Trans. Ind. Electron., vol. 66, no. 8, pp. 6331–6342, Aug. 2019.
[14] P. Liang, C. Deng, J. Wu, Z. Yang, J. Zhu, and Z. Zhang, “Single and simultaneous fault diagnosis of gearbox via a semi-supervised and high-accuracy adversarial learning framework,” Knowl.-Based Syst., vol. 198, Jun. 2020, Art. no. 105895.
[15] X. Chen, Z. Wang, Z. Zhang, L. Jia, and Y. Qin, “A semi-supervised approach to bearing fault diagnosis under variable conditions towards imbalanced unlabeled data,” Sensors, vol. 18, no. 7, p. 2097, Jun. 2018.
[16] M. Zhao, B. Li, J. Qi, and Y. Ding, “Semi-supervised classification for rolling fault diagnosis via robust sparse and low-rank model,” in Proc. IEEE 15th Int. Conf. Ind. Informat. (INDIN), Jul. 2017, pp. 1062–1067.
[17] D. B. Verstraete, E. L. Droguett, V. Meruane, M. Modarres, and A. Ferrada, “Deep semi-supervised generative adversarial fault diagnostics of rolling element bearings,” Struct. Health Monitor., vol. 19, no. 2, pp. 390–411, Dec. 2019.
[18] T. Pan, J. Chen, J. Xie, Y. Chang, and Z. Zhou, “Intelligent fault identification for industrial automation system via multi-scale convolutional generative adversarial network with partially labeled samples,” ISA Trans., vol. 101, pp. 379–389, Jun. 2020.
[19] C. Liu and K. Gryllias, “A semi-supervised support vector data description-based fault detection method for rolling element bearings based on cyclic spectral analysis,” Mech. Syst. Signal Process., vol. 140, Jun. 2020, Art. no. 106682.
[20] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” 2013, arXiv:1312.6114. [Online]. Available: http://arxiv.org/abs/1312.6114
[21] Z. Dai, Z. Yang, F. Yang, W. W. Cohen, and R. R. Salakhutdinov, “Good semi-supervised learning that requires a bad GAN,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6510–6520.
[22] G. San Martin, E. López Droguett, V. Meruane, and M. das Chagas Moura, “Deep variational auto-encoders: A promising tool for dimensionality reduction and ball bearing elements fault diagnosis,” Struct. Health Monitor., vol. 18, no. 4, pp. 1092–1128, Jul. 2019.
[23] A. L. Ellefsen, E. Bjørlykhaug, V. Æsøy, and H. Zhang, “An unsupervised reconstruction-based fault detection algorithm for maritime components,” IEEE Access, vol. 7, pp. 16101–16109, 2019.
[24] M. Hemmer, A. Klausen, H. V. Khang, K. G. Robbersmyr, and T. I. Waag, “Health indicator for low-speed axial bearings using variational autoencoders,” IEEE Access, vol. 8, pp. 35842–35852, 2020.
[25] C. P. Burgess et al., “Understanding disentangling in β-VAE,” 2018, arXiv:1804.03599. [Online]. Available: http://arxiv.org/abs/1804.03599
[26] H. Fu, C. Li, X. Liu, J. Gao, A. Celikyilmaz, and L. Carin, “Cyclical annealing schedule: A simple approach to mitigating KL vanishing,” 2019, arXiv:1903.10145. [Online]. Available: http://arxiv.org/abs/1903.10145
[27] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3581–3589.
[28] J. Lee, H. Qiu, G. Yu, and J. Lin, “Bearing data set,” IMS, Univ. Cincinnati, Rexnord Tech. Services, NASA Ames Prognostics Data Repository, NASA Ames Res. Center, Moffett Field, CA, USA, 2007. [Online]. Available: http://ti.arc.nasa.gov/project/prognostic-data-repository
[29] (Nov. 2020). Case Western Reserve University (CWRU) Bearing Data Center. [Online]. Available: https://csegroups.case.edu/bearingdatacenter/pages/project-history
[30] O. Chapelle and A. Zien, “Semi-supervised classification by low density separation,” in Proc. Int. Workshop Artif. Intell. Statist. (AISTATS), Barbados, Jan. 2005, pp. 57–64.
[31] Y.-F. Li and Z.-H. Zhou, “Towards making unlabeled data never hurt,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 1, pp. 175–188, Jan. 2015.
[32] P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, “SemiBoost: Boosting for semi-supervised learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 2000–2014, Nov. 2009.
[33] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, Jul. 2018.
[34] R. M. Hasani, G. Wang, and R. Grosu, “An automated auto-encoder correlation-based health-monitoring and prognostic method for machine bearings,” 2017, arXiv:1703.06272. [Online]. Available: http://arxiv.org/abs/1703.06272

Shen Zhang (Member, IEEE) received the B.S. (Hons.) degree in electrical engineering from the Harbin Institute of Technology, Harbin, China, in 2014, and the M.S. and Ph.D. degrees in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2017 and 2019, respectively.

His research interests include the design, control, condition monitoring, and fault diagnostics of electric machines, control of power electronics, powertrain engineering for electric propulsion, deep learning, and reinforcement learning applied to energy systems.

Fei Ye (Member, IEEE) received the M.S. degree from Northeastern University, Boston, MA, USA, in 2014, and the Ph.D. degree from the University of California at Riverside, Riverside, CA, USA, in 2019, both in electrical and computer engineering.

She is currently a Postdoctoral Researcher with the University of California at Berkeley, Berkeley, CA, USA. Her research interests include intelligent vehicles and transportation, deep learning, and deep reinforcement learning with domain adaptation, and their applications in behavior prediction, decision making, and trajectory planning.

Bingnan Wang (Senior Member, IEEE) received the B.S. degree from Fudan University, Shanghai, China, in 2003, and the Ph.D. degree from Iowa State University, Ames, IA, USA, in 2009, both in physics.

Since then, he has been with Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, where he is a Senior Principal Research Scientist. His research interests include electromagnetics and photonics, and their applications to wireless communications, wireless power transfer, sensing, electric machines, and energy systems.

Thomas G. Habetler (Fellow, IEEE) received the B.S.E.E. and M.S. degrees in electrical engineering from Marquette University, Milwaukee, WI, USA, in 1981 and 1984, respectively, and the Ph.D. degree from the University of Wisconsin–Madison, Madison, WI, USA, in 1989.

From 1983 to 1985, he was employed with the Electro-Motive Division of General Motors as a Project Engineer. Since 1989, he has been with the Georgia Institute of Technology, Atlanta, GA, USA, where he is currently a Professor of Electrical and Computer Engineering. His research interests include electric machine protection and condition monitoring, switching converter technology, and drives. He has published over 300 technical articles in the field. He is also a regular consultant to industry in the field of condition-based diagnostics for electrical systems.

Dr. Habetler was the inaugural recipient of the IEEE-PELS “Diagnostics Achievement Award,” and a recipient of the EPE-PEMC “Outstanding Achievement Award,” the 2012 IEEE Power Electronics Society Harry A. Owen Distinguished Service Award, and the 2012 IEEE Industry Applications Society Gerald Kliman Innovator Award. He has received one transactions prize paper award and four conference prize paper awards from the Industry Applications Society. He has served on the IEEE Board of Directors as the Division II Director, on the Technical Activities Board, and on the Member and Geographic Activities Board as a Director of IEEE-USA. He is also a Past President of the Power Electronics Society, and has served as an Associate Editor for the IEEE TRANSACTIONS ON POWER ELECTRONICS.