Abrupt Fault Remaining Useful Life Estimation Using Measurements From A Reciprocating Compressor Valve Failure

Mechanical Systems and Signal Processing 121 (2019) 359–372
Panagiotis Loukopoulos a,*, George Zolkiewski b, Ian Bennett b, Suresh Sampath a, Pericles Pilidis a, X. Li c, David Mba c
a School of Aerospace, Transport and Manufacturing, Cranfield University, Cranfield, United Kingdom
b Shell Global Solutions, Royal Dutch Shell, Rijswijk, Netherlands
c Faculty of Technology, De Montfort University, Leicester, United Kingdom
Article info
Article history: Received 31 December 2016; Received in revised form 14 August 2018; Accepted 25 September 2018
Keywords: Reciprocating compressor; Valve; Prognostics; Remaining useful life; Multiple linear regression; Polynomial regression; Self-organising map; K-nearest neighbours; Instantaneous failure; Principal components analysis; Statistical process control
Abstract
One of the major targets in industry is minimisation of downtime and cost, and maximisation of availability and safety, with maintenance considered a key aspect in achieving this objective. The concept of Condition Based Maintenance and Prognostics and Health Management (CBM/PHM), which is founded on the principles of diagnostics, and prognostics, is a step towards this direction as it offers a proactive means for scheduling maintenance. Reciprocating compressors are vital components in oil and gas industry, though their maintenance cost is known to be relatively high. Compressor valves are the weakest part, being the most frequent failing component, accounting for almost half maintenance cost. To date, there has been limited information on estimating Remaining Useful Life (RUL) of reciprocating compressor in the open literature. This paper compares the prognostic performance of several methods (multiple linear regression, polynomial regression, Self-Organising Map (SOM), K-Nearest Neighbours Regression (KNNR)), in relation to their accuracy and precision, using actual valve failure data captured from an operating industrial compressor. The SOM technique is employed for the first time as a standalone tool for RUL estimation. Furthermore, two variations on estimating RUL based on SOM and KNNR respectively are proposed. Finally, an ensemble method by combining the output of all aforementioned algorithms is proposed and tested. Principal components analysis and statistical process control were implemented to create $T^2$ and $Q$ metrics, which were proposed to be used as health indicators reflecting degradation processes and were employed for direct RUL estimation for the first time. It was shown that even when RUL is relatively short due to instantaneous nature of failure mode, it is feasible to perform good RUL estimates using the proposed techniques.
© 2018 Elsevier Ltd. All rights reserved.
* Corresponding author: P. Loukopoulos.
https://doi.org/10.1016/j.ymssp.2018.09.033
1. Introduction
Reciprocating compressors are among the most essential components in the oil and gas industry, being a key element in the refining sector as one of the most commonly used types of equipment, requiring high reliability and availability [1,2]. They are widely used, being powerful, flexible, efficient and dependable in many compression applications. Despite their popularity, their maintenance cost can be several times higher than that of other compressor types [3], since the number of moving parts is greater [4], thus they are expected to experience more failures. Bloch and Heinz [1] note that valves are the most common failing part (36%), making them the weakest component, accounting for half the maintenance cost [4].
Valves are an essential part of the reciprocating compressor as they have a significant impact on its performance from
both efficiency and reliability perspectives [2]. Their smooth operation is integral since they regulate the gas flow for com-
pression. Valves suffer numerous hardships during their operation as they may come in contact with liquids, foreign particles
or debris, corrosive gases or materials depending on application [2]. Furthermore, pulsations, tension, compression and
impact created either by the compressor or the valve motion itself can affect proper valve function [2].
In order to decrease downtime and cost, while increasing availability and safety, efficient maintenance is essential [1,2], since reciprocating compressor failures can cause anything from production loss to human casualties [3,5]. Condition Based Maintenance (CBM) [6–10] is a method founded on the diagnostics principle that has become increasingly popular over the years. It advocates that maintenance should be carried out only when actually needed, depending on the unit’s health state, and is an effective tool that moves in this direction [1], with diagnostics being an established area for valve failures [3,11–15]. The equipment of interest is fitted with sensors collecting Condition Monitoring (CM) measurements, which are analysed for diagnostics purposes (determining whether the unit is healthy or faulty and, in case of a fault, identifying the failure mode) and used to suggest actions to be taken accordingly.
An extension of CBM is Prognostics and Health Management (PHM) [6–9,16–19] which has been gaining traction during
recent years and is founded on prognostics principle [6–10,16–20]. It predicts the time to failure, known as Remaining Useful
Life (RUL), after a fault has occurred, enabling the user to schedule maintenance in advance. PHM’s proactive nature can
assist optimising maintenance by avoiding any unnecessary action. Since PHM can be employed after a fault has been
detected, diagnostics is required and thus its coupling with CBM would be unavoidable, leading to CBM/PHM [6]. To the
authors’ knowledge, there is limited information about prognostics on reciprocating compressors in open literature. Conse-
quently, the purpose of this project is comparison of several prognostics methods in order to identify most suitable ones
based on accuracy and variability.
Prognostics techniques can be divided into two groups [6,7,9,16–21]:
i. Data-driven. They model the degradation process using historical information, and are suitable when there is limited
physical understanding of system under study. They struggle in cases for which they have not been trained like novel
events, while their accuracy depends on amount and quality of available data.
ii. Physics based. They create a mathematical representation of system’s or failure’s physical aspect. They are computa-
tionally expensive and tend to be application specific though they can outperform data-driven.
Similarly, there are two ways for calculating RUL [7,21,22]:
i. Direct estimation. Relationship between information and RUL is modelled. It requires knowledge of historical and cur-
rent information, with data being the input and RUL being the output. It is useful in cases lacking failure threshold.
ii. Indirect estimation. Relationship between information and a Health Indicator (HI), reflecting machine’s health status,
is modelled. In some cases HI can be modelled as function of time. HI is extrapolated until a failure threshold is
reached. RUL is estimated as difference between current and failure time. It requires knowledge of historical, current,
and future information.
This project focused on data-driven prognostics and direct RUL estimation due to availability of CM measurements
accompanied by historical failures. The techniques employed were:
i. Multiple Linear Regression (MLR) and Polynomial Regression (PR) which belong to trend extrapolation, one of the sim-
plest methods and most commonly used one in industry [17,23–25].
ii. Self-Organising Map (SOM) which belongs to Neural Networks (NN) family, one of the most favoured methods in aca-
demia [17]. To the authors’ knowledge, SOM has yet to be applied for prognostics as a standalone technique. Also, a
RUL estimation variation based on SOM was proposed.
iii. K-Nearest Neighbours Regression (KNNR) which belongs to similarity-based prognostics, an emerging trend with
great potential [26,27]. Moreover, a RUL estimation variation based on KNNR was proposed.
iv. An ensemble method averaging each of the aforementioned algorithms’ output was proposed.
These methods were applied to non-uniformly sampled historical valve failure data from an industrial reciprocating com-
pressor, retrieved from a server rather than raw sensor measurements commonly used. Use of actual information addressed
a major prognostics challenge: limited works utilising real-life data [7,16–21], demonstrating PHM’s applicability and ben-
efits in industry, and its implementation to failure modes that are instantaneous in contrast to slowly time varying ones usu-
ally examined.
Principal Components Analysis (PCA) and Statistical Process Control (SPC) were employed to create Hotelling $T^2$ and $Q$
residuals metrics which, to the authors’ knowledge, are used for the first time to reflect degradation process of compressor
and employed for RUL estimation. PCA/SPC has found limited application in reciprocating compressors as diagnostics tool.
Ahmed et al. [28] used experimental raw sensor data, extracted features, fused them with PCA and performed detection
of various faults via SPC. They further enhanced their methodology in [29] by extracting more features and utilising contri-
bution plot of Q metric to identify features associated with faults that can assist identification. Prognostics algorithms were
benchmarked while utilising these metrics.
The rest of the paper is organised as follows. Section 2 reviews literature of prognostics methods employed. Section 3
analyses HI creation process and overviews prognostics methods. Section 4 describes data acquisition procedure and eval-
uation metrics used. Section 5 presents results followed by a discussion. Section 6 contains concluding remarks.
2. Prognostics methods literature review
Trend extrapolation is one of the most preferred prognostics methods in industry, being the simplest one, though there are
limited published works in literature [17]. Zhao et al. [30] used S-transform, Gaussian pyramid, local binary pattern, PCA and
linear discriminant analysis for pre-processing along with MLR for RUL estimation for bearings. Li and Nilkitsaranont [31]
employed MLR for prognostics of gas turbine engine during early degradation stage while quadratic regression was used
when degradation deteriorated. Alamaniotis et al. [32] applied fuzzy sets and MLR for prognostics of power plant turbine
blade. Proposed methodology was superior to simple MLR. MLR has also been used extensively as a benchmarking tool, along
with PR. In such works, MLR/PR were used either to compare performance of proposed methodology, usually found inferior
[33–35], or to compare performance of several algorithms [36,37]. These works used either experimental [30,33,34,36,37] or
simulated [31,35] or actual [32] raw sensor data.
SOM has yet to be applied for RUL estimation, though it has been used for data fusion creating a HI, namely the Mean
Quantisation Error (MQE), for prognostics purposes [38–42], where all works used experimental raw sensor measurements.
Inspiration of implementing SOM for direct RUL estimation was taken by its missing data imputation capabilities and
similarity based prognostics. Arima et al. [43] trained several SOMs with missing values being imputed as average of
their corresponding weights from their best matching units in each map. Fessant and Midenet [44] used SOM to detect
outliers as well as to impute missing data in a real transport survey with artificially inserted missing values. Rustum and
Adeloye [45] compared imputation performance of SOM, MLR, and backpropagation NN on water treatment time series,
with SOM being superior. Folguera et al. [46] applied SOM to impute artificially inserted missing values in water sample
dataset.
In similarity based prognostics, a reference data base is created with historical failures which are compared with an ongo-
ing case via distance analysis. Wang et al. [22], used MLR for fusion, curve fitting for smoothing, and segmented failure tra-
jectories. RUL was estimated based on similar reference RULs by measuring distance of ongoing failure trajectory section
with historical ones. Zio and Maio [24] segmented and normalised failure signals. During normal operation, RUL was esti-
mated as Mean Time to Failure (MTTF). After fault detection, RUL was calculated as weighted sum of historical RULs based
on fuzzy similarity of current segment and reference ones. They further enhanced their methodology in [47] where RUL was
calculated continuously and new estimate was compared with previous ones under assumption of stationarity. In case of no
significant change healthy state was considered and RUL was replaced by MTTF. Maio and Zio [25] compared Zio and Maio’s
technique [24] with Monte Carlo based particle filter where it was shown computationally cheaper. Mosallam et al. [23]
implemented symmetrical uncertainty method, PCA and EMD for pre-processing and segmented failure signals. RUL was
estimated as most similar historical RUL based on K-nearest neighbour analysis of ongoing segment and reference ones, with
discrete Bayesian filter used for uncertainty quantification. They also applied the same methodology in [48], and further
enhanced it in [49] by adding GPR in RUL estimation process. Zhang et al. [50] used phase space reconstruction trajectory
for pre-processing and segmented failure trajectories. RUL was estimated using weighted average of most similar historical
RULs, based on distance analysis of ongoing segment and reference ones. Wang et al. [51] applied MLR for fusion, RVM for
offline sparse training, estimated RUL as weighted average of historical RULs based on similarity analysis of ongoing trajec-
tory with reference ones, and quantified uncertainty with uncertainty propagation map. Khelif et al. [52] used MLR for fusion
and curve fitting for smoothing. RUL was estimated as weighted sum of most similar historical RULs based on distance anal-
ysis of current trajectory and reference ones, with most similar cases being favoured and dissimilar ones being penalised. Li
et al. [27] used wavelet packet analysis for pre-processing and applied Zio and Maio’s methodology [24] where they com-
pared two membership functions which displayed similar performance. You and Meng [26] segmented historical failures.
RUL of current segment was estimated based on weighted RUL of similar historical ones. During similarity analysis, more
recent measurements within segment had greater importance. Xue et al. [53] estimated RUL by applying local regression
on most similar historical RULs based on fuzzy instance modelling of ongoing failure and reference ones, optimised using
evolutionary analysis. Lam et al. [54] applied empirical signal to noise ratio method for pre-processing, PCA for fusion,
and kernel regression for smoothing. Similarity of ongoing failure with historical ones was computed using various metrics,
while RUL was estimated in several ways according to similarity results. Point estimated RUL via Pearson correlation sim-
ilarity metric outperformed the rest. These works used either experimental [23,25–27,50,51,53] or simulated [22,24,47–
52,54] raw sensor data. Similarity based prognostics has been implemented on turbofan engines [22,48,49,51–54], fission
reactor [24,47], crack propagation [25], lithium-ion batteries [23,48], bearings [50], contact resistances of electromagnetic
relays [27], and ball grid array solder joints of printed circuit boards [26].
Despite its simplicity, KNNR has found limited applications regarding prognostics. Rezgui et al. [55] combined support
vector regression with KNNR for diagnostics and prognostics of reverse polarity fault. Hu et al. [56] extracted features
and used KNNR, optimised by particle swarm optimisation and k-fold cross validation, for RUL estimation of lithium-ion battery. Zhao et al. [57] extracted features, and used KNNR with Dempster-Shafer belief theory for RUL estimation of a local oscillator from an analogue circuit of a high frequency receiver. The method outperformed NN, fuzzy NN, and particle filtering.
These works used either experimental [56] or simulated [55,57] data. On the other hand, KNNR has found popularity in other
fields like forestry [58–60] or traffic forecasting [61–63].
3. Prognostics methods overview
3.1. Health indicator creation
In data-driven prognostics, data quality is of paramount importance, affecting RUL estimation accuracy [6,9]. Hence, it is
essential data used reflect degradation process adequately. This can be achieved via HIs that can be either features extracted
from signals (mean, skewness, kurtosis, etc.), or one-dimensional metrics created by data fusion requiring all useful information be considered [6,9]. In this work, PCA with SPC were implemented to construct Hotelling $T^2$ and $Q$ residuals metrics
describing compressor’s valve degradation, used for the first time as HIs and RUL estimation inputs.
3.1.1. Principal components analysis (PCA)

PCA is a dimensionality reduction technique that projects a number of correlated variables onto a lower-dimensional space via a linear transformation, while preserving the maximum possible variance within the original set, creating a new group of uncorrelated and orthogonal latent variables [64]. Let $X$ be an $n \times p$ data matrix ($n$: number of measurements, $p$: number of variables); its PCA transformation is [64]:
$$X = T P' + R, \tag{1}$$
where $T$, the $n \times k$ score matrix, is the projection of $X$ from $p$-dimensional space to $k$-dimensional space, with $k \le p$. $P$, the $p \times k$ component matrix, is the linear mapping of $X$ to $T$. $R$ is the $n \times p$ reconstruction error matrix. Calculation of principal components can be done with use of singular value decomposition [64].
Selection of appropriate k was done employing Cumulative Percentage of Variance (CPV) [64], where k first components
leading to a model capturing a predefined variance percentage are kept. A typical value is 90% [64].
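The CPV rule can be illustrated with a minimal Python sketch. It assumes the component variances (eigenvalues of the covariance matrix) are already available; the function name `select_k_by_cpv` and the example eigenvalues are hypothetical and are not taken from the study.

```python
def select_k_by_cpv(eigenvalues, threshold=0.90):
    """Return the smallest k whose leading components reach the CPV threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical component variances for a four-variable data set.
example_eigenvalues = [2.9, 0.8, 0.2, 0.1]
k = select_k_by_cpv(example_eigenvalues, threshold=0.90)
print(k)
```

With these illustrative values the first component alone captures 72.5% of the variance, so a second component is needed to pass the 90% threshold.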
3.1.2. Statistical process control (SPC)

SPC is used to monitor a process for diagnostics purposes. A univariate process is considered to be healthy when its value
lies within some statistical limits decided by control chart used [65]. For multivariate process, SPC assumptions of variable
independency are inadequate. Hence, Multivariate Statistical Process Control (MSPC) is introduced, where a single control
chart is created using information from all variables. A common tool used to facilitate MSPC is PCA by reducing number
of monitored variables and decorrelating them. Some good reviews describing application of PCA and MSPC can be found
in [66–69].
After PCA model has been created, its scores and residuals can be used for SPC. Control charts employed in this work are Hotelling $T^2$ and $Q$ residuals, most widely used ones regarding PCA/SPC [66–69]. Hotelling metric for score matrix $T$ is [66,69,70]:
$$T^2 = \sum_{i=1}^{k} \frac{t_i^2}{s_i^2}, \tag{2}$$
with $t_i$ the $i$th principal component score, $s_i^2$ its variance, and control limit [66–69]:
$$T_\alpha^2 = \frac{k\,(n^2 - 1)}{n\,(n - k)}\, F_\alpha(k, n - k), \tag{3}$$
with $F_\alpha(k, n - k)$ the upper $100\alpha\%$ critical point of the $F$ distribution with $k$ and $n - k$ degrees of freedom.
$Q$ metric for residual matrix $R$ is [66,67,69]:
$$Q = \sum_{i=1}^{n} (x_i - \hat{x}_i)^2, \tag{4}$$
with $\hat{x}_i$ the reconstructed values of $x_i$, and control limit [66,67]:
$$Q_\alpha = g\,\chi^2_{h,\alpha}, \tag{5}$$
where $g = \dfrac{\mathrm{var}(R)}{2\,\mathrm{mean}(R)}$, $h = \dfrac{2\,(\mathrm{mean}(R))^2}{\mathrm{var}(R)}$, and $\chi^2_{h,\alpha}$ the upper $100\alpha\%$ critical point of the $\chi^2$ distribution with $h$ degrees of freedom.
Metrics created by PCA/SPC were used as HIs for prognostics purpose. Procedure of employing PCA/SPC to create HIs is
described in a compact form as follows. In phase I healthy data are centred and scaled to unit variance, and PCA model is
created, along with control limits for $T^2$ and $Q$. In phase II new data are centred and scaled using healthy means and variances, projected on healthy PCA model calculating their scores and residuals, and their metrics are estimated creating HIs.
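The phase II steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the healthy means, standard deviations, loading vectors and score variances are taken as given from phase I, the $Q$ sum runs over the variables of a single sample, and all function names and numbers are hypothetical.

```python
import math


def standardise(sample, means, stds):
    """Centre and scale a new sample with the healthy (phase I) statistics."""
    return [(x - m) / s for x, m, s in zip(sample, means, stds)]


def project(sample, loadings):
    """Scores of a sample; `loadings` holds k loading vectors of length p."""
    return [sum(p_j * x_j for p_j, x_j in zip(vec, sample)) for vec in loadings]


def reconstruct(scores, loadings):
    """Map the k scores back to the p original (scaled) variables."""
    p = len(loadings[0])
    return [sum(t * vec[j] for t, vec in zip(scores, loadings)) for j in range(p)]


def hotelling_t2(scores, score_variances):
    """Eq. (2): sum of squared scores divided by their healthy variances."""
    return sum(t * t / s2 for t, s2 in zip(scores, score_variances))


def q_residual(sample, reconstruction):
    """Eq. (4): squared reconstruction error of one sample."""
    return sum((x - xh) ** 2 for x, xh in zip(sample, reconstruction))


def health_indicators(raw_sample, means, stds, loadings, score_variances):
    """Return the (T^2, Q) pair used as health indicators for one sample."""
    x = standardise(raw_sample, means, stds)
    t = project(x, loadings)
    x_hat = reconstruct(t, loadings)
    return hotelling_t2(t, score_variances), q_residual(x, x_hat)


# Hypothetical healthy model: two variables, one retained component.
loadings = [[1 / math.sqrt(2), 1 / math.sqrt(2)]]
t2, q = health_indicators([1.0, -1.0], [0.0, 0.0], [1.0, 1.0], loadings, [2.0])
print(t2, q)
```

In this toy model a sample lying along the retained component raises $T^2$ only, while a sample orthogonal to it (as in the example call) raises $Q$ only, which is why the two metrics are monitored together.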
3.2. Prognostics methods
As already mentioned, there is lack of literature about prognostics on reciprocating compressors. Ergo, several prognostics
methods were compared on valve failure data from an operational industrial compressor.
3.2.1. Multiple linear regression (MLR)

MLR belongs to trend extrapolation family being its simplest representation. Let $Y$ be an $n \times 1$ response vector and $X$ an $n \times p$ regressor matrix. MLR is used to predict the dependent variable as linear combination of independent ones [71]:
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i, \tag{6}$$
with $\beta_0, \beta_1, \ldots, \beta_p$ regression coefficients to be estimated, $\varepsilon$ the residuals assumed to be uncorrelated and normally distributed, and $i = 1, \ldots, n$. Parameters are calculated utilising least squares algorithm [71]:
$$\hat{\beta} = (X'X)^{-1} X'Y, \tag{7}$$
Fit of model on data can be assessed using coefficient of determination $R^2$ that measures amount of variability captured [71]:
$$R^2 = 1 - \frac{SS_E}{SS_T}, \tag{8}$$
with $SS_E = \sum E^2$ [71], and $SS_T = \sum (Y - \bar{Y})^2$ [71]. Another criterion is the adjusted coefficient of determination $R^2_{adjusted}$ [71]:
$$R^2_{adjusted} = 1 - \frac{SS_E/(n - p)}{SS_T/(n - 1)}, \tag{9}$$
Both metrics range from zero indicating bad fit to one indicating perfect fit.
MLR was trained using historical failures and applied for direct RUL estimation, with HIs being independent variables and
RUL dependent one.
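As a rough illustration of Eqs. (7) to (9), the sketch below fits an MLR model by solving the normal equations and then scores the fit. It is a minimal pure-Python example, not the authors' code: the helper names are hypothetical, the training values are invented, and `n_params` is assumed to play the role of $p$ in Eq. (9), counted here as the number of estimated coefficients including the intercept.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(regressors, response):
    """Least squares coefficients [b0, b1, ...] from X'X b = X'Y (Eq. 7)."""
    design = [[1.0] + list(row) for row in regressors]  # intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefficients, row):
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


def r_squared(y, y_hat):
    """Eq. (8)."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst


def adjusted_r_squared(y, y_hat, n_params):
    """Eq. (9), with n_params standing in for p (assumption, see lead-in)."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - (sse / (n - n_params)) / (sst / (n - 1))


# Invented training set: two health indicators per row, RUL in seconds.
his = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
rul = [10.0, 8.0, 9.0, 7.0, 5.0]
beta = fit_mlr(his, rul)
fitted = [predict_mlr(beta, row) for row in his]
print(beta, r_squared(rul, fitted))
```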
3.2.2. Polynomial regression (PR)

PR also belongs to trend extrapolation class. It can be seen as an extension to MLR where predictors are also included in power form. Polynomial order depends on desired power. A second order polynomial for two regressors is [71]:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_{11} x_{i1}^2 + \beta_2 x_{i2} + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon, \tag{10}$$
Estimation of coefficients is done as in Section 3.2.1.
Depending on polynomial order, number of parameters can be significantly large leading to overfitting. Stepwise regression is most widely used selection process for including an optimum number of regressors [71]. It is an iterative procedure where terms are included or removed from the model based on a partial F-test. Considering that $f_{in}$ is F-value for including a term and $f_{out}$ for removing one, for a variable to be included it should be $f \ge f_{in}$ and to be excluded $f \le f_{out}$ [71]. During the initial step, a model is constructed using only most correlated regressor with the dependent variable as it will have highest $f$ value. The process concludes when no variables can be included or excluded [71], leading to polynomial stepwise regression (PSR). Adequacy of model can be examined using $R^2$ and $R^2_{adjusted}$ metrics. Prognostics application of this method is the same as for MLR.
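To make the structure of Eq. (10) concrete, the following sketch expands a row of regressors into all polynomial terms up to a chosen order, which could then be passed to any least squares routine or stepwise selection procedure. The function name `polynomial_terms` and the ordering of the terms are assumptions made for illustration; the stepwise F-test itself is not implemented here.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(row, order=2):
    """All monomials of the regressors from degree 1 up to `order`.

    For two regressors and order 2 this yields x1, x2, x1^2, x1*x2, x2^2,
    i.e. the non-constant terms of Eq. (10) in a different order.
    """
    terms = []
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(range(len(row)), degree):
            terms.append(prod(row[i] for i in combo))
    return terms


# Hypothetical health indicator pair (for example T^2 = 2.0, Q = 3.0).
print(polynomial_terms([2.0, 3.0]))
```

The number of terms grows quickly with the order and the number of regressors (9 terms for three regressors at order 2), which is the overfitting risk that stepwise selection is meant to limit.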
3.2.3. Self-organising map (SOM)

SOM is a form of NN used for unsupervised learning, introduced by Kohonen [72]. It is employed for clustering and dimensionality reduction, used to project multidimensional data on a two-dimensional structure resembling a map [38,41,42,44–46,72–79]. SOM consists of multidimensional input and competitive output. Let $X$ be an $n \times d$ data matrix. The output represents a grid of $M$ neurons, each with a weight vector $W_i = [w_{i1}, \ldots, w_{id}]$, interconnected via a neighbourhood relation. $M$ can be determined as [42,45,72,75]: $M = 5\sqrt{n}$. Dimensions $d_1$ and $d_2$ can be found using two largest eigenvalues of covariance matrix [42,45,72,75]: $d_1/d_2 = \sqrt{e_1/e_2}$.
Training is an iterative process. $X$ is centred and scaled to unit variance, and weight vectors are initialised given random values limited within subspace of $e_1$ and $e_2$ [42,72,75]. A random sample is presented to the map, and its similarity to every neuron is calculated to identify the Best Matching Unit (BMU) [38,41,42,44–46,72–79]. A common similarity metric is Euclidean distance [38,41,42,44–46,72–79]:
$$D_{ki} = \lVert X_k - W_i \rVert = \sqrt{\sum_{j=1}^{d} (x_{kj} - w_{ij})^2}, \tag{11}$$
with $k = 1, \ldots, n$, $i = 1, \ldots, M$, $j = 1, \ldots, d$, $X_k$ the $k$th sample, $W_i$ weight vector of $i$th neuron, and $D_{ki}$ their distance. BMU’s and its neighbours’ weight vectors are adjusted to better resemble input sample [38,41,42,44–46,72–79]:
$$W_i(t+1) = W_i(t) + \alpha(t)\, h_{BMUi}(t)\, D_{ki}(t), \tag{12}$$
with $t$ current time step, $h_{BMUi}(t)$ neighbourhood function centred at BMU, and $\alpha(t)$ learning rate. A typical neighbourhood function is Gaussian [75]: $h_{BMUi}(t) = e^{-d_{BMUi}^2 / 2\sigma^2(t)}$ with $\sigma(t)$ neighbourhood radius and $d_{BMUi}$ Euclidean distance between BMU and neuron $i$ on the map layout. Learning rate is [75]: $\alpha(t) = \alpha_0 (0.005/\alpha_0)^{t/T}$, with $\alpha_0$ initial rate and $T$ training length. Both $h_{BMUi}(t)$ and $\alpha(t)$ are monotonically decreasing functions as iterations increase.
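A single training iteration can be sketched as below. Several choices are assumptions made for illustration only: the weight adjustment uses the vector difference between the sample and each weight vector (the usual Kohonen form of the update), the neighbourhood radius decays linearly with a floor of 0.1, and the initial rate of 0.5, the function names and the toy map are hypothetical.

```python
import math


def map_size(n_samples):
    """Heuristic number of neurons, M = 5 * sqrt(n)."""
    return round(5 * math.sqrt(n_samples))


def best_matching_unit(sample, weights):
    """Index of the neuron with the smallest Euclidean distance (Eq. 11)."""
    return min(range(len(weights)), key=lambda i: math.dist(sample, weights[i]))


def learning_rate(t, total_steps, alpha0=0.5):
    """alpha(t) = alpha0 * (0.005 / alpha0) ** (t / T)."""
    return alpha0 * (0.005 / alpha0) ** (t / total_steps)


def gaussian_neighbourhood(grid_a, grid_b, sigma):
    """exp(-d^2 / (2 sigma^2)) with d measured on the map layout."""
    d2 = sum((a - b) ** 2 for a, b in zip(grid_a, grid_b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def som_update(sample, weights, grid, t, total_steps, alpha0=0.5, sigma0=1.0):
    """One in-place training step; returns the index of the BMU."""
    bmu = best_matching_unit(sample, weights)
    alpha = learning_rate(t, total_steps, alpha0)
    # Assumed radius schedule: linear decay, never below 0.1.
    sigma = max(sigma0 * (1.0 - t / total_steps), 0.1)
    for i, w in enumerate(weights):
        h = gaussian_neighbourhood(grid[bmu], grid[i], sigma)
        # Assumed update: move each weight towards the sample.
        weights[i] = [wj + alpha * h * (xj - wj) for wj, xj in zip(w, sample)]
    return bmu


# Toy map with two neurons placed side by side on the grid.
weights = [[0.0, 0.0], [1.0, 1.0]]
grid = [(0, 0), (0, 1)]
print(som_update([0.9, 0.8], weights, grid, t=0, total_steps=100), weights)
```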
Fit of map on data can be evaluated using Mean Quantisation Error (MQE) [42,45,75]:
$$q_e = \frac{1}{n} \sum_{i=1}^{n} \lVert X_i - W_{BMU_i} \rVert, \tag{13}$$
which is average of Euclidean distance of all input data and their BMUs. Its range is $[0, \infty)$ with zero indicating perfect fit. Another metric used is Topographic Error (TE) [42,45,75]:
$$t_e = \frac{1}{n} \sum_{i=1}^{n} u(X_i), \tag{14}$$
with $u$ a binary variable yielding one if first and second BMUs of $X_i$ are not bordering and zero if they are. Its range is $[0, 1]$ with zero indicating perfect fit.
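Both quality metrics are simple to compute once a map has been trained, as the sketch below suggests. It assumes a rectangular grid in which two neurons count as bordering when their grid coordinates differ by at most one in each direction; that adjacency rule, the function names and the toy map are illustrative assumptions.

```python
import math


def mean_quantisation_error(samples, weights):
    """Eq. (13): mean distance between each sample and its BMU."""
    total = sum(min(math.dist(s, w) for w in weights) for s in samples)
    return total / len(samples)


def _bordering(pos_a, pos_b):
    """Assumed adjacency: grid coordinates differ by at most one."""
    return max(abs(a - b) for a, b in zip(pos_a, pos_b)) <= 1


def topographic_error(samples, weights, grid):
    """Eq. (14): share of samples whose two best units are not bordering."""
    errors = 0
    for s in samples:
        order = sorted(range(len(weights)), key=lambda i: math.dist(s, weights[i]))
        first, second = order[0], order[1]
        if not _bordering(grid[first], grid[second]):
            errors += 1
    return errors / len(samples)


# Toy map: three neurons in a row, two samples.
weights = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
grid = [(0, 0), (0, 1), (0, 2)]
samples = [[0.1, 0.0], [4.0, 0.0]]
print(mean_quantisation_error(samples, weights), topographic_error(samples, weights, grid))
```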
After SOM construction, each neuron is able to recognise inputs that are similar to itself, earning the name of self-
organising map. Data with similar patterns are associated with same neurons or their neighbours, preserving topology
and relations of measurements.
Although SOM is a form of NN, one of the most favoured prognostics methods in academia [17], it has yet to be used for
RUL estimation. Inspiration of utilising SOM in such way was drawn from its imputation capability along with similarity
based prognostics. When used for imputation, SOM is constructed using observed measurements. The sample containing
missing data is presented to the map and its BMU is determined using distances of its observed values and their correspond-
ing neuron weights. Missing values are imputed as their equivalent BMU weight values. Similarity based prognostics [26,27]
is an emerging trend over recent years. For an ongoing fault, RUL is estimated as sum of historical RULs, weighted based on
similarity analysis between current information and historical failures [22,23,26,27,47–49,51,52,80–82]. It is noted that as
time passes, estimated RUL converges with actual one [26,47]. Its requirements are [22,23,26,27,48,52,81]: sufficient amount
of historical failures, continuous monitoring of information, and information reflecting system degradation through time. It
is a simple method as there is no complex algorithm used, making it generic, though is highly affected by data quality
[26,27,49,80,82].
Based on above, an offline SOM is constructed using historical $T^2$ and $Q$ measurements and their corresponding RUL values. In an online step, new $T^2$ and $Q$ statistics are presented to the map, and their similarity to every neuron is calculated to identify the BMU. Then RUL is calculated as its equivalent BMU weight value. As with similarity based prognostics, the purpose is to identify similar degradation patterns with historical cases utilising SOM’s structure. RUL estimation in this study was carried out by performing pointwise similarity analysis on the $T^2$ and $Q$ statistics rather than similarity analysis between
segments as usually done (Section 2). In pointwise similarity analysis, only the latest information was compared with his-
torical samples, since more recent samples contain richer information about degradation [26,27,52]. Moreover, RUL is cal-
culated solely based on most similar case. This method shall be denoted as SOM 1.
3.2.4. Proposed variation of SOM based RUL estimation

A variation of RUL estimation process based on SOM is also proposed. Instead of creating a single map from all historical
failures, an individual map is trained for each case. For an ongoing fault, its information is presented to each SOM and RUL is
calculated as average of imputation result of all maps. This variation shall be denoted as SOM 2.
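The online step of SOM 1 and the averaging of SOM 2 can be sketched as follows, assuming already trained maps whose neurons store weight vectors of the form [$T^2$, $Q$, RUL]. The BMU is searched on the health indicator components only and its RUL component is read off, mirroring the imputation idea. The function names and the neuron values are hypothetical, and any scaling applied during training is assumed to have been applied to the new sample as well.

```python
import math


def som1_rul(hi_sample, neuron_weights):
    """RUL component of the BMU, found on the observed HI components only."""
    n_observed = len(hi_sample)

    def distance(weight_vector):
        return math.dist(hi_sample, weight_vector[:n_observed])

    bmu = min(neuron_weights, key=distance)
    return bmu[-1]  # last component holds the RUL learnt by the neuron


def som2_rul(hi_sample, maps):
    """Average of the per-case estimates, one trained map per failure case."""
    estimates = [som1_rul(hi_sample, neuron_weights) for neuron_weights in maps]
    return sum(estimates) / len(estimates)


# Two hypothetical per-case maps with two neurons each: [T^2, Q, RUL in s].
map_a = [[1.0, 0.5, 300.0], [5.0, 2.0, 60.0]]
map_b = [[1.2, 0.4, 250.0], [6.0, 3.0, 30.0]]
new_sample = [4.5, 2.2]  # latest T^2 and Q values of the ongoing fault

print(som1_rul(new_sample, map_a))
print(som2_rul(new_sample, [map_a, map_b]))
```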
3.2.5. K-nearest neighbours regression (KNNR)

KNNR is a form of similarity based prognostics, belonging in nonparametric regression family. It estimates the regression
function without making any assumptions about underlying relationship of dependent and independent variables
[59,62,83,84] by utilising similarities of current sample to historical points for prediction [63]. KNNR is a distribution free,
multivariate method that preserves variable relations and local structure within data, easy to use, fast and computationally
cheap [85], but highly affected by amount of historical data available [56].
Let $X$ be an $n \times q$ regressor matrix, $Y$ its $n \times 1$ response vector and $u$ a new sample. Resemblance of new sample’s predictors and historical ones is calculated via similarity analysis. Euclidean distance [55,58,61–63,85–88] is most commonly used similarity metric [56,59,85]:
$$d(u, x_i) = \lVert u - x_i \rVert = \sqrt{\sum_{j=1}^{q} (u_j - x_{ij})^2}, \tag{15}$$
with $i = 1, \ldots, n$. $u$’s response value is [56,59,85]:
$$y_u = \frac{\sum_{l=1}^{K} w_l y_l}{\sum_{l=1}^{K} w_l}, \tag{16}$$
with $K$ number of most similar historical points to current sample according to $d(u, x_i)$, $w_l$ and $y_l$ weight and response value of $l$th neighbour. Hence, response value is weighted sum of response values of $K$ closest historical samples based on their predictor similarities. About weighting there is no straightforward formula and can be done in various ways [84]. Formulation used here was:
$$w_l = 1 - \frac{d_l}{\sum_{l=1}^{K} d_l} \tag{17}$$
Optimum K can be found via k-fold cross validation [58,83–87]. Historical data are partitioned into k new sets of approx-
imately equal length. For a range of Ks, a model is trained with k-1 sets, leaving one out for validation estimating an error
criterion. This is repeated until all subsets are left out once creating k new models. Mean error for each K is calculated and
smallest one yields optimum K [56,83,89].
As with SOM, pointwise similarity was used instead of segmented. Furthermore, RUL was estimated using K most similar
samples from all historical data, meaning that one failure might have more than one common points with current sample
while another might have none. This method shall be denoted as KNNR 1.
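A compact sketch of KNNR 1 and of the cross validation used to pick $K$ is given below. It follows Eqs. (15) to (17) but makes a few assumptions that the text does not specify: when only one neighbour is used, or all neighbours coincide with the query, the weights of Eq. (17) vanish and a plain average is returned instead; folds are built by interleaving samples; and the mean absolute error is used as the error criterion. All names and data are illustrative.

```python
import math


def knnr_predict(query, regressors, responses, k):
    """Weighted response of the k nearest historical points (Eqs. 15-17)."""
    neighbours = sorted(
        (math.dist(query, x), y) for x, y in zip(regressors, responses)
    )[:k]
    total = sum(d for d, _ in neighbours)
    if len(neighbours) == 1 or total == 0.0:
        # Eq. (17) gives zero weights here, so fall back to a plain average.
        return sum(y for _, y in neighbours) / len(neighbours)
    weights = [1.0 - d / total for d, _ in neighbours]
    weighted = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return weighted / sum(weights)


def choose_k(regressors, responses, candidate_ks, folds=5):
    """Pick the K with the smallest mean absolute cross-validation error."""
    n = len(responses)
    best_k, best_error = None, float("inf")
    for k in candidate_ks:
        errors = []
        for fold in range(folds):
            train = [i for i in range(n) if i % folds != fold]
            train_x = [regressors[i] for i in train]
            train_y = [responses[i] for i in train]
            for i in range(fold, n, folds):  # held-out samples of this fold
                estimate = knnr_predict(regressors[i], train_x, train_y, k)
                errors.append(abs(estimate - responses[i]))
        mean_error = sum(errors) / len(errors)
        if mean_error < best_error:
            best_k, best_error = k, mean_error
    return best_k


# Invented history: one health indicator per sample, RUL in seconds.
history_x = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
history_rul = [100.0, 80.0, 40.0, 5.0]
print(knnr_predict([0.5, 0.0], history_x, history_rul, k=3))
```

Because the weights of Eq. (17) sum to $K - 1$, the closest neighbours dominate the estimate but the most distant of the $K$ points still contributes unless it accounts for the whole distance sum.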
3.2.6. Proposed variation of KNNR based RUL estimation

A variation of RUL estimation process based on KNNR is also proposed. As with SOM 2, instead of applying KNNR on all
historical data, it is implemented on each historical case. RUL is weighted sum of RULs from each case based on similarity
results. In this variation, instead of using K most similar points from each case only most similar one was used. This variation
shall be denoted as KNNR 2.
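The per-case variation can be sketched in the same spirit. The text does not state which weighting is applied across cases, so this example assumes the formulation of Eq. (17) computed over the per-case nearest distances, with a plain average as fallback when the weights vanish; the function name and the data are hypothetical.

```python
import math


def knnr2_predict(query, cases):
    """Combine the single most similar point of every historical case.

    `cases` is a list of (regressors, ruls) pairs, one per failure case.
    """
    nearest = []
    for regressors, ruls in cases:
        # Closest historical sample of this case and its RUL.
        nearest.append(min((math.dist(query, x), y) for x, y in zip(regressors, ruls)))
    total = sum(d for d, _ in nearest)
    if len(nearest) == 1 or total == 0.0:
        return sum(y for _, y in nearest) / len(nearest)
    # Assumed weighting: Eq. (17) applied to the per-case distances.
    weights = [1.0 - d / total for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Two invented failure cases, each with two samples: ([HI values], RUL in s).
case_1 = ([[0.0, 0.0], [2.0, 0.0]], [100.0, 50.0])
case_2 = ([[1.0, 0.0], [4.0, 0.0]], [70.0, 10.0])
print(knnr2_predict([1.5, 0.0], [case_1, case_2]))
```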
3.2.7. Ensemble method

Output of each prognostics algorithm was also combined via averaging leading to an ensemble method:
$$\mathrm{Ensemble} = (\mathrm{MLR} + \mathrm{PR} + \mathrm{SOM1} + \mathrm{SOM2} + \mathrm{KNNR1} + \mathrm{KNNR2})/6, \tag{18}$$

The purpose is to improve prognostics results by combining strengths of multiple techniques, refining their results.
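In code, the ensemble reduces to an unweighted mean of the individual estimates for the same time instant. The sketch below is illustrative only; the function name, the dictionary keys and the RUL values are invented.

```python
def ensemble_rul(estimates):
    """Unweighted mean of the individual RUL estimates, as in Eq. (18)."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return sum(estimates.values()) / len(estimates)


# Invented RUL estimates (in seconds) from the six methods at one instant.
current_estimates = {
    "MLR": 120.0,
    "PR": 110.0,
    "SOM1": 95.0,
    "SOM2": 105.0,
    "KNNR1": 100.0,
    "KNNR2": 130.0,
}
print(ensemble_rul(current_estimates))
```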
4. Data acquisition
4.1. Data preparation
Information employed in this work came from an operational industrial two-stage, four-cylinder, double-acting reciprocating compressor that has been used in various applications (compressing different gases). The machine is instrumented with sensors collecting both process-related (temperature, pressure, speed, etc.) and mechanical (bearing vibration, bearing temperature, seal pressure, etc.) measurements, which stream continuously, via internet, to a central location. They are stored, pre-processed, and analysed for CBM purposes. Given each sensor’s sampling frequency, a large volume of data is created every second, so a huge amount of storage is required. To mitigate this issue, a rule set was created to decide which values should be stored, creating non-uniformly sampled sets. Linear interpolation was utilised in the data retrieval tool kit to resample non-uniformly sampled data. The fault mode under study was a valve failure. A ring valve was the defective component, with the cause of failure being a broken valve plate leading to leakage. There were 13 defective cases available, all of which took place in the same cylinder within a period of one and a half years. Depending on the case, the failing valve was either the Head End (HE) or the Crank End (CE) discharge valve. In all failures, valves were of the same type, model, and manufacturer. Failure was defined as the point at which the valve was deemed incapable of performing its intended function.
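The resampling step mentioned above, in which non-uniformly stored values are brought back onto a uniform grid by linear interpolation, can be illustrated as follows. The function is a generic sketch and not the retrieval tool kit itself; it assumes strictly increasing time stamps, and the name `resample_linear` is hypothetical.

```python
from bisect import bisect_right

def resample_linear(times, values, period=1.0):
    """Resample irregular (times, values) onto a uniform grid by linear interpolation."""
    n = int((times[-1] - times[0]) // period) + 1
    grid = [times[0] + i * period for i in range(n)]
    out = []
    for t in grid:
        j = bisect_right(times, t) - 1        # last stored sample at or before t
        if j >= len(times) - 1:               # t coincides with the final sample
            out.append(values[-1])
            continue
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return grid, out
```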
Historical information of 16 temperature measurements, one for each valve (two suction (HE/CE) and two discharge (HE/CE) per cylinder, four cylinders), was extracted from a server with sampling period $T_s = 1$ s ($f_s = 1$ Hz). Each case contained roughly two and a half days’ worth of data, consisting of both healthy and failing states. Table 1 summarises the fault duration of each case (moment of detection until moment of failure). The instantaneous nature is evident, as failure occurs in a matter of minutes. Prior to proceeding with analysis, data were scanned for missing values, a common phenomenon in industry, utilising SOM for imputation.

Table 1
Data set specifications.
Failure case         1    2    3    4    5    6    7    8    9    10   11   12   13
Fault duration (s)   333  119  280  245  125  242  114  233  494  131  246  73   254
In order to mitigate the impact of external factors such as air temperature or rotational speed on the temperature measurements, their ratios were employed for the calculation of HIs. Temperature ratios were calculated for the suction and discharge of each cylinder as $T_r = T_{HE}/T_{CE}$. Healthy data from each case were centred and scaled to unit variance, and used to create a PCA model of 3 components (CPV = 95%) while calculating the $T^2$ and $Q$ control limits. Failure data, after centring and scaling, were projected onto the model to calculate their $T^2$ and $Q$ metrics, creating the HIs (Fig. 1). Both metrics were divided by their respective statistical limits in order to be comparable.
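The HI construction described above can be sketched with NumPy as follows. One deliberate simplification is labelled in the code: the control limits are taken as empirical percentiles of the healthy statistics, standing in for the statistical limits the paper uses, which are not reproduced in this section. All function and key names are hypothetical.

```python
import numpy as np

def temperature_ratios(t_he, t_ce):
    """Head End over Crank End temperature ratio, T_r = T_HE / T_CE."""
    return np.asarray(t_he, dtype=float) / np.asarray(t_ce, dtype=float)

def t2_q(model, X):
    """Hotelling T^2 and Q (squared residual) statistics for the rows of X."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sigma"]
    T = Z @ model["P"]                           # scores on the retained components
    t2 = np.sum(T**2 / model["lam"], axis=1)
    resid = Z - T @ model["P"].T                 # part not explained by the model
    q = np.sum(resid**2, axis=1)
    return t2, q

def fit_pca_monitor(healthy, n_components=3, limit_pct=99.0):
    """Fit a PCA model on healthy data (rows = samples, columns = ratios)."""
    healthy = np.asarray(healthy, dtype=float)
    mu = healthy.mean(axis=0)
    sigma = healthy.std(axis=0, ddof=1)
    Z = (healthy - mu) / sigma                   # centre and scale to unit variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (len(Z) - 1)
    model = {"mu": mu, "sigma": sigma,
             "P": Vt[:n_components].T, "lam": eigvals[:n_components]}
    t2, q = t2_q(model, healthy)
    # ASSUMPTION: empirical percentile limits replace the statistical limits
    model["t2_lim"] = np.percentile(t2, limit_pct)
    model["q_lim"] = np.percentile(q, limit_pct)
    return model

def health_indicators(model, X):
    """T^2 and Q divided by their limits, so that values above 1 flag a fault."""
    t2, q = t2_q(model, X)
    return t2 / model["t2_lim"], q / model["q_lim"]
```

Dividing by the limits puts both indicators on a common scale, which is what allows them to be used side by side as predictors.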
An appropriate HI needs to be monotonic and encapsulate degradation evolution through time [21,23,30,34,48,49]. If this is satisfied, estimated RUL is expected to be accurate [82]. Furthermore, it is desired that an HI has low variability [30,34], presents roughly the same value at failure under the same failure mode and operating conditions, and follows a similar pattern across cases [21,30]. Fig. 1 confirms the suitability of both metrics, as they fulfil the aforementioned prerequisites and adequately reflect fault propagation.
4.2. Prognostics metrics
In order to quantitatively benchmark the performance of the methods, several criteria were used, as there is no universal criterion available yet [36]. The metrics can be separated into two categories: a) accuracy (NMSE, MAPER, CRA), measuring the distance between estimated and actual RUL, with higher accuracy desired, and b) precision (MAD), measuring error variability, with low volatility desired. Let $RUL(t)$ be the actual RUL at time $t$, $t = 1, \ldots, N$, with $N$ the number of available samples, $\widehat{RUL}(t)$ be the estimated RUL, and $\Delta RUL(t) = RUL(t) - \widehat{RUL}(t)$ be the difference between actual and estimated RUL. Employed metrics are:
i. Normalised mean square error (NMSE) [90]:
$$NMSE = 1 - \frac{1}{N}\sum_{i=1}^{N} \frac{\Delta RUL(i)^2}{\left(RUL(i) - \overline{RUL}\right)^2} \qquad (19)$$
with $\overline{RUL}$ the mean value of RUL.
ii. Mean absolute percentage error (MAPER) [36]:
$$MAPER = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{100\,\Delta RUL(i)}{RUL(i)}\right| \qquad (20)$$

Fig. 1. $T^2$ and $Q$ health indicators.
iii. Cumulative relative accuracy (CRA) [36]:
$$CRA = \frac{1}{N}\sum_{i=1}^{N} RA(i) \qquad (21)$$
with $RA(i)$ the Relative Accuracy at each time instance [36,91]:
$$RA(i) = 1 - \frac{|\Delta RUL(i)|}{RUL(i)} \qquad (22)$$
iv. Mean absolute deviation (MAD) [36]:
$$MAD = \frac{1}{N}\sum_{i=1}^{N} \left|\Delta RUL(i) - \mathrm{median}(\Delta RUL(i))\right| \qquad (23)$$
NMSE and CRA range in $(-\infty, 1]$ with 1 indicating a perfect score, while MAPER and MAD range in $[0, \infty)$ with 0 indicating a perfect score.
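The four metrics can be computed together as in the sketch below. Two choices are assumptions rather than statements from the paper: NMSE is evaluated in the aggregate form 1 - (sum of squared errors) / (sum of squared deviations of actual RUL from its mean), which is one way of applying Eq. (19), and samples with an actual RUL of zero are skipped in MAPER and CRA because the relative error is undefined there. The name `prognostic_metrics` is hypothetical.

```python
from statistics import mean, median

def prognostic_metrics(rul_true, rul_est):
    """Return NMSE, MAPER, CRA and MAD for paired actual and estimated RUL."""
    delta = [r - e for r, e in zip(rul_true, rul_est)]    # actual minus estimated
    rul_bar = mean(rul_true)
    # ASSUMPTION: aggregate (ratio of sums) form of the normalised error
    nmse = 1.0 - sum(d * d for d in delta) / sum((r - rul_bar) ** 2 for r in rul_true)
    # ASSUMPTION: relative terms skip samples where the actual RUL is zero
    rel = [abs(d) / r for d, r in zip(delta, rul_true) if r != 0]
    maper = 100.0 * mean(rel)
    cra = mean(1.0 - x for x in rel)
    med = median(delta)
    mad = mean(abs(d - med) for d in delta)
    return {"NMSE": nmse, "MAPER": maper, "CRA": cra, "MAD": mad}
```

With these definitions CRA equals 1 - MAPER/100, so the two accuracy metrics rank methods identically, and MAD is the only one of the four that measures spread rather than distance.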
5. Prognostics results
5.1. Application of prognostics methods
RUL was estimated directly, with $T^2$ and $Q$ being the independent variables and RUL the dependent one. The prognostics methods attempted to model this relationship so that RUL could be calculated accurately. RUL was logarithmically transformed to improve the fit of the models described in Section 3. During training, 12 cases were used for model building while the 13th was kept for testing. Results for representative cases 8 and 11 are presented. The training outcome of each method can be found below. All methods were implemented in Matlab [92–94].
5.1.1. Multiple linear/polynomial regression

For PR, the third order was the maximum order examined. Table 2 contains the $R^2$ and $R^2_{adjusted}$ metrics. Both methods have an adequate fit, with PR being superior as it has greater values.
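The regression setup can be sketched with NumPy as below: the two HIs are expanded into polynomial terms up to a chosen order (order 1 gives MLR, order 3 the highest PR order examined), the model is fitted to log-transformed RUL by least squares, and $R^2$ and adjusted $R^2$ are reported. The paper used Matlab, so this is an illustration and not the authors' implementation. The transform log(1 + RUL), chosen here to stay finite at RUL = 0, is an assumption, as are all function names.

```python
import numpy as np

def design_matrix(t2, q, order=1):
    """Columns t2**a * q**b for all a + b <= order (first column is the intercept)."""
    t2 = np.asarray(t2, dtype=float)
    q = np.asarray(q, dtype=float)
    cols = [t2**a * q**b
            for a in range(order + 1)
            for b in range(order + 1 - a)]
    return np.column_stack(cols)

def fit_log_rul(t2, q, rul, order=1):
    """Least-squares fit of log-transformed RUL; returns (beta, R2, adjusted R2)."""
    X = design_matrix(t2, q, order)
    y = np.log1p(np.asarray(rul, dtype=float))   # ASSUMPTION: log(1 + RUL)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape[0], X.shape[1] - 1            # p excludes the intercept
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean())**2)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, r2_adj

def predict_rul(beta, t2, q, order=1):
    """Invert the log transform to return RUL in the original units."""
    return np.expm1(design_matrix(t2, q, order) @ beta)
```

With two predictors, order 3 yields ten columns including the interaction terms, and adjusted $R^2$ penalises that extra flexibility relative to the three-column MLR model.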
5.1.2. Self-organising map

Data were centred and scaled to unity. Maps were constructed using a Gaussian neighbourhood function with starting radius $r = \max(d_1, d_2)/4$, an initial learning rate $\alpha_0 = 0.5$, and Euclidean distance. Table 3 and Table 4 contain the MQE and TE metrics for SOM 1 and SOM 2. Both methods yield high accuracy, having low metric values.
5.1.3. K-nearest neighbours regression

Data were centred and scaled to unity, and Euclidean distance was used. Table 5 contains optimum K for KNNR 1, selected
via 10-fold cross validation ranging from 1 to 200, while for KNNR 2 optimum K was decided a priori as K = 1.
5.2. Results
Fig. 2 and Fig. 3 contain prognostics results for both historical failures, giving a qualitative perspective of each method’s performance. The x-axis indicates time and the y-axis RUL at each time stamp, with $t = 0$ the moment the fault was detected (RUL = 233 for case 8 and 246 for case 11, Table 1) and $t = 233$, or $t = 246$, the moment of failure (RUL = 0). Graphs consist of a number of lines. Black indicates actual RUL through time, as observed in-situ, and the rest correspond to each algorithm’s estimations. All methods perform comparably well, with the best performing being the ensemble technique (magenta line), as it closely tracks RUL evolution in both cases, followed by polynomial regression (continuous blue line), while the worst performing seems to be SOM 1 (dashed red line), which demonstrates great variation. KNNR 1 (dashed green line) performs adequately, while KNNR 2 (continuous green line) and SOM 2 (continuous red line) consistently underestimate RUL. It can be noted that all methods converge to actual RUL as time passes.

Table 2
MLR/PR $R^2$ and $R^2_{adjusted}$ metrics.
Algorithm          MLR           PR
Failure case       8     11      8     11
R2                 0.68  0.67    0.76  0.77
R2 adjusted        0.67  0.67    0.76  0.77

Table 3
Mean quantisation and topological errors for SOM 1.
Failure case   8     11
MQE            0.13  0.12
TE             0.02  0.04

Table 4
Mean quantisation and topological errors for SOM 2.
Failure case   1     2     3     4     5     6     7     8     9     10    11    12    13
MQE            0.04  0.09  0.05  0.05  0.06  0.06  0.06  0.04  0.04  0.06  0.05  0.10  0.06
TE             0.31  0.27  0.40  0.29  0.22  0.43  0.16  0.20  0.27  0.21  0.40  0.18  0.19

Table 5
Optimum K for KNNR 1.
Failure case   8   11
K              15  14

Fig. 2. RUL estimation for failure case 8.
Quantitative inspection of the methods’ performance can be done via the metrics found in Table 6. The prognostics horizon for all metrics is from the moment of fault detection until failure, meaning all available samples were considered in the calculation. From the results it is evident that the ensemble method consistently outperforms the rest, being superior in most metrics for both failures, while in cases where another technique prevails, the ensemble follows closely. Although PR performs well in case 8, in case 11 it is outperformed by others. KNNR 2 displays the lowest variability, an attribute highly desired, followed closely by the ensemble. SOM 1 has the lowest accuracy and highest volatility. Overall, quantitative results are in accordance with qualitative ones. Furthermore, results confirmed the claim of a lack of universal metrics, since the same method might be suitable or not depending on the metric used. This calls for more effort to be put in this direction.
Fig. 3. RUL estimation for failure case 11.

Table 6
Evaluation metrics.
Performance metrics   NMSE            MAPER           CRA           MAD
Failure case          8      11       8      11       8     11      8      11
Method
MLR                   0.77   0.86     30.50  29.39    0.70  0.71    26.10  16.90
PR                    0.96   0.66     14.17  40.14    0.86  0.60    11.3   35.61
SOM 1                 0.35   -0.39    40.45  51.88    0.60  0.48    41.85  62.05
SOM 2                 0.73   0.09     31.43  61.21    0.69  0.39    14.73  26.41
KNNR 1                0.90   0.71     16.94  35.53    0.83  0.65    16.43  33.53
KNNR 2                0.83   0.46     27.96  49.13    0.72  0.51    9.33   14.36
ENSEMBLE              0.96   0.92     16.14  25.17    0.84  0.75    9.35   15.21

Based on the prognostics results presented in this section, the following comments can be made:
- PR and MLR performed similarly well, with PR being superior based on both qualitative and quantitative results, as it could better reflect the complex relationship between RUL and HIs by including interaction and higher order terms of HIs, overcoming MLR’s rigidity.
- SOM and KNNR displayed similar performance, both belonging to the similarity-based prognostics family and using the same distance metric (Euclidean) for similarity analysis.
- SOM 1 performed poorly from both the accuracy and variability perspectives because it considers only the most similar case for estimation, lacking versatility. SOM 2, KNNR 1, and KNNR 2 performed better since they considered more information during RUL calculation.
- The difference between SOM 2 and both KNNR versions can be attributed to the pooling procedure: SOM 2 averaged RULs, while the KNNR versions weighted them, which is more flexible.
- KNNR 1 tended to outperform KNNR 2, indicating that considering more than one similar point from the same failure during RUL estimation can increase accuracy. On the other hand, KNNR 2 displayed lower variation, hinting that considering each case separately can reduce volatility.
- The ensemble method’s performance is highly dependent on the individual performance of its constituent methods. Its components performed well, thus it displayed the best overall performance based on both qualitative and quantitative results. Its output could be seen as a refinement of the prognostics estimations of its elements.
- The importance of HI quality should be noted, as the performance of the algorithms is also heavily dependent on the quality of the HIs used, since they reflect the degradation process. The HIs that were used ($T^2$ and $Q$) adequately encapsulated failure evolution, as confirmed by the good results, closely tracking fault propagation through time.
6. Conclusions
In this project, four prognostics techniques (MLR, PR, SOM 1, and KNNR 1), along with two RUL estimation variations (SOM 2 and KNNR 2), and an ensemble method combining the aforementioned algorithms’ output, were benchmarked using valve failure data from an operational industrial reciprocating compressor. To the authors’ knowledge, this was the first attempt at RUL estimation on reciprocating compressor valves. Furthermore, the use of actual data addressed the lack of works regarding implementation of prognostics in industrial applications, demonstrating PHM’s potency. Moreover, it was the first known implementation of SOM in RUL estimation as a standalone prognostics method, and the first time that the $T^2$ and $Q$ metrics were used as HIs and utilised in a direct RUL estimation process.
Analysis showed that all methods performed comparably well in both qualitative (graphs) and quantitative (metrics) analysis, with the ensemble outperforming the rest by better tracking RUL evolution and achieving favourable metric values. SOM 1 performed poorly, being less accurate and highly volatile because it considers only the most similar case, while SOM 2, KNNR 1, and KNNR 2 performed closely, all being similarity-based methods using the same distance metric, with KNNR 1 performing the best. Also, the quality of the HIs used was deemed satisfactory given the good results of the techniques, confirming the suitability of the $T^2$ and $Q$ metrics for use as such. Moreover, results demonstrated that all methods were able to cope with the instantaneous nature of the failure mode under study.
References
[1] H.P. Bloch, A Practical Guide to Compressor Technology, Second. John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.
[2] H.P. Bloch, J.J. Hoefner, Reciprocating Compressors: Operation & Maintenance. Butterworth-Heinemann, 1996.
[3] V.T. Tran, F. AlThobiani, A. Ball, An approach to fault diagnosis of reciprocating compressor valves using Teager-Kaiser energy operator and deep belief
networks, Expert Syst. Appl. 41 (9) (2014) 4113–4122.
[4] W.A. Griffith, E.B. Flanagan, Online continuous monitoring of mechanical condition and performance for critical reciprocating compressors, in: 30th
Turbomachinery Symposium, Texas: Houston, 2001.
[5] Keerqinhu, G. Qi, W.-T. Tsai, Y. Hong, W. Wang, G. Hou, Z. Zhu, Fault-diagnosis for reciprocating compressors using big data, in: 2016 IEEE Second Int.
Conf. Big Data Comput. Serv. Appl., pp. 72–81, 2016.
[6] G. Vachtsevanos, F. Lewis, M. Roemer, A. Hess, B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems, John Wiley & Sons Inc.,
Hoboken, NJ, USA, 2006.
[7] A.K.S. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mech. Syst. Signal
Process. 20 (7) (Oct. 2006) 1483–1510.
[8] A.J. Guillén, A. Crespo, M. Macchi, J. Gómez, On the role of Prognostics and Health Management in advanced maintenance systems, Prod. Plan. Control
7287 (June) (2016) 1–14.
[9] J. Yan, Machinery Prognostics and Prognosis Oriented Maintenance Management, John Wiley & Sons Singapore Pte. Ltd, Singapore, 2014.
[10] S. Kadry, Diagnostics and Prognostics of Engineering Systems, IGI Global, 2013.
[11] B.-S. Yang, W.-W. Hwang, D.-J. Kim, A. Chit Tan, Condition classification of small reciprocating compressor for refrigerators using artificial neural
networks and support vector machines, Mech. Syst. Signal Process. 19 (2) (2005) 371–390.
[12] C. Annicchiarico, A. Babbini, R. Capitani, P. Tozzi, Numerical and experimental testing of composite rings for reciprocating compressor valves, in: ASME
2013 Pressure Vessels and Piping Conference, 2013.
[13] F. Gu, Y. Shao, N. Hu, A. Naid, A.D. Ball, Electrical motor current signal analysis using a modified bispectrum for fault diagnosis of downstream
mechanical equipment, Mech. Syst. Signal Process. 25 (1) (Jan. 2011) 360–372.
[14] H. Cui, L. Zhang, R. Kang, X. Lan, Research on fault diagnosis for reciprocating compressor valve using information entropy and SVM method, J. Loss
Prev. Process Ind. 22 (6) (Nov. 2009) 864–867.
[15] K. Feng, Z. Jiang, W. He, B. Ma, A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to
indicator diagram diagnosis, Expert Syst. Appl. 38 (10) (2011) 12721–12729.
[16] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel, Prognostics and health management design for rotary machinery systems—reviews, methodology
and applications, Mech. Syst. Signal Process. 42 (1–2) (2014) 314–334.
[17] J.Z. Sikorska, M. Hodkiewicz, L. Ma, Prognostic modelling options for remaining useful life estimation by industry, Mech. Syst. Signal Process. 25 (5)
(2011) 1803–1836.
[18] B. Sun, S. Zeng, R. Kang, M.G. Pecht, Benefits and challenges of system prognostics, IEEE Trans. Reliab. 61 (2) (2012) 323–335.
[19] A. Heng, S. Zhang, A.C.C. Tan, J. Mathew, Rotating machinery prognostics: state of the art, challenges and opportunities, Mech. Syst. Signal Process. 23
(3) (2009) 724–739.
[20] M.S. Kan, A.C.C. Tan, J. Mathew, A review on prognostic techniques for non-stationary and non-linear rotating systems, Mech. Syst. Signal Process. 62–
63 (2015) 1–20.
[21] E. Zio, Prognostics and health management of industrial equipment, in: S. Kadry (Ed.), Diagnostics and Prognostics of Engineering Systems, IGI Global,
2013, pp. 333–356.
[22] T. Wang, Jianbo Yu, D. Siegel, J. Lee, A similarity-based prognostics approach for Remaining Useful Life estimation of engineered systems, in: 2008
International Conference on Prognostics and Health Management, 2008, pp. 1–6.
[23] A. Mosallam, K. Medjaher, N. Zerhouni, Bayesian approach for remaining useful life prediction, Chem. Eng. Trans. 33 (2013) 139–144.
[24] E. Zio, F. Di Maio, A data-driven fuzzy approach for predicting the remaining useful life in dynamic failure scenarios of a nuclear system, Reliab. Eng.
Syst. Saf. 95 (1) (Jan. 2010) 49–57.
[25] F. di Maio, E. Zio, Failure prognostics by a data-driven similarity-based approach, Int. J. Reliab. Qual. Saf. Eng. 20 (1) (2013).
[26] M.-Y. You, G. Meng, A generalized similarity measure for similarity-based residual life prediction, Proc. Inst. Mech. Eng. Part E J. Process Mech. Eng. 225
(3) (2011) 151–160.
[27] L.L. Li, D.J. Ma, Z.G. Li, Residual useful life estimation by a data-driven similarity-based approach, Qual. Reliab. Eng. Int. (2016).
[28] M. Ahmed, F. Gu, A.D. Ball, Fault detection of reciprocating compressors using a model from principles component analysis of vibrations, J. Phys. Conf.
Ser. 364 (2012).
[29] M. Ahmed, M. Baqqar, F. Gu, A.D. Ball, Fault detection and diagnosis using Principal Component Analysis of vibration data from a reciprocating
compressor, in: 2012 UKACC International Conference on Control, 2012, pp. 461–466.
[30] M. Zhao, B. Tang, Q. Tan, Bearing remaining useful life estimation based on time–frequency representation and supervised dimensionality reduction,
Measurement 86 (2016) 41–55.
[31] Y.G. Li, P. Nilkitsaranont, Gas turbine performance prognostic for condition-based maintenance, Appl. Energy 86 (10) (2009) 2152–2161.
[32] M. Alamaniotis, A. Grelle, L.H. Tsoukalas, Regression to fuzziness method for estimation of remaining useful life in power plant components, Mech.
Syst. Signal Process. 48 (1–2) (2014) 188–198.
[33] Y. Xing, E.W.M. Ma, K.L. Tsui, M. Pecht, An ensemble model for predicting the remaining useful performance of lithium-ion batteries, Microelectron.
Reliab. 53 (6) (2013) 811–820.
[34] T.H. Loutas, D. Roulias, G. Georgoulas, Remaining useful life estimation in rolling bearings utilizing data-driven probabilistic E-support vectors
regression, IEEE Trans. Reliab. 62 (4) (2013) 821–832.
[35] F. Lasheras, P. Nieto, F. de Cos Juez, R. Bayón, V. Suárez, A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft
engines, Sensors 15 (3) (2015) 7062–7083.
[36] A. Saxena, J. Celaya, B. Saha, S. Saha, K. Goebel, Evaluating algorithm performance metrics tailored for prognostics, IEEE Aerosp. Conf. Proc. (2009).
[37] M.A.A. Wahab, M.M. Hamada, A. Mohamed, Artificial neural network and non-linear models for prediction of transformer oil residual operating time,
Electr. Power Syst. Res. 81 (1) (2011) 219–227.
[38] S. Hong, Z. Zhou, E. Zio, W. Wang, An adaptive method for health trend prediction of rotating bearings, Digit. Signal Process. A Rev. J. 35 (2014) 117–
123.
[39] C. Lu, L. Tao, H. Fan, An intelligent approach to machine component health prognostics by utilizing only truncated histories, Mech. Syst. Signal Process.
42 (1–2) (2014) 300–313.
[40] G. Niu, B.-S. Yang, Intelligent condition monitoring and prognostics system based on data-fusion strategy, Expert Syst. Appl. 37 (12) (2010) 8831–8840.
[41] J. Yu, Machine health prognostics using the Bayesian-inference-based probabilistic indication and high-order particle filtering framework, J. Sound Vib.
358 (2015) 97–110.
[42] R. Huang, L. Xi, X. Li, C. Richard Liu, H. Qiu, J. Lee, Residual life predictions for ball bearings based on self-organizing map and back propagation neural
network methods, Mech. Syst. Signal Process. 21 (1) (2007) 193–207.
[43] K. Arima, N. Okada, Y. Tsuji, K. Kiguchi, Evaluations of a multiple SOMs method for estimating missing values, in: 2014 IEEE/SICE International
Symposium on System Integration, 2014, pp. 796–801.
[44] Françoise Fessant, S. Midenet, Self-organising map for data imputation and correction in surveys, Neural Comput. Appl. 10 (4) (2002) 300–310.
[45] R. Rustum, A.J. Adeloye, Replacing outliers and missing values from activated sludge data using kohonen self-organizing map, J. Environ. Eng. 133 (9)
(2007) 909–916.
[46] L. Folguera, J. Zupan, D. Cicerone, J.F. Magallanes, Self-organizing maps for imputation of missing data in incomplete data matrices, Chemom. Intell.
Lab. Syst. 143 (2015) 146–151.
[47] E. Zio, F. Di Maio, A fuzzy similarity-based method for failure detection and recovery time estimation, Int. J. Performability Eng. 6 (5) (2010) 407–424.
[48] A. Mosallam, K. Medjaher, N. Zerhouni, Data-driven prognostic method based on Bayesian approaches for direct remaining useful life prediction, J.
Intell. Manuf. (2014).
[49] A. Mosallam, K. Medjaher, N. Zerhouni, Component based data-driven prognostics for complex systems: Methodology and applications, in: 2015 First
International Conference on Reliability Systems Engineering (ICRSE), 2015, vol. 56, pp. 1–7.
[50] Q. Zhang, P.W.-T. Tse, X. Wan, G. Xu, Remaining useful life estimation for mechanical systems based on similarity of phase space trajectory, Expert Syst.
Appl. 42 (5) (2015) 2353–2360.
[51] P. Wang, B.D. Youn, C. Hu, A generic probabilistic framework for structural health prognostics and uncertainty management, Mech. Syst. Signal Process.
28 (2012) 622–637.
[52] R. Khelif, S. Malinowski, B. Chebel-Morello, N. Zerhouni, RUL prediction based on a new similarity-instance based approach, in: 2014 IEEE 23rd
International Symposium on Industrial Electronics (ISIE), 2014, pp. 2463–2468.
[53] F. Xue, P. Bonissone, A. Varma, W. Yan, N. Eklund, K. Goebel, An instance-based method for remaining useful life estimation for aircraft engines, J. Fail.
Anal. Prev. 8 (2) (2008) 199–206.
[54] J. Lam, S. Sankararaman, B. Stewart, Enhanced trajectory based similarity prediction with uncertainty quantification, in: PHM 2014 – Proceedings of
the Annual Conference of the Prognostics and Health Management Society 2014, 2013, pp. 623–634.
[55] W. Rezgui, N.K. Mouss, L.-H. Mouss, M.D. Mouss, M. Benbouzid, A regression algorithm for the smart prognosis of a reversed polarity fault in a
photovoltaic generator, in: 2014 First International Conference on Green Energy ICGE 2014, 2014, pp. 134–138.
[56] C. Hu, G. Jain, P. Zhang, C. Schmidt, P. Gomadam, T. Gorka, Data-driven method based on particle swarm optimization and k-nearest neighbor
regression for estimating capacity of lithium-ion battery, Appl. Energy 129 (2014) 49–55.
[57] Jianguang Zhao, Hongbo Li, Fanjing Zeng, Tiefeng Li, Prognostics of high frequency receiver based on evidential regression, in: Proceedings of the IEEE
2012 Prognostics and System Health Management Conference (PHM-2012 Beijing), 2012, no. 0, pp. 1–5.
[58] G. Chirici, A. Barbati, P. Corona, M. Marchetti, D. Travaglini, F. Maselli, R. Bertini, Non-parametric and parametric methods using satellite images for
estimating growing stock volume in alpine and Mediterranean forest ecosystems, Remote Sens. Environ. 112 (5) (2008) 2686–2700.
[59] A. Haara, A. Kangas, Comparing k nearest neighbours methods and linear regression – is there reason to select one over the other?, Math Comput. For.
Nat. Sci. 4 (1) (2012) 50–65.
[60] R.E. McRoberts, E.O. Tomppo, A.O. Finley, J. Heikkinen, Estimating areal means and variances of forest attributes using the k-Nearest Neighbors
technique and satellite imagery, Remote Sens. Environ. 111 (4) (2007) 466–480.
[61] S. Li, Z. Shen, G. Xiong, A k-nearest neighbor locally weighted regression method for short-term traffic flow forecasting, in 2012 15th International IEEE
Conference on Intelligent Transportation Systems, 2012, pp. 1596–1601.
[62] Tao Zhang, Lifang Hu, Zhixin Liu, Yuejie Zhang, Nonparametric regression for the short-term traffic flow forecasting, in: 2010 International Conference
on Mechanic Automation and Control Engineering, 2010, pp. 2850–2853.
[63] Z.-W. Yuan, Y.-H. Wang, Research on K nearest neighbor non-parametric regression algorithm based on KD-tree and clustering analysis, in: 2012
Fourth International Conference on Computational and Information Sciences, 2012, vol. 1, pp. 298–301.
[64] I.T. Jolliffe, Principal Component Analysis, second ed., Springer-Verlag, New York, 2002.
[65] S. Bersimis, S. Psarakis, J. Panaretos, Multivariate statistical process control charts: an overview, Qual. Reliab. Eng. Int. 23 (5) (2007) 517–543.
[66] T. Kourti, Application of latent variable methods to process control and multivariate statistical process control in industry, Int. J. Adapt. Control Signal
Process. 19 (4) (2005) 213–246.
[67] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1) (1995) 41–59.
[68] B. De Ketelaere, M. Hubert, E. Schmitt, Overview of PCA-based statistical process-monitoring methods for time-dependent, high-dimensional data,
J. Qual. Technol. 47 (4) (2015) 318–335.
[69] U. Kruger, L. Xie, Statistical Monitoring of Complex Multivariate Processes, John Wiley & Sons Ltd, Chichester, UK, 2012.
[70] T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis, using multivariate projection methods, Chemom. Intell. Lab. Syst. 28 (1995) 3–21.
[71] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, third ed., John Wiley & Sons Inc., 2003.
[72] T. Kohonen, The self-organizing map, Neurocomputing 21 (1–3) (1998) 1–6.
[73] F. Wu, T. Wang, J. Lee, An online adaptive condition-based maintenance method for mechanical systems, Mech. Syst. Signal Process. 24 (8) (2010)
2985–2995.
[74] L.F. Gonçalves, J.L. Bosa, T.R. Balen, M.S. Lubaszewski, E.L. Schneider, R.V. Henriques, Fault detection, diagnosis and prediction in electrical valves using
self-organizing maps, J. Electron. Test. Theory Appl. 27 (4) (2011) 551–564.
[75] J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, 2000.
[76] T. Kohonen, The self-organizing map, Proc. IEEE 78 (9) (1990) 1464–1480.
[77] T. Kohonen, Self-organized formation of topologically correct feature maps, Biol. Cybern. 43 (1) (1982) 59–69.
[78] T. Kohonen, E. Oja, O. Simula, A. Visa, J. Kangas, Engineering applications of the self-organizing map, Proc. IEEE 84 (10) (1996) 1358–1384.
[79] R. Xu, D. Wunsch II, Survey of clustering algorithms, IEEE Trans. Neural Networks 16 (3) (2005) 645–678.
[80] G. Niu, F. Qian, B.-K. Choi, Bearing life prognosis based on monotonic feature selection and similarity modeling, Proc. Inst. Mech. Eng. Part C J. Mech.
Eng. Sci. (2015) 1–11.
[81] M.J. Mcghee, G. Galloway, V.M. Catterson, B. Brown, E. Harrison, Prognostic Modelling of Valve Degradation within Power Stations, in: PHM 2014 –
Proceedings of the Annual Conference of the Prognostics and Health Management Society 2014, 2013, pp. 70–75.
[82] Fang Qian, Gang Niu, Remaining useful life prediction using ranking mutual information based monotonic health indicator, in: 2015 Prognostics and
System Health Management Conference (PHM), 2015, pp. 1–5.
[83] L. Györfi, M. Kohler, A. Krzyżak, H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, New York, NY, 2002.
[84] N.S. Altman, An introduction to Kernel and nearest-neighbor nonparametric regression, Am. Stat. 46 (3) (1992) 175–185.
[85] J.M. Ver Hoef, H. Temesgen, A comparison of the spatial linear model to nearest neighbor (k-NN) methods for forestry applications, PLoS One 8 (3)
(2013) e59129.
[86] X. Tian, Z. Su, E. Chen, Z. Li, C. van der Tol, J. Guo, Q. He, Estimation of forest above-ground biomass using multi-parameter remote sensing data over a
cold and arid area, Int. J. Appl. Earth Obs. Geoinf. 14 (1) (2012) 160–168.
[87] H. Gu, L. Dai, G. Wu, D. Xu, S. Wang, H. Wang, Estimation of forest volumes by integrating Landsat TM imagery and forest inventory data, Sci. China Ser.
E Technol. Sci. 49 (S1) (2006) 54–62.
[88] H. Sun, H. Liu, H. Xiao, R. He, B. Ran, Short term traffic forecasting using the local linear regression model, Transp. Res. Rec. 1836 (2003) 143–150.
[89] S. Arlot, A. Celisse, A survey of cross-validation procedures for model selection, Stat. Surv. 4 (2010) 40–79.
[90] L. Ljung, System Identification Toolbox™ User’s Guide. The MathWorks, Inc., 2015.
[91] A. Saxena, J. Celaya, E. Balaban, K. Goebel, B. Saha, S. Saha, M. Schwabacher, Metrics for evaluating performance of prognostic techniques, in: 2008
International Conference on Prognostics and Health Management, 2008, pp. 1–17.
[92] Statistics and Machine Learning Toolbox User’s Guide. The MathWorks, Inc., 2016.
[93] “SOM Toolbox 2.0.” [Online]. Available: http://www.cis.hut.fi/projects/somtoolbox/.
[94] Bioinformatics Toolbox™ User’s Guide. The MathWorks, Inc., 2015.
