
Transportation Research Part D 38 (2015) 125–143


Predictive usage mining for life cycle assessment


Jungmok Ma a, Harrison M. Kim b,*

a Department of National Defense Science, Korea National Defense University, Seoul, Korea
b Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

* Corresponding author at: 104 S. Mathews Ave., Urbana, IL 61801, USA. Tel.: +1 (217) 265 9437; fax: +1 (217) 244 5705. E-mail address: [email protected] (H.M. Kim).
http://dx.doi.org/10.1016/j.trd.2015.04.022
1361-9209/© 2015 Elsevier Ltd. All rights reserved.

Article info

Article history:
Available online 10 June 2015
Keywords:
Life cycle assessment
Usage modeling
Time series segmentation
Time series analysis

Abstract
Usage modeling in life cycle assessment (LCA) is rarely discussed despite the magnitude of the environmental impact from the usage stage. In this paper, a usage modeling technique, the predictive usage mining for life cycle assessment (PUMLCA) algorithm, is proposed as an alternative to the conventional constant rate method. By modeling usage patterns as trend, seasonality, and level from a time series of usage information, predictive LCA can be conducted in a real time horizon, which can provide a more accurate estimation of environmental impact. Large-scale sensor data of product operation is suggested as a source of data for the proposed method to mine usage patterns and build a usage model for LCA. The PUMLCA algorithm can provide a similar level of prediction accuracy to the constant rate method when data is constant, and higher prediction accuracy when data has complex patterns. In order to mine important usage patterns more effectively, a new automatic segmentation algorithm is developed based on change point analysis. The PUMLCA algorithm can also handle missing and abnormal values from large-scale sensor data, identify seasonality, and formulate predictive LCA equations for current and new machines. Finally, the LCA of agricultural machinery demonstrates the proposed approach and highlights its benefits and limitations.
© 2015 Elsevier Ltd. All rights reserved.

Introduction and background


Life cycle assessment (LCA) is an analytical assessment tool to quantify the environmental impact of a product or system (Rebitzer et al., 2004; Finnveden et al., 2009). The potential environmental impact can be generated from all the stages of a product, i.e., manufacturing, usage, maintenance, and end-of-life. The LCA approach provides a holistic and systematic way to manage data associated with the product of interest. With the popularity of sustainable design and environmentally conscious design, LCA studies have reported on various materials, electronics, automobiles, and complex systems (Kwak, 2012).
The LCA framework (Guinée, 2002; Reap et al., 2008a) consists of goal and scope definition, inventory analysis (LCI, life cycle inventory), impact assessment (LCIA, life cycle impact assessment) and interpretation. The goal and scope definition is the phase that defines the purpose, the systems or products, and the level of sophistication. The LCI is the phase that defines the system boundaries and the flow diagrams with unit processes (e.g., extraction of oil, refining, production of electricity, etc.). The main result from the LCI is the inventory table, which quantifies inputs (e.g., raw material, land, energy, etc.) and outputs (e.g., pollutants such as CO2, SO2, NOx, etc.) to the environment. The LCIA is the phase that translates the inventory table into relevant impact categories (e.g., carcinogens, climate change, acidification, etc.) and quantifies the environmental



impact using weighting and normalization. The interpretation is the phase that evaluates the results from the LCIA and makes the recommendations of the LCA study.
Although the LCA approach is mature and has become a widely used method in various industries, it is usually static in that time is not considered in the assessment, with the implicit assumption of steady-state processes. The necessity of considering time in LCA has been discussed in the literature. Reap et al. (2008b) provided insightful reviews on the temporal aspects of LCA. Temporal factors such as different rates of emissions over time and seasonal variation of their impacts can influence the accuracy of LCA. Levasseur et al. (2010) showed that inconsistency in time frames can affect LCA results significantly. Memary et al. (2012) demonstrated that changes of environmental impact over time are useful information for assessing future technology and options. Collet et al. (2014) presented a method to find the most critical flows of information based on dynamic inventory data (i.e., at the LCI level) and sensitivity analysis. In addition to the aspect of time, spatial variation is another contributor that can significantly affect the accuracy of LCA (Reap et al., 2008b). Local, regional and continental differences can cause different LCA results.
In this paper, a new perspective of dynamic LCA is proposed to consider time in LCA, especially the modeling of the usage stage. Among the life cycle stages of a product, the manufacturing stage, which is the chosen stage in the majority of LCA studies, can be considered as a one-time, i.e., time-independent, event. Although the dynamic inventory approach (Collet et al., 2014) attempted to relax this (e.g., the impact from material x or process y can change over time), the inventory data is considered constant in this study. On the other hand, the usage stage (with the maintenance and end-of-life stages) is a time-dependent event, which means the lifespan of a product has a large impact on LCA. Many studies showed that the majority of environmental impact can come from the usage stage over the life cycle (e.g., more than 60% for cars (Sullivan and Cobas-Flores, 2001), more than 80% for off-road machinery (the product of interest in this paper) (Kwak et al., 2012), and 80–90% for some small electronics (Telenko and Seepersad, 2014)). Therefore, how to model the usage stage in LCA is critical and is one of the main questions of this work.
Even though the importance of usage modeling has been recognized among LCA researchers and practitioners, it is rarely discussed in the literature. LCA studies in the literature usually utilized a constant rate (Lee et al., 2000; Choi et al., 2006; Kwak et al., 2012; Kwak and Kim, 2013; Li et al., 2013) of usage information (hereinafter the constant rate method) with the implicit assumption of steady-state processes (e.g., average fuel consumption rate in kg/h, fixed operating hours per month, etc.). This method is simple and easy to apply, but if data has complex patterns (e.g., trend, seasonality and segments), the prediction accuracy of the constant rate method can be significantly reduced. The constant rate method only allows us to calculate life cycle impact in a nominal time horizon, e.g., 10 years as a whole instead of from October 2014 to December 2024. This can be an important issue for policy makers and manufacturers when they want to estimate the environmental impact of the future. Fig. 1 shows the expected result from both the proposed model and the constant rate method.
Based on the available historical data, a usage (e.g., diesel fuel consumption) model should be built and used for predicting the future usage profile. It can be seen in Section Numerical prediction tests for PUMLCA that the constant rate method can misinterpret the upcoming usage profile while the proposed model is expected to provide higher prediction accuracy with lower variance predictions.
One exception is Telenko and Seepersad (2014), who proposed a usage context modeling technique in LCA using Bayesian network models. The usage context includes human, situational, and product variables. Based on a pre-defined probabilistic network of relevant usage patterns (e.g., weather → usage of electric kettle with probability x), a usage profile and its variability can be modeled as a form of distribution. However, in order to apply this approach, causal relationships among different usage contexts should be known, which are expressed as a probabilistic network. For example, the usage of agricultural machinery (e.g., crop sprayer, harvester, nutrient applicator, etc.) can be affected by various usage contexts (e.g., weather, soil, experience of farmers, price of fuel and crops, machine deterioration). It will be difficult to correlate these variables with specific usage information (e.g., diesel fuel consumption and operating hours). Furthermore, Telenko and Seepersad (2014) did not consider time in LCA.
Fig. 1. A prediction scenario of PUMLCA and constant rate method.

Fig. 2. Overview of PUMLCA.

Alternatively, this study proposes a time series usage modeling technique, predictive usage mining for life cycle assessment (PUMLCA), as shown in Fig. 2. Companies such as Caterpillar (Product Link) and John Deere (JDLink) have developed telematics systems for their machinery and have been gathering operational data in real time for various purposes: asset utilization monitoring, location tracking, fleet management, machine health prognostics, etc. These
large-scale time-stamped data sets are the sources of data for the PUMLCA algorithm. Usually, the whole picture of a usage profile is not available for currently deployed machines or new machines. Based on the limited past information, future usage patterns should be predicted for LCA as shown in Fig. 1. Time series analysis is useful when future values should be predicted while explanatory variables are difficult to identify. By modeling time series usage information, not only can future usage patterns be obtained, but also their variability (i.e., prediction intervals). For example, Ma et al. (2014) and Ma and Kim (2014) showed that a trend of valuable information (demand and price) could be mined and reflected in system design using the combination of time series analysis and data mining.
Time series usage information, however, frequently shows highly seasonal activity periods with periodic no-activity or very low-activity periods. For example, combine harvesters are mainly operated during the harvest season with almost zero usage during the off-season. A similar pattern can be observed from seasonally used machinery. This pattern is also widespread in time series data of highly seasonal items such as Christmas, Easter and Halloween products. When analyzing and modeling this kind of time series data, segmentation can help to find usage patterns more clearly by grouping distinct
periods (e.g., off-season periods) (Jackson, 2010). Segmentation algorithms (Keogh et al., 2004) were proposed for various applications such as voice recognition, handwriting recognition, clustering, and classification. However, not much has been reported in the LCA literature on whether segmentation algorithms can improve predictive capability. Fig. 3 shows an example. The usual time series segmentation (A in the figure, a piecewise linear representation of an electrocardiogram) is used for the approximation of a time series, but the proposed segmentation (B in the figure, monthly sales for a souvenir shop in Queensland, Australia, with dotted lines for predicted values) is designed to improve the predictive capability of time series modeling by grouping distinct periods and magnifying important patterns (e.g., the distinct segments are separated and predicted, and the remaining segments are regrouped with the predicted values). Therefore, how to segment a time series for better LCA results is another main question of this work.

Fig. 3. Time series segmentation: (A) piecewise linear representation (redrawn from Keogh et al. (2004)) and (B) segmentation for prediction (redrawn from Hyndman and Athanasopoulos (2013)).
The main contribution of this study is to propose the usage modeling technique, the predictive usage mining for life cycle assessment (PUMLCA) algorithm, which enables predictive LCA in a real time horizon. The PUMLCA algorithm can provide a similar level of prediction accuracy to the constant rate method when data is constant, and a higher prediction accuracy when data has complex patterns. In order to mine important usage patterns (trend, seasonality and level) effectively from a time series, a new automatic segmentation algorithm is developed based on change point analysis. The PUMLCA algorithm can also handle missing and abnormal values from large-scale sensor data, identify seasonality, and formulate predictive LCA equations for current and new machines. Finally, the LCA of agricultural machinery demonstrates the proposed approach and highlights its benefits and limitations.
The rest of the paper is organized as follows: Section Description of predictive usage mining for life cycle assessment algorithm describes the PUMLCA algorithm. Section Design problems with PUMLCA provides design problems for current and new machines. Numerical prediction tests are presented for PUMLCA and the constant rate method in Section Numerical prediction tests for PUMLCA, followed by a case study of agricultural machinery in Section Case study: agricultural machinery. The benefits and limitations of the proposed methodology, along with future research directions, are discussed in Section Closing remarks and future work.
Description of predictive usage mining for life cycle assessment algorithm
Fig. 4 outlines the predictive usage mining for life cycle assessment (PUMLCA) algorithm. There are five stages: data preprocessing for handling missing and abnormal values, seasonal period analysis, segmentation analysis, time series analysis, and predictive LCA. Details are explained in each subsection respectively. The algorithm starts from gathering time-stamped sensor data sets with the usage information of interest. The amount of fuel (or energy) consumption and the operating hours by work modes (e.g., idling and non-idling) are selected as the usage information. In this paper, the usage information is viewed as a result of interactions among human, situational and product variables, which are the components of the usage context (Telenko and Seepersad, 2014). For example, the amount of fuel consumed by work modes can be affected by user experience and preference (human variables), weather and soil (situational variables), and machine deterioration and efficiency (product variables). The patterns of the usage information (usage patterns) are defined as trend, seasonality and level in historical time series data. A trend is a long-term increase or decrease pattern; a seasonality is a repeated pattern with a fixed and known period; and a level is the base values after removing trend and seasonality. Since a level can be considered as an initial value with a series of random errors, trend and seasonality are the two main patterns that will be mined.
Data preprocessing
After collecting a time series of usage information of interest, it should be checked whether there are missing or
abnormal values. Though it is assumed that the error rate of sensor data is very low and the incompleteness of data
happens at random, it is still possible to have missing or abnormal values. In order to handle missing values (usually
indicated as not available), various imputation techniques are available: (1) removing the missing values, (2) replacing the missing values with random values, adjacent values, the mean or the median, and (3) replacing the missing values based on the values of a correlated variable. Since the volume of collected data is very large, any of the aforementioned methods can be applied.
Unlike missing values, abnormal values (or outliers) are difficult to define. However, similar to the case of missing values, it is assumed that the sample size of abnormal values is much smaller than the volume of the original data and that abnormal values are not generated systematically. There are two approaches: (1) the three-sigma rule and (2) the boxplot. The three-sigma rule states that approximately 99.73% of values lie within three standard deviations of the mean in a Gaussian distribution. In other words, if the collected values (y_t) are considered random variables following the Gaussian distribution, abnormal values can be defined as values located outside of Eq. (1):

\mu - 3\sigma \le y_t \le \mu + 3\sigma \qquad (1)

where \mu is the mean and \sigma is the standard deviation.


Another method to detect abnormal values is a boxplot. Abnormal values are defined as values located outside of Eq. (2):

Q_1 - 1.5\,\mathrm{IQR} \le y_t \le Q_3 + 1.5\,\mathrm{IQR} \qquad (2)

where Q_1 is the 25th percentile, Q_2 is the median or 50th percentile, Q_3 is the 75th percentile, and IQR refers to the interquartile range (Q_3 - Q_1). If data is distributed as the Gaussian distribution, Eq. (2) can be expressed as \mu \pm 2.7\sigma.
Fig. 4 indicates that detected abnormal values are removed and handled by the techniques for missing values.

Fig. 4. Overall framework of PUMLCA.
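As a concrete illustration of this preprocessing stage, the R sketch below flags abnormal values with the three-sigma rule (Eq. (1)) and the boxplot rule (Eq. (2)) and then imputes missing values with the median; the vector usage is a short hypothetical sensor series, and median imputation is just one of the options listed above.

```r
# Minimal preprocessing sketch (hypothetical monthly usage vector with an NA).
usage <- c(9, 15, NA, 16, 14, 16, 17, 600, 650, 3400, 5000, 250)

# Three-sigma rule (Eq. (1)): flag values outside mu +/- 3*sigma.
mu    <- mean(usage, na.rm = TRUE)
sigma <- sd(usage,   na.rm = TRUE)
out_sigma <- which(usage < mu - 3 * sigma | usage > mu + 3 * sigma)

# Boxplot rule (Eq. (2)): flag values outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR].
q   <- quantile(usage, c(0.25, 0.75), na.rm = TRUE)
iqr <- q[2] - q[1]
out_box <- which(usage < q[1] - 1.5 * iqr | usage > q[2] + 1.5 * iqr)

# Treat detected outliers as missing, then impute with the median
# (any of the imputation options listed above could be used instead).
usage[union(out_sigma, out_box)] <- NA
usage[is.na(usage)] <- median(usage, na.rm = TRUE)
```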
Seasonal period analysis
The next step is to determine whether there are seasonal patterns, and if there are, what the length (period) of the seasonality is. It should be noted that the seasonality modeling will be conducted in Section Time series analysis, but without the information on the seasonal period, seasonality cannot be modeled. Examples of typical periods include 24 for an hourly series, 7 for a weekly series, 12 for a monthly series, 4 for a quarterly series, etc. If a seasonal period is known, the information can be used. If it is not known, then a dominant period should be identified with different seasonal representations of the original sensor data.
A periodogram (Shumway and Stoffer, 2011) is suggested to identify the important seasonal period. The periodogram is a plot with frequencies on the x-axis and periodogram values on the y-axis. The periodogram value is a sample spectral density, which can give the relative importance of frequencies. The mathematical expression of the periodogram value is defined as (Shumway and Stoffer, 2011):

P(j/n) = \left[ \frac{2}{n} \sum_{t=1}^{n} y_t \cos(2\pi t j/n) \right]^2 + \left[ \frac{2}{n} \sum_{t=1}^{n} y_t \sin(2\pi t j/n) \right]^2 \qquad (3)

where y_t is a time series with n discrete time points and j/n are the frequencies (j cycles in n time points) for j = 1, 2, \ldots, n/2. The dominant period (i.e., the reciprocal of a frequency j/n) can be identified by \arg\max P(j/n).
One helpful treatment before plotting a periodogram is detrending the time series usage information (i.e., removing a trend). Two possible methods of detrending will be presented in Section Time series analysis. Also, from a practical standpoint, users can limit the frequencies to a meaningful range by their own definition.
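A minimal R sketch of this step, assuming a hypothetical monthly series y: a linear trend is removed before the periodogram is computed (one way of detrending), and the dominant period is taken as the reciprocal of the frequency with the largest periodogram value.

```r
# Seasonal period identification via the periodogram (hypothetical monthly series).
set.seed(1)
y <- rep(c(10, 0, 0, 0, 0, 0, 0, 1, 650, 3450, 5500, 270), 7) + rnorm(84, 0, 5)

# spec.pgram() computes the sample spectral density; detrend = TRUE removes a
# linear trend before the periodogram is formed.
spec <- spec.pgram(y, taper = 0, detrend = TRUE, plot = FALSE)

dominant_freq   <- spec$freq[which.max(spec$spec)]
dominant_period <- 1 / dominant_freq   # reciprocal of the peak frequency
dominant_period                        # about 12 for a clear monthly pattern
```

In practice the search can be restricted to a meaningful frequency range, as noted above, before taking the maximum.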
Segmentation analysis
There are two types of segmentation analysis: deterministic and automatic. Deterministic segmentation analysis can be used when some segments of the given time series data show deterministic patterns, e.g., zero usage over time within specific periods. If this prior knowledge is not available or the patterns are not deterministic with variable periods, automatic segmentation analysis should be applied. In this paper, a new automatic segmentation algorithm based on change point analysis is presented.
Fig. 5 shows the schematic of the automatic segmentation algorithm. A period n/j is identified from Section Seasonal period analysis, and the number of data points n is proportional to the period (i.e., n/j = jp/j = p). For each period, there are p time indexes, m_1, m_2, \ldots, m_p. For example, a period of 12 has 12 time indexes, which are January, February, \ldots, December. The goal of this algorithm is to find a shared segment (SS) over periods. S_{p_j} denotes a segment, which is a set of time indexes in the period p_j. A segment does not contain any change point.
Change point analysis is a statistical technique that can detect multiple change points within a time series (Killick et al., 2012). When a discrete time series, y_{1:n} = {y_1, \ldots, y_n}, is given, the positions of change points, \tau_{1:m} (with \tau_0 = 0 and \tau_{m+1} = n), can be identified if the statistical properties of y_{1:\tau_1}, y_{(\tau_1+1):\tau_2}, \ldots, y_{(\tau_m+1):n} are different in some sense. In this paper, changes in mean are adopted, although changes in variance are another option. In order to identify change points, an objective function is given by Killick et al. (2012):

F(n) = \min_{\tau_{1:m}} \left\{ \sum_{i=1}^{m+1} \left[ C(y_{(\tau_{i-1}+1):\tau_i}) + \beta \right] \right\} \qquad (4)

where C is a cost function for a segment and \beta is a penalty. For t < n, a recursive expression can be determined as follows (Killick et al., 2012) and solved in turn by dynamic programming:

F(n) = \min_t \left\{ \min_{\tau \in \tau_{1:t}} \sum_{i=1}^{m} \left[ C(y_{(\tau_{i-1}+1):\tau_i}) + \beta \right] + C(y_{(t+1):n}) + \beta \right\} = \min_t \left\{ F(t) + C(y_{(t+1):n}) + \beta \right\} \qquad (5)

A pruned exact linear time (PELT) method (Killick et al., 2012) was proposed to solve Eq. (5) more efficiently with a pruning procedure instead of searching all possible change points. During the iterations, for t < s < n, only the set of t satisfying Eq. (6) will be considered:

F(t) + C(y_{(t+1):s}) + K \le F(s) \qquad (6)

where K is a constant.
As a cost function, the negative of the maximum log-likelihood is used, which is given by Killick et al. (2012):

C(y_{(t+1):n}) = -\max_{\theta} \sum_{i=t+1}^{n} \log f(y_i \mid \theta) \qquad (7)

where f(y_i \mid \theta) is a density function with the parameter \theta for a segment.

Fig. 5. A schematic of the automatic segmentation algorithm.


As a penalty, there are some options such as Akaike's Information Criterion (AIC, \beta = 2p) and the Bayesian Information Criterion (BIC, \beta = p \log n), where p is the number of added parameters for a change point. It is also possible to specify a type I error (e.g., 0.05 or 0.01) as a penalty value using an asymptotic distribution (Killick and Eckley, 2011). The PELT algorithm is implemented in R (Killick and Eckley, 2011).
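A brief R sketch of the change point detection that the segmentation step builds on, using the changepoint package of Killick and Eckley (2011); the series x and the asymptotic penalty of 0.05 are illustrative choices only.

```r
# Change points in mean via PELT (changepoint package, Killick and Eckley).
library(changepoint)

set.seed(2)
x <- c(rnorm(40, mean = 0, sd = 1),    # segment 1
       rnorm(40, mean = 5, sd = 1))    # segment 2: mean shift at t = 41

# Penalty options include "AIC", "BIC", or an asymptotic type I error.
fit <- cpt.mean(x, method = "PELT", penalty = "Asymptotic", pen.value = 0.05)

cpts(fit)        # estimated change point positions (about 40 here)
param.est(fit)   # estimated segment means
```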
The automatic segmentation algorithm based on the change point analysis (i.e., PELT) is provided in Algorithm 1. The goal of this algorithm is to find shared segments over seasonal periods which contain no change point. Unlike the PELT algorithm, change points will be identified within a seasonal period. A penalty, \beta, should be selected by users. As the penalty value increases, fewer change points will be identified and the algorithm will be less sensitive to close values. A segment is defined as a group of members within a seasonal period. At least two members are required to form a segment (e.g., y_{1:3} in line 12). In line 4, \tau contains the possible positions of change points, which are the p time indexes within each period (e is the index of periods). In lines 5–8 (Killick et al., 2012), the PELT algorithm is implemented, with the pruning procedure in line 8. R_\tau is the set of candidate positions \tau'; \tau^* is the identified optimal position of change points; CP_e denotes the optimal positions of change points (\tau^*) for each period, which is the result of the first part of the algorithm in line 10. Line 12 makes a set of segments, S_e, for each period based on the identified optimal change points (CP_e). Note that \tau_{1:m_p} = {\tau_1, \ldots, \tau_{m_p}}. Line 13 finds the shared segments (SS) over different periods. At this point, it is possible that change points still exist among the sets, S_e, in the shared segments, which would indicate that those segments are not similar patterns that repeat periodically. Line 14 makes one new time series (NS) using the shared segments of each period (e.g., SS_{p_1} represents the shared segment of the first period). Line 15 applies the PELT method to the new series with no period, and a new change point set, CP', is returned in line 16. The output depends on the new change point set. If there is no change point, the shared segments and the remaining data are grouped as different time series. If there is a change point, no segmentation will be implemented.
Algorithm 1. Automatic segmentation
1: A time series, y_{1:n}, with n data points
2: A seasonal period, p, where p = n/j with j cycles
3: A measure of fit C(\cdot) and a penalty \beta
4: for \tau = 1, \ldots, m_p and e = p_1, p_2, \ldots, p_j do
5:   Calculate F(\tau) = \min_{\tau' \in R_\tau} { F(\tau') + C(y_{(\tau'+1):\tau}) + \beta }
6:   Let \tau^* = \arg\min_{\tau' \in R_\tau} { F(\tau') + C(y_{(\tau'+1):\tau}) + \beta }
7:   Set CP_e(\tau) = [CP(\tau^*), \tau^*]
8:   Set R_{\tau+1} = { \tau' \in R_\tau \cup {\tau} : F(\tau') + C(y_{(\tau'+1):\tau}) + K \le F(\tau) }
9: end for
10: return CP_{p_1}, CP_{p_2}, \ldots, CP_{p_j}
11: for e = p_1, p_2, \ldots, p_j do
12:   Set S_e = { y_{1:\tau_1}, y_{(\tau_1+1):\tau_2}, \ldots, y_{(\tau_{m_p-1}+1):\tau_{m_p}} }
13:   Find SS = { S_{p_1} \cap S_{p_2} \cap \cdots \cap S_{p_j} }
14:   Let NS = { SS_{p_1}, SS_{p_2}, \ldots, SS_{p_j} }
15:   Apply lines 4–9 to NS
16:   Get CP'(\tau)
17: end for
18: return
19: if CP'(\tau) = null then
20:   group SS as one time series and the remaining data as another time series
21:   number of time series (s) = z
22: else
23:   no segmentation, s = 1 (i.e., original data)
24: end if

Based on the result of the automatic segmentation algorithm, time series analysis methods in the next section will be applied
to each segmented time series. Now, each time series has a new period, which is the number of seasonal time indexes.
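A simplified R sketch of the idea behind Algorithm 1, under the assumption of a hypothetical 7-year monthly matrix ymat: change points are detected within each period with PELT, each period's segments are encoded as index ranges, and the ranges shared by every period are returned. The confirmation step on the concatenated shared segments (lines 14–16) and the grouping of the remaining data are omitted for brevity.

```r
# Simplified sketch of Algorithm 1: per-period change points and shared segments.
library(changepoint)

# Hypothetical data: 7 periods (years) x 12 seasonal time indexes (months).
set.seed(3)
pattern <- c(15, 0, 0, 0, 0, 0, 0, 1, 650, 3450, 5500, 270)
ymat <- matrix(rep(pattern, 7), nrow = 7, byrow = TRUE) + rnorm(84)

segments_of <- function(period_values, pen = 0.05) {
  # Change points within one seasonal period (lines 4-10 of Algorithm 1).
  cp <- cpts(cpt.mean(period_values, method = "PELT",
                      penalty = "Asymptotic", pen.value = pen))
  bounds <- c(0, cp, length(period_values))
  # Encode each resulting segment as an index range "start-end" (line 12).
  paste(head(bounds, -1) + 1, tail(bounds, -1), sep = "-")
}

per_period <- lapply(seq_len(nrow(ymat)), function(i) segments_of(ymat[i, ]))
shared     <- Reduce(intersect, per_period)   # shared segments SS (line 13)
shared                                        # e.g. a "2-8" off-season block
```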
Time series analysis
Time series analysis includes modeling time series data by extracting important patterns and forecasting future values from the fitted model. The two most widely used time series analysis techniques (Hyndman and Athanasopoulos, 2013) are adopted in this paper: exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA). Since each has its strengths and weaknesses (Hyndman and Khandakar, 2008), either method can be selected by users. Observations are denoted by y_t, and a forecast h steps ahead based on all the data up to time t is denoted by \hat{y}_{t+h|t}, where h is a real time horizon.
Exponential smoothing
The ETS models refer to an exponential smoothing family (e.g., simple exponential smoothing, Holt's linear trend model, Holt–Winters seasonal model, etc.) based on the innovations state space framework (Hyndman et al., 2008). The ETS model identifies key components of a time series (trend and seasonality) and expresses their relationships (additive and multiplicative) using exponential smoothing.
The simplest model of ETS is given as:

\hat{y}_{t+1} = \hat{y}_t + \alpha (y_t - \hat{y}_t) \qquad (8)

where \alpha is a parameter between zero and one. Eq. (8) represents the new forecast as the combination of the old forecast and the error from the last forecast. Similar to Eq. (8), there are 30 ETS models with combinations of trend (none, additive, additive damped, multiplicative and multiplicative damped), seasonality (none, additive and multiplicative) and error (additive and multiplicative) (Hyndman et al., 2008).
All the 30 ETS models can be expressed as innovations state space models and the general model is given as (Hyndman et al., 2008):

y_t = w(x_{t-1}) + r(x_{t-1}) \varepsilon_t \qquad (9)

x_t = f(x_{t-1}) + g(x_{t-1}) \varepsilon_t \qquad (10)

where x_t is the state vector which contains unobserved components such as the level, trend, and seasonality of a time series; w and r are scalar functions; f and g are vector functions; and \varepsilon_t is the white noise process with variance \sigma^2. The white noise process is a process that has zero mean, constant and finite variance, and an uncorrelated series. Using this innovations state space framework, Hyndman et al. (2008) showed that a prediction interval can be obtained along with a point forecast.
In order to get a forecast, \hat{y}_{t+h|t}, a recursive expression was summarized as follows (Hyndman et al., 2008):

\hat{y}_{t|t-1} = w(x_{t-1}) \qquad (11)

\varepsilon_t = (y_t - \hat{y}_{t|t-1}) / r(x_{t-1}) \qquad (12)

x_t = f(x_{t-1}) + g(x_{t-1}) \varepsilon_t \qquad (13)

Then, a simulation approach (Hyndman and Khandakar, 2008) can be used to simulate \varepsilon_t for a forecast with a prediction interval.
The remaining part is the identification of trend and seasonality, which is called the decomposition of a time series. First, the trend component can be estimated (\hat{T}_t) by moving average smoothing. The moving average smoothing of order m is given by Hyndman and Athanasopoulos (2013):

\hat{T}_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j} \qquad (14)

where m = 2k + 1. The order of the moving average smoothing is a seasonal period, and if the seasonal period is not known, usually odd orders (e.g., 3, 5, 7, 9, etc.) can be applied (Hyndman and Athanasopoulos, 2013). A larger order gives a smoother fit. Then, detrended time series data can be obtained as y_t - \hat{T}_t for the additive model or y_t / \hat{T}_t for the multiplicative model. It should be noted that this is one method to obtain a detrended series for the seasonal period analysis in Section Seasonal period analysis.
Second, the seasonal component can be estimated from the detrended series data. An average of each seasonal time index over seasonal periods (e.g., all values in January for monthly data) gives the seasonal component, \hat{S}_t.
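This classical decomposition, a centered moving average for the trend and seasonal means for the seasonal component, is available directly in base R; a short sketch on a hypothetical monthly series:

```r
# Classical decomposition: moving-average trend (Eq. (14)) and seasonal means.
set.seed(4)
y <- ts(100 + 0.5 * (1:84) + rep(20 * sin(2 * pi * (1:12) / 12), 7) +
          rnorm(84, 0, 2),
        frequency = 12)

dec <- decompose(y, type = "additive")
trend_hat    <- dec$trend      # T_t: centered moving average of order 12
seasonal_hat <- dec$seasonal   # S_t: average of each seasonal time index
detrended    <- y - trend_hat  # detrended series for the additive model
```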
ARIMA
While the ETS model represents a time series as exponential smoothing of trend and seasonality, the ARIMA model is based on autocorrelations in the time series. The ARIMA model (without seasonality) is a combination of three models, given as (Hyndman and Athanasopoulos, 2013):

(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q) e_t \qquad (15)

where the first parenthesis is an autoregressive (AR) model of order p, the second parenthesis is an integration (or differencing) operation, and the third parenthesis on the right-hand side is a moving average (MA) model of order q. B represents the backward shift operator, e.g., B y_t = y_{t-1}.


The AR model of order p is given by:

y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + e_t \qquad (16)

where c is a constant and e_t is white noise. This is a linear combination of past observations.
The differencing operations of order 1 and order 2 are given as:

y'_t = y_t - y_{t-1} \qquad (17)

y''_t = y'_t - y'_{t-1} \qquad (18)

The determination of differencing can be made by statistical inference called unit root tests (Hyndman and Athanasopoulos, 2013). It should be noted that this is another method for detrending time series data for the seasonal period analysis in Section Seasonal period analysis.
The MA model of order q is given as:

y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} \qquad (19)

This is a linear combination of past forecast errors.


Finally, the seasonal ARIMA model can be written as (Hyndman and Athanasopoulos, 2013):

(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm})(1 - B)^d (1 - B^m)^D y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)(1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm}) e_t \qquad (20)

where lower-case letters p, d, and q are orders for the non-seasonal AR, integration, and MA models; upper-case letters P, D, and Q are orders for the seasonal AR, integration, and MA models; and m is a period.
In order to forecast future values based on a fitted ARIMA model, Eq. (20) can be expanded so that only y_t is shown on the left-hand side. By rewriting it as \hat{y}_{t+h|t}, a recursive expression can be solved for a forecast h steps ahead.
Close observation of both the ETS and ARIMA models reveals similarities. The ETS model starts by identifying trend and seasonality, and the ARIMA model uses the differencing operation to remove trend and seasonality (i.e., stationarity). The ETS then expresses a series using past level, trend and seasonality with exponentially decreasing weights, while the ARIMA expresses a series using past observations and forecast errors.
Automatic modeling of ETS and ARIMA
As shown previously, the ETS and ARIMA require parameter estimation and model selection. Hyndman and Khandakar
(2008) provided an automatic forecasting algorithm to handle a large number of univariate time series data. The algorithm
is implemented in the R package forecast. This section briefly introduces the automatic forecasting algorithm for the ETS and ARIMA models.
The automatic forecasting algorithm for the ETS models can be summarized as follows: (1) apply all 30 models and optimize the parameters of each model, (2) select the best model based on a penalized likelihood such as AIC and BIC, and (3) forecast future values and obtain prediction intervals based on the selected model.
The automatic forecasting algorithm for the ARIMA models can be summarized as follows: (1) apply four possible models and select the best model based on a penalized likelihood, (2) apply 13 variations on the current model and repeat the process if a better model can be identified based on a penalized likelihood, and (3) forecast future values and obtain prediction intervals based on the selected model. Details of these algorithms can be found in the work of Hyndman and Khandakar (2008).
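In R, this automatic modeling step corresponds to the ets() and auto.arima() functions of the forecast package (Hyndman and Khandakar, 2008); the series below is hypothetical and the 24-month horizon is only for illustration.

```r
# Automatic ETS and ARIMA model selection and forecasting (forecast package).
library(forecast)

set.seed(5)
y <- ts(rep(c(15, 0, 0, 0, 0, 0, 0, 1, 650, 3450, 5500, 270), 7) +
          rep(50 * 0:6, each = 12) +   # +50 per year trend
          rnorm(84, 0, 5),
        frequency = 12, start = c(2007, 1))

fit_ets   <- ets(y)          # selects among the ETS family by penalized likelihood
fit_arima <- auto.arima(y)   # Hyndman-Khandakar stepwise ARIMA selection

fc <- forecast(fit_ets, h = 24, level = 80)   # point forecasts and 80% intervals
fc$mean                                       # predicted monthly values
cbind(fc$lower, fc$upper)                     # prediction interval bounds
```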
Predictive life cycle assessment
The difference between predictive LCA and original LCA is to model the usage stage (with the maintenance and end-of-life stages) as a time series and to forecast future impact in a real time horizon. The total life cycle impact of a product can be expressed as (Kwak and Kim, 2013):

I^{total} = I^{mfg} + I^{usage} + I^{maint} + I^{eol} \qquad (21)

where I^{mfg}, I^{usage}, I^{maint}, and I^{eol} represent the impacts of the manufacturing, usage, maintenance, and end-of-life stages. In the equation, a constant fuel (or energy) consumption rate in the usage stage and replacement cycles in the maintenance stage are components that are dependent upon the expected lifespan. However, the time in Eq. (21) is nominal, e.g., 10 years instead of specifying a time horizon such as from October 2014 to December 2024.
Instead, Eq. (22) gives the total environmental impact in a real time horizon:

I^{total}_i = I^{mfg} + \sum_{t=i}^{l} I^{usage}_t + \sum_{t=i}^{l} \left[ I^{maint}_t + I^{eol}_t \right] \qquad (22)

where l is the expected lifetime starting from time i. The impact of manufacturing can be considered as a one-time event while the impacts of usage, maintenance, and end-of-life are affected by time series usage information.


The impact of manufacturing is given as (Kwak and Kim, 2013):

I^{mfg} = \sum_r e^{raw}_r N_r + \sum_p e^{process}_p N_p + \sum_s e^{trans}_s N_s \qquad (23)

where e^{raw}_r, e^{process}_p, and e^{trans}_s represent the unit environmental impact of raw materials (r), manufacturing processes (p), and transportation (s); N_r, N_p, and N_s denote the number of units of raw materials, manufacturing processes, and transportation.
The impacts of usage, maintenance, and end-of-life are given as:

I^{usage} = \sum_{t=i}^{l} I^{fuel}_t + \sum_{t=i}^{l} I^{emission}_t = \sum_{t=i}^{l} e^{fuel} N_{f_t} + \sum_{t=i}^{l} \sum_q e^{emission}_q ER_q\, OH_t \qquad (24)

I^{maint} = \sum_m \sum_{t=i}^{l} e^{maint}_m N_m \left\lceil \frac{\max(OH_t - RC_m, 0)}{RC_m} \right\rceil \qquad (25)

I^{eol} = e^{eol}_{used} + \sum_m \sum_{t=i}^{l} e^{eol}_{replace} N_m \left\lceil \frac{\max(OH_t - RC_m, 0)}{RC_m} \right\rceil \qquad (26)

where I^{fuel} and I^{emission} are the impacts of fuel production (as in Eq. (23)) and of emissions while running the equipment; e^{fuel}, e^{emission}_q, e^{maint}_m, e^{eol}_{used}, and e^{eol}_{replace} are the unit impacts of fuel, emissions, manufacturing of maintenance part m (as in Eq. (23)), and end-of-life processing of a used product and a replaced part (m); N_{f_t} is the amount of fuel consumed, in liters; N_m denotes the number of units of part m (in a product); ER_q is the emission rate of emission q in g/h; OH_t is the operating time in hours; RC_m is the replacement cycle of part m in hours; and \lceil \cdot \rceil is the ceiling function. The value of the ceiling function gives the number of replacements for part m. All the unit impacts can be obtained from the ecoinvent database (version 2.2), which is available in the LCA software SimaPro. Note that this study only considers energy-related impacts (e.g., fuel and electricity) of the usage stage, which are identified as the main contributors in the literature. Other consumables (e.g., coffee and water for coffee machines, paper and ink for printers) are not considered, based on the scope of this study.
Section Description of predictive usage mining for life cycle assessment algorithm described the proposed algorithm from data preprocessing to predictive LCA formulation. Note that the algorithm starts from the available time-stamped data sets (top of Fig. 4), and it is not discussed how many data sets should be available for the algorithm. Empirical studies show that if the available data is not enough to identify useful patterns (e.g., only a few data points), then the result from Section Automatic modeling of ETS and ARIMA is identical to the constant rate method, which is smoothing by averaging the available data points. Actually, the constant rate method can be considered as a special case of the proposed time series analysis methods. In the next section, the proposed LCA formulation will be elaborated with design problems.
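To make the predictive LCA formulation concrete, the R sketch below evaluates the usage-stage term of Eq. (24) and one reading of the maintenance ceiling term of Eq. (25) for a short forecast horizon. All unit impacts and forecast values are placeholders (only the average emission rates are taken from Table 6), so this illustrates the bookkeeping rather than the paper's actual ecoinvent-based calculation.

```r
# Sketch of the usage-stage impact of Eq. (24) over a forecast horizon.
fuel_fc <- c(rep(0, 8), 700, 3600, 5900, 290)   # hypothetical forecast N_f_t (liters/month)
oh_fc   <- c(rep(0, 8), 42, 118, 168, 19)       # hypothetical forecast OH_t (hours/month)

e_fuel <- 0.03                                  # placeholder unit impact per liter of fuel
e_q    <- c(NOx = 2e-4, PM = 1e-3, CO = 5e-5)   # placeholder unit impacts per g of emission q
ER_q   <- c(NOx = 326.82, PM = 1.54, CO = 20.9) # average emission rates in g/h (Table 6)

# Eq. (24): sum over t of e_fuel * N_f_t, plus sum over t and q of e_q * ER_q * OH_t.
I_usage <- sum(e_fuel * fuel_fc) + sum(e_q * ER_q) * sum(oh_fc)

# One reading of the ceiling term in Eq. (25): replacements of part m implied by
# the cumulative operating hours at the end of the horizon.
e_maint <- 250; N_m <- 1; RC_m <- 3000          # placeholder unit impact, units, cycle
OH_total <- sum(oh_fc)
I_maint  <- e_maint * N_m * ceiling(max(OH_total - RC_m, 0) / RC_m)

c(I_usage = I_usage, I_maint = I_maint)
```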

Design problems with PUMLCA


Two system design cases are considered in this study, as shown in Fig. 6. The first case, analysis for sustainability, is when current machines need to be analyzed for sustainability. In this case, enough usage data is available along with manufacturing, maintenance and end-of-life data. Life cycle information includes all the information from the life cycle stages and the expected lifespan or target time horizon.

Fig. 6. Two system design cases for predictive LCA.


The amount of fuel consumed, N_{f_t}, and the operating hours, OH_t, are the time series usage information. The fitted models for N_{f_t} and OH_t from ARIMA or ETS are TS^{N_f}_{t,s} and TS^{OH}_{t,s}, with the number of segments s from Algorithm 1. For example, TS^{N_f}_{t,s} can be either Eqs. (27) and (28), or Eq. (29):

N_{f_{t,s}} = w(x_{t-1}) + r(x_{t-1}) \varepsilon_t \qquad (27)

x_t = f(x_{t-1}) + g(x_{t-1}) \varepsilon_t \qquad (28)

(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm})(1 - B)^d (1 - B^m)^D N_{f_{t,s}} = c + (1 + \theta_1 B + \cdots + \theta_q B^q)(1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm}) e_t \qquad (29)

The environmental impact of current machines can be predicted as follows based on Eqs. (23)–(26):

I^{mfg} = \sum_r e^{raw}_r N_r + \sum_p e^{process}_p N_p + \sum_s e^{trans}_s N_s \qquad (30)

I^{usage} = \sum_{t=i}^{l} \sum_{s=1}^{z} e^{fuel}\, TS^{N_f}_{t,s} + \sum_{t=i}^{l} \sum_{s=1}^{z} \sum_q e^{emission}_q ER_q\, TS^{OH}_{t,s} \qquad (31)

I^{maint} = \sum_m \sum_{t=i}^{l} \sum_{s=1}^{z} e^{maint}_m N_m \left\lceil \frac{\max(TS^{OH}_{t,s} - RC_m, 0)}{RC_m} \right\rceil \qquad (32)

I^{eol} = e^{eol}_{used} + \sum_m \sum_{t=i}^{l} \sum_{s=1}^{z} e^{eol}_{replace} N_m \left\lceil \frac{\max(TS^{OH}_{t,s} - RC_m, 0)}{RC_m} \right\rceil \qquad (33)

The second case, design for sustainability, is for the assessment of the new machine's sustainability when a target of environmental impact reduction must be applied to current machines due to new environmental regulations and enforcement. In this case, it is assumed that the new machines are upgraded versions of the current machines. For example, new machines can improve the fuel efficiency with different materials or components. While these BOM (bill of materials) changes might increase the environmental impact of the manufacturing stage, the efficient fuel usage can reduce the environmental impact of the usage stage. As shown in Fig. 6, the main difference between the current machines and the new machines is the availability of usage data (or a usage model). The proposed method for the estimation of usage information is to use the improvement ratio, which is defined as follows:

\delta_{N_f} = \frac{(N_f / W_{unit})_{\mathrm{new\ machine}}}{(N_f / W_{unit})_{\mathrm{current\ machine}}} \qquad (34)

\delta_{OH} = \frac{(OH / W_{unit})_{\mathrm{new\ machine}}}{(OH / W_{unit})_{\mathrm{current\ machine}}} \qquad (35)

where \delta_{N_f} is the improvement ratio for the amount of fuel consumption, \delta_{OH} is the improvement ratio for the operating hours, and W_{unit} is a unit of work. For example, if a new nutrient applicator can apply fertilizers with higher precision and speed, these can be expressed as \delta_{N_f} and \delta_{OH} with the work unit of the square meter (m²) from testing data. Then, the sensor data of current nutrient applicators can be used with \delta_{N_f} and \delta_{OH} as follows for the environmental impact of the new machine:

I^{mfg} = \sum_r e^{raw}_r N_r + \sum_p e^{process}_p N_p + \sum_s e^{trans}_s N_s \qquad (36)

I^{usage} = \sum_{t=i}^{l} \sum_{s=1}^{z} e^{fuel}\, \delta_{N_f} TS^{N_f}_{t,s} + \sum_{t=i}^{l} \sum_{s=1}^{z} \sum_q e^{emission}_q ER_q\, \delta_{OH} TS^{OH}_{t,s} \qquad (37)

I^{maint} = \sum_m \sum_{t=i}^{l} \sum_{s=1}^{z} e^{maint}_m N_m \left\lceil \frac{\max(\delta_{OH} TS^{OH}_{t,s} - RC_m, 0)}{RC_m} \right\rceil \qquad (38)

I^{eol} = e^{eol}_{used} + \sum_m \sum_{t=i}^{l} \sum_{s=1}^{z} e^{eol}_{replace} N_m \left\lceil \frac{\max(\delta_{OH} TS^{OH}_{t,s} - RC_m, 0)}{RC_m} \right\rceil \qquad (39)

The LCA result from Eqs. (36)–(39) estimates the environmental impact of the new machine. The result can also show whether the target of environmental impact reduction is satisfied. Otherwise, a new design strategy should be explored.
Note that the two design cases can be viewed as phases of a single design case, i.e., evaluation of current sustainability and redesign.
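The design-for-sustainability case reduces to scaling the current machine's forecast usage series by the improvement ratios of Eqs. (34)–(35) before re-evaluating Eqs. (36)–(39); a minimal sketch, in which only the ratios 0.8 and 0.85 come from the case study and the forecast vectors are placeholders:

```r
# Improvement ratios (Eqs. (34)-(35)) applied to the current machine's forecasts.
delta_Nf <- 0.8    # (N_f / W_unit)_new / (N_f / W_unit)_current, from the case study
delta_OH <- 0.85   # (OH / W_unit)_new / (OH / W_unit)_current, from the case study

fuel_fc_current <- c(rep(0, 8), 700, 3600, 5900, 290)   # placeholder forecast, liters
oh_fc_current   <- c(rep(0, 8), 42, 118, 168, 19)       # placeholder forecast, hours

fuel_fc_new <- delta_Nf * fuel_fc_current   # delta_Nf * TS^Nf_(t,s) as in Eq. (37)
oh_fc_new   <- delta_OH * oh_fc_current     # delta_OH * TS^OH_(t,s) as in Eqs. (37)-(39)
```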


Numerical prediction tests for PUMLCA


In this section, a set of different data is tested to validate the prediction performance of PUMLCA. Due to the significance of the environmental impact from the usage stage in LCA, the prediction accuracy of a time series usage model will play an important role in the estimation of environmental impact. The conventional method to model the usage stage is the constant rate method, which is the average of observations. The hypotheses are (1) the PUMLCA algorithm can provide a similar level of prediction accuracy to the constant rate method when data is constant with small random errors (i.e., steady-state processes), hereinafter data 1, (2) PUMLCA can predict future values more accurately than the constant rate method when data has a trend, hereinafter data 2, (3) the automatic segmentation algorithm in PUMLCA can help to improve the predictive modeling when data has a trend and segments, hereinafter data 3, and (4) the PUMLCA algorithm can provide higher prediction accuracy than the constant rate method when prediction is required for specific periods within the whole prediction horizon.
Data sets (data 1, 2, 3) with monthly seasonal patterns were generated and the procedures are described in Section Data
generation for the hypotheses (1), (2) and (3). The three types of data sets were also used to test the hypothesis (4). In terms
of the target of prediction, this study proposes to use not only the aggregated life cycle values (accuracy) but also the
seasonal values of time series usage information (variance) because different time horizon scenarios can be tested. For example, monthly usage data is used to predict the next two-year values and the accumulated two-year values can be used to
assess the environmental impact of the life cycle as an accuracy measure. If the environmental impact of the next quarter or specific periods within the two years needs to be estimated, the accuracy of the predicted seasonal values (i.e., monthly values) will
determine the quality of the analysis, which can be considered a variance measure. This is related to the fourth hypothesis.
Therefore, the best model should provide good predictions of both values: high accuracy (aggregated life cycle values) and
low variance (seasonal values).
As prediction performance measures, the mean absolute percentage error (MAPE) and the mean absolute error (MAE) were used. Eqs. (40) and (41) show MAPE and MAE with the predicted values b_1, b_2, \ldots, b_m and the real values d_1, d_2, \ldots, d_m. MAPE is scale-independent, so results from different data sets can be compared. However, by design, if the actual values are close to zero, MAPE cannot be defined. In this case, the scale-dependent measure, MAE, was used.

\mathrm{Mean\ Absolute\ Percentage\ Error} = \frac{100}{m} \left( \left| \frac{b_1 - d_1}{d_1} \right| + \cdots + \left| \frac{b_m - d_m}{d_m} \right| \right) \qquad (40)

\mathrm{Mean\ Absolute\ Error} = \frac{|b_1 - d_1| + \cdots + |b_m - d_m|}{m} \qquad (41)

Note that based on MAPE and MAE, lower values of the test results are preferable.
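For reference, Eqs. (40) and (41) translate directly into two one-line R functions; b denotes predicted values and d the corresponding real (test) values, and the example numbers are arbitrary.

```r
# Prediction performance measures of Eqs. (40) and (41).
mape <- function(b, d) 100 * mean(abs((b - d) / d))   # undefined if any d equals zero
mae  <- function(b, d) mean(abs(b - d))

# Example with arbitrary predicted (b) and real (d) values:
mape(b = c(98, 210, 290), d = c(100, 200, 300))   # about 3.44 (percent)
mae(b = c(98, 210, 290),  d = c(100, 200, 300))   # about 7.33
```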
Throughout the numerical tests, only positive values were accepted as valid values. Negative values were set to zero. In order to handle non-negative data, one common method is the Box–Cox transformation (Hyndman and Athanasopoulos, 2013), which includes logarithms and power transformations. More theoretical discussions can be found in the literature (Hyndman et al., 2008).
Data generation
To test the first hypothesis, the following data generation procedure was applied: (1) a value from 100 to 1000 was randomly chosen using a random number generator for each month and (2) by adding a random error between −5 and 5 for each month, monthly data with seasonal patterns was generated for 16.5 years, as shown in Table 8 in Appendix A. This is data 1, which does not contain a trend or segments.
For the second hypothesis, one more procedure was added to the procedure for data 1. After applying the first and second steps, 50 (i.e., a trend) was added to the next year's values, as shown in Table 9 in Appendix A (e.g., the column of January increases by 50). This is data 2, which contains a trend.
For the third hypothesis, after applying the first and second steps of the procedure for data 1, 100 (i.e., a trend) was added to the next year's values. Then, eight consecutive monthly values starting from a random position were set to a small number r (i.e., segments) throughout the years, as shown in Table 10 in Appendix A (r = 0 in this test), which represents periods of no activity. This is data 3, which contains a trend and segments.
For each data type, a total of 20 data sets were generated and tested. The first 7 years of data were used as training data and the remaining 9.5 years of data were used as test data, as shown in Fig. 1.
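The three synthetic data sets can be reproduced along the lines of the R sketch below; the random seed and the placement of the eight-month no-activity block are assumptions, and the actual values used in the tests are listed in Tables 8–10 in Appendix A.

```r
# Sketch of the data generation procedures for data 1, 2, and 3.
set.seed(10)
years  <- 16.5
months <- 12 * years                                  # 198 monthly values

base  <- runif(12, min = 100, max = 1000)             # step (1): one value per month
noise <- function(n) runif(n, min = -5, max = 5)      # step (2): random error

# Data 1: constant seasonal pattern with small random errors.
data1 <- rep(base, length.out = months) + noise(months)

# Data 2: data 1 plus a trend of +50 per year added to every month.
trend2 <- 50 * (ceiling(seq_len(months) / 12) - 1)
data2  <- rep(base, length.out = months) + trend2 + noise(months)

# Data 3: +100 per year trend and eight consecutive months set to r = 0 each year.
trend3 <- 100 * (ceiling(seq_len(months) / 12) - 1)
data3  <- rep(base, length.out = months) + trend3 + noise(months)
start  <- sample(1:5, 1)                              # assumed block placement
for (yy in 0:(ceiling(years) - 1)) {
  idx <- 12 * yy + start:(start + 7)
  data3[idx[idx <= months]] <- 0
}
```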
Test results
The goal of this test is to construct a predictive model with the training data sets and predict future values (i.e., bm in Eqs.
(40) and (41)). The test data sets work as real values (i.e., dm in Eqs. (40) and (41)). Table 1 shows the results of the data sets
(data 1, 2, 3) in Section Data generation.
First, for data 1 (data without a trend and segments), since the data sets are designed to be constant with some mild randomness, the constant rate method showed good prediction performance for the accuracy measure. The PUMLCA algorithm


with both ETS (PUMLCA-ets) and ARIMA (PUMLCA-arima) also showed a similar level of accuracy, and there is no significant difference between the constant rate method and PUMLCA (Mann–Whitney test, α = 0.05, p-value = 0.95). For the variance measure, since the constant rate method took the average rate for each month, monthly predictions of the constant rate method showed much lower accuracy than those of PUMLCA (Mann–Whitney test, α = 0.05, p-value = 0). This affects the prediction of the next quarter values (i.e., hypothesis 4) because lower monthly errors give higher chances to predict specific periods with accuracy. For the next quarter values, PUMLCA showed higher prediction accuracy (Mann–Whitney test, α = 0.05, p-value = 0). Therefore, the PUMLCA algorithm can provide accurate prediction capabilities for aggregated life cycle values (accuracy), seasonal values (variance) and values for specific periods with data 1 in comparison to the constant rate method.
Second, for data 2 (data with a trend), the constant rate method showed poor prediction performance in terms of the accuracy measure. On the other hand, the PUMLCA algorithm with both ETS and ARIMA showed good prediction accuracy. There is no significant difference found between the real values and the results of PUMLCA-ets/arima (Mann–Whitney test, α = 0.05, p-value = 0.29/0.78). For the variance measure, monthly predictions of the constant rate method showed much lower accuracy than those of PUMLCA (Mann–Whitney test, α = 0.05, p-value = 0). This affects the prediction of the next quarter values. For the next quarter values, PUMLCA showed higher prediction accuracy (Mann–Whitney test, α = 0.05, p-value = 0). Therefore, the PUMLCA algorithm can provide accurate prediction capabilities for aggregated life cycle values (accuracy), seasonal values (variance) and values for specific periods with data 2 in comparison to the constant rate method.
Third, for data 3 (data with a trend and segments), the constant rate method and the ETS method without the automatic segmentation algorithm (ets-no seg) showed poor prediction performance in terms of the accuracy measure. On the other hand, the ARIMA method without the automatic segmentation algorithm (arima-no seg) and PUMLCA-ets/arima showed strong prediction accuracy. However, Table 2 zooms in on their prediction performances using MAE, and it can be seen that the errors from the ARIMA method without the automatic segmentation algorithm were much higher than those from the PUMLCA method. Due to the importance of the usage stage, the errors from the ARIMA method without the automatic segmentation are not acceptable, and this shows that the automatic segmentation algorithm can enhance the prediction result. Out of the 20 samples, PUMLCA-ets/arima showed the best performance. For the next quarter values, the PUMLCA method with the automatic segmentation algorithm showed higher prediction accuracy. Therefore, the proposed segmentation algorithm can improve the predictive model of PUMLCA with data 3.
Overall, the PUMLCA method with the automatic segmentation algorithm provided better prediction performance than
the constant rate method for various data sets which are simulated from the observation of real data. This prediction
improvement of usage modeling will help to estimate the environmental impact of the product of interest more accurately.
The example of the LCA with PUMLCA will be provided in the next section. The PUMLCA method could also provide
prediction intervals while estimating a point forecast. For example, a point forecast of the next month is 1344 with the
80% prediction interval of [1330, 1359]. The prediction interval can show the uncertainty of time series usage models.
Case study: agricultural machinery
Background
In this section, the proposed algorithm, predictive usage mining for life cycle assessment (PUMLCA), is demonstrated with a case study of agricultural machines: a current and a new machine. The machines have more than 15,000 parts and weigh more than 20,000 kg. The current machine was updated to have a 10% reduction of its environmental impact based on an improved fuel efficiency. This updated machine is called the new machine. The goal is to estimate the environmental impacts of the current and new machines in a real time horizon. Due to the data security issue, simulated data is used based on the observation of real data.
Table 1
Test results.

                        Constant rate   PUMLCA-ets   PUMLCA-arima   ets-no seg   arima-no seg
Data 1, average MAPE
  Accuracy              0.75            0.08         0.14           –            –
  Variance              65.58           0.76         0.79           –            –
  Next quarter value    13.84           0.25         0.24           –            –
Data 2, average MAPE
  Accuracy              37.05           2.80         0.91           –            –
  Variance              34.92           2.80         0.98           –            –
  Next quarter value    22.06           0.74         0.29           –            –
Data 3, average MAE
  Accuracy              30,736          166          154            24,462       1612
  Variance              636             2            2              313          225
  Next quarter value    1979            10           9              1017         139


Table 2
MAEs over the 20 data samples of data 3 (one value per sample).

arima-no seg    1870  3005   558  1478  2295  2382  1870  1464  2829  1170  2826  1971  2060   855   865   829   592  2200   965   156
PUMLCA-ets        58  1061    96    48   311   292   540     9   122    66     3    48    48    64    70   102    17   237   101    34
PUMLCA-arima     145  1044    57    24   293   224   322   147    80    35    64    47    66    16     0   119    59   173   102    66
Tables 3 and 4 show simulated seven-year monthly data for fuel consumption and operating hours after preprocessing the raw sensor data.
In this case study, time series usage models from the historical sensor data will be utilized to calculate the environmental impacts for up to 10–20 years. Since the first stage of PUMLCA (i.e., data preprocessing in Section Data preprocessing) is straightforward and simple, it was skipped in this section.

Seasonal period analysis


Instead of exploring all possible data representations (e.g., daily, weekly, quarterly, etc.), the focus was set on whether the simulated data showed a monthly seasonality. The periodogram was plotted using Eq. (3) with the condition of frequency greater than zero. The periodogram shows that the maximum periodogram value is achieved at the frequency of 0.0833 (i.e., period = 1/0.0833 = 12) for the fuel consumption data. Similarly, the operating hours data also indicate a period of 12.

Segmentation analysis
The automatic segmentation algorithm (Algorithm 1) was applied to the two data sets in Tables 3 and 4. As a penalty, the type I error of 0.05 was used for both data sets. First, for the fuel consumption data, a segment from February to August was identified as a shared segment since the same change points were detected (1, 8, 9, 10, 11, and 12 as seasonal time indexes) every year. Therefore, two segments were finally obtained, i.e., the shared segment (February–August) and the remaining segment (January, September–December). Second, for the operating hours data, the segment from January to August was identified as a shared segment; the same change points were detected (8, 9, 10, 11, and 12 as seasonal time indexes) every year. Therefore, two segments were finally obtained.

Table 3
Monthly representation of fuel consumption (l) data.

Year   January  February  March  April  May  June  July  August  September  October  November  December
2007   9        0         0      0      0    0     0     2       600        3400     5000      250
2008   15       0         0      0      0    0     0     0       650        3410     5500      270
2009   17       0         0      0      0    0     0     0       660        3450     5550      280
2010   16       0         0      0      0    0     0     1       665        3370     5600      270
2011   14       0         0      0      0    0     0     1.5     660        3430     5650      275
2012   16       0         0      0      0    0     0     0       680        3500     5735      280
2013   17       0         0      0      0    0     0     2       700        3570     5800      285

Table 4
Monthly representation of operating hours (h) data.

Year   January  February  March  April  May  June  July  August  September  October  November  December
2007   1        0         0      0      0    0     0     0.2     35.2       100.6    152.3     15.1
2008   1.8      0         0      0      0    0     0     0       37.1       101.6    158.1     16.3
2009   2        0         0      0      0    0     0     0       38         105.3    159.3     17.8
2010   1.9      0         0      0      0    0     0     0.1     38.3       97.6     160.1     16.5
2011   1.7      0         0      0      0    0     0     0.2     38         103.5    162.2     17
2012   1.9      0         0      0      0    0     0     0       39         110.3    164.3     17.9
2013   2        0         0      0      0    0     0     0.22    41         115.2    165.2     18.2


Time series analysis
The automatic forecasting algorithm in Section Automatic modeling of ETS and ARIMA was applied to the original data sets (i.e., without segmentation) and to the results of the automatic segmentation in Section Segmentation analysis. Table 5 shows the results. For example, the original fuel consumption data is fitted as a seasonal AR model with a seasonal differencing and a drift using ARIMA. The first segment data (segment 1) shows a combination of seasonal AR and MA models without a drift. The second segment data (segment 2) shows only a seasonal differencing operation with a drift. The original fuel consumption data is also fitted as an additive error and seasonal component model using ETS. The first segment data shows an additive error and seasonal component model again. The second segment data shows an additive trend, multiplicative error and seasonal component model.
Predictive LCA
LCA for current machine
The PUMLCA-ets models (with two segments) of fuel consumption, N_{f_t}, and operating hours, OH_t, in Table 5 were used as the usage models of the agricultural machine. For predictive LCA, starting from January 2014, forecasts were built up to December 2024 (i.e., 10 years) and up to December 2034 (i.e., 20 years). For the environmental impact calculation, the Eco-Indicator 99 method (EI-99) (Goedkoop and Spriensma, 2001) was used, which is one of the widely used methods in LCA and provides a single score (Point) from pre-defined damage categories such as human health, ecosystem quality, and resources.
In the manufacturing stage, the environmental impact was assumed as 12,000 Pt. In the usage stage, the density of diesel fuel was assumed as 0.85 kg/l and the emission rates were given in Table 6. The idling and nonidling ratio (20%/80%) was calculated using averages of the seven-year operating hours by work modes. In the maintenance stage, the assumptions on the replacement cycles of major parts and minor parts are as follows (Kwak and Kim, 2013): tires (3000 h), transmission (3000 h), hydraulic components (3000 h), engine (5000 h), axles (5000 h), and minor parts such as oils, greases, and filters (specified cycle). In the end-of-life stage, the following assumptions were made: steel (90% recycle and 10% landfill), iron (90% recycle and 10% landfill), and others (80% landfill and 20% incineration).
Based on Eqs. (30)–(33), a predictive LCA result of the current machine in the real time horizon (January 2014–December 2034) was estimated as shown in Fig. 7. The impact of the manufacturing stage was the same regardless of time horizons since it is a one-time event. On the other hand, the impacts of the usage, maintenance, and end-of-life stages varied with time. Similar to previous LCA studies, the impact of the usage stage accounted for

Table 5
Results of time series analysis.

Fuel consumption data
  Original
    ARIMA: (1 - 0.41B^{12})(1 - B^{12}) y_t = 1.53 + e_t
    ETS:   y_t = l_{t-1} + s_{t-12} + \varepsilon_t;  l_t = l_{t-1} + 0.06\varepsilon_t;  s_t = s_{t-12} + 10^{-4}\varepsilon_t
  Segment 1 (February–August)
    ARIMA: (1 + 0.28B^{7})(1 - B^{7}) y_t = (1 - 0.28B^{4}) e_t
    ETS:   y_t = l_{t-1} + s_{t-7} + \varepsilon_t;  l_t = l_{t-1} + 0.001\varepsilon_t;  s_t = s_{t-7} + 2\times10^{-4}\varepsilon_t
  Segment 2 (January, September–December)
    ARIMA: (1 - B^{5}) y_t = 7.42 + e_t
    ETS:   y_t = (l_{t-1} + b_{t-1}) s_{t-5}(1 + \varepsilon_t);  l_t = (l_{t-1} + b_{t-1})(1 + 0.395\varepsilon_t);  b_t = b_{t-1} + 0.098(l_{t-1} + b_{t-1})\varepsilon_t;  s_t = s_{t-5}(1 + 10^{-4}\varepsilon_t)

Operating hours data
  Original
    ARIMA: (1 - B^{12}) y_t = (1 + 0.21B) e_t
    ETS:   y_t = l_{t-1} + s_{t-12} + \varepsilon_t;  l_t = l_{t-1} + 0.29\varepsilon_t;  s_t = s_{t-12} + 3\times10^{-4}\varepsilon_t
  Segment 1 (January–August)
    ARIMA: (1 - B^{8}) y_t = (1 - 0.67B)(1 - 0.64B^{8}) e_t
    ETS:   y_t = l_{t-1} + s_{t-8} + \varepsilon_t;  l_t = l_{t-1} + 10^{-4}\varepsilon_t;  s_t = s_{t-8} + 0.03\varepsilon_t
  Segment 2 (September–December)
    ARIMA: (1 - B^{4}) y_t = 0.38 + e_t
    ETS:   y_t = (l_{t-1} + s_{t-4})(1 + \varepsilon_t);  l_t = l_{t-1} + 0.12(l_{t-1} + s_{t-4})\varepsilon_t;  s_t = s_{t-4} + 0.88(l_{t-1} + s_{t-4})\varepsilon_t

Table 6
Assumptions on emission rates (g/h) (Kwak and Kim, 2013).

Type                       Nonidling (80%)   Idling (20%)   Average
Nitrogen oxides (NOx)      372.73            143.16         326.82
Particulate matter (PM)    1.76              0.67           1.54
Carbon monoxide (CO)       23.84             9.16           20.9
Hydrocarbons (HC)          5.42              2.08           4.75
Sulfur dioxide (SO2)       0.99              0.43           0.89
Carbon dioxide (CO2)       150829.6          65427.83       133749.3
Similar to previous LCA studies, the impact of the usage stage accounted for the majority of the environmental impact. The impact of the maintenance stage showed a large increase because major parts (engine and axles) were replaced after 10 years. It should be noted that the two usage models (PUMLCA and the constant rate method) were used for the usage stage in order to show the impact of prediction accuracy discussed in Section Numerical prediction tests for PUMLCA (PUMLCA was also used for the maintenance and end-of-life stages). The data in this case study was similar to the third hypothesis in Section Numerical prediction tests for PUMLCA (i.e., data with increasing trend and segments), so the constant rate method can be expected to underestimate the impact (by about 17,000 Pt over 20 years), which is greater than the impact of the manufacturing stage. If the data were nearly constant, PUMLCA and the constant rate method would produce similar results, as seen in Section Numerical prediction tests for PUMLCA (i.e., data without trend and segments). Furthermore, the top of Fig. 7 shows the 80% prediction intervals of the usage impact by PUMLCA. Unlike the constant rate method, PUMLCA can provide the uncertainty of its predictive model.
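One way to obtain such prediction intervals is to simulate many future sample paths from the fitted ETS model and take empirical quantiles of the accumulated impact. The sketch below assumes a simple additive-error, additive-seasonal ETS fit and a placeholder characterization factor; it is not the exact procedure used in this study.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# Hypothetical monthly fuel-consumption history (liters/month); illustrative values only.
rng = np.random.default_rng(2)
history = pd.Series(
    600 + 80 * np.sin(2 * np.pi * np.arange(84) / 12) + rng.normal(0, 10, 84),
    index=pd.date_range("2007-01-01", periods=84, freq="MS"),
)
fit = ETSModel(history, error="add", seasonal="add", seasonal_periods=12).fit(disp=False)

# Simulate many 20-year continuations and convert each path into a fuel-related usage impact.
horizon = 240                                    # months, January 2014 to December 2034
paths = fit.simulate(nsimulations=horizon, repetitions=1000, anchor="end")

DIESEL_DENSITY_KG_PER_L = 0.85
EI99_PT_PER_KG_DIESEL = 0.2                      # placeholder characterization factor
impacts = paths.to_numpy().clip(min=0).sum(axis=0) * DIESEL_DENSITY_KG_PER_L * EI99_PT_PER_KG_DIESEL

lower, upper = np.percentile(impacts, [10, 90])  # 80% prediction interval of the usage impact
print(f"80% interval of the fuel-related usage impact: [{lower:.0f}, {upper:.0f}] Pt")
```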
LCA for new machine
New machines were assumed to be designed based on the current machines with a target of 10% reduction of environmental impact over 20 years. The design utilizes the usage data of the current machines together with the improvement ratios, δ_Nf and δ_OH, as shown in Fig. 6. Similar to the current machine, predictive LCA was conducted starting from January 2014 up to December 2024 (i.e., 10 years) and up to December 2034 (i.e., 20 years) with the EI-99 method.
In the manufacturing stage, the environmental impact was assumed to increase to 14,500 Pt (a 20.8% increase) due to the additional power sources. The other assumptions for the usage, maintenance, and end-of-life stages were similar to those of the current machine. The unit of work was the square meter (m^2), and a performance test was conducted to compare the new machine and the current machine. The improvement ratio for fuel consumption δ_Nf was 0.8 and the improvement ratio for operating hours δ_OH was 0.85.
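Because the two machines are compared per unit of work, the usage forecasts of the new machine are simply the current machine's forecasts scaled by the improvement ratios. A minimal sketch with illustrative forecast values (the scaled series can then be fed into the same stage bookkeeping sketched earlier):

```python
# Scale the current machine's monthly forecasts by the improvement ratios to model the new machine.
# The forecast lists below are placeholders standing in for the PUMLCA forecasts of the current machine.
delta_Nf, delta_OH = 0.8, 0.85                 # improvement ratios from the performance test

fuel_current = [700.0, 720.0, 680.0, 710.0]    # liters/month (illustrative values)
hours_current = [130.0, 135.0, 128.0, 132.0]   # operating hours/month (illustrative values)

fuel_new = [delta_Nf * f for f in fuel_current]
hours_new = [delta_OH * h for h in hours_current]
print(fuel_new)
print(hours_new)
```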
Based on Eqs. (36)–(39), the predictive LCA result of the new machine in the real time horizon (January 2014–December 2034) was estimated as shown in Fig. 8.
Table 7 shows the comparison of the two LCA results of the current and new machines. Although the impact from the manufacturing stage increased (by 20.8%) for the new machine, the total impact was reduced, mainly through the usage stage. It should be noted that the result depends on the lifespan of the machines.
Fig. 7. Predictive LCA results for current machine.

Fig. 8. Predictive LCA results for new machine.

Table 7
Comparison of current and new machines (EI-99, Pt).

Stage           Horizon    Current machine   New machine
Manufacturing   10 year    12,000            14,500
                20 year    12,000            14,500
Usage           10 year    41,706            33,763
                20 year    84,002            67,961
Maintenance     10 year    9295              9400
                20 year    22,890            22,900
End-of-life     10 year    476               480
                20 year    805               820
Total           10 year    63,477            58,143
                20 year    119,697           106,181
A reduction of environmental impact of 8.4% was expected for 10 years and 11.3% for 20 years, which satisfies the target of a 10% reduction of environmental impact over 20 years. Sensitivity analysis can be applied to find the minimum values of the improvement ratios, δ_Nf and δ_OH, that satisfy the target.
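Such a sensitivity analysis can be carried out as a simple grid search over the two ratios, keeping the least demanding pair whose 20-year total still meets the target. The sketch below uses a placeholder linear impact function in place of the full predictive LCA model, so the numbers are illustrative only.

```python
import numpy as np

def total_impact_new(delta_nf, delta_oh):
    # Placeholder 20-year total for the new machine: manufacturing + usage scaled by delta_nf
    # + maintenance scaled by delta_oh + end-of-life (illustrative coefficients, not Eqs. (36)-(39)).
    return 14500 + delta_nf * 84000 + delta_oh * 22900 + 820

target = 0.9 * 119697  # 10% below the current machine's 20-year total (Table 7)
feasible = [
    (dn, do)
    for dn in np.arange(0.70, 1.01, 0.05)
    for do in np.arange(0.70, 1.01, 0.05)
    if total_impact_new(dn, do) <= target
]
# The "minimum improvement" corresponds to the feasible pair requiring the least reduction effort.
print(max(feasible, key=lambda pair: pair[0] + pair[1]))
```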
In conclusion, the proposed algorithm, PUMLCA, captured usage patterns from large-scale sensor data with the automatic segmentation algorithm and time series analysis, and could assess the environmental impact of a complex system in a real time horizon.

Closing remarks and future work


In this paper, the predictive usage mining for life cycle assessment (PUMLCA) algorithm is proposed to model the usage stage for the LCA of products. By defining usage patterns as trend, seasonality, and level from a time series of usage information, predictive LCA can be conducted in a real time horizon, which can provide more accurate results of LCA. Large-scale sensor data of product operation was analyzed to mine usage patterns and build a usage model for LCA. The PUMLCA algorithm includes handling missing and abnormal values, seasonal period analysis, segmentation analysis, time series analysis, and predictive LCA. In order to mine important usage patterns more effectively from a time series, an automatic segmentation algorithm is developed based on change point analysis.
The prediction performance test results with various data sets showed that the predictive model from the PUMLCA method can provide better prediction accuracy than the constant rate method. The automatic segmentation algorithm magnified important patterns and helped to predict future values more accurately.
Two different design problems were formulated to incorporate the usage model from the PUMLCA method in predictive LCA. The case study of agricultural machinery showed how to apply the PUMLCA method for the predictive LCA of complex systems. The environmental impacts of both current machines and new machines could be estimated and compared.
In the future, various data sets from different products can be tested with the PUMLCA algorithm. The current model,
which considers only a single type of machinery, can be extended to multiple types of machinery. In order to perform
LCA with multiple types of machinery, hierarchical time series modeling and forecasting may be helpful (Hyndman et al.,
2011).

Appendix A. Sample data sets in Section Numerical prediction tests for PUMLCA
Tables 8–10 show samples of data 1, data 2, and data 3.

Table 8
Sample of data 1 for hypotheses 1 and 4.

Year  January  February  March  April  May  June  July  August  September  October  November  December
1     470      538       544    669    232  911   747   353     909        980      133       213
2     475      540       545    672    231  913   742   354     909        982      130       218
3     475      542       544    670    234  908   747   354     914        985      129       215
4     466      539       547    671    229  919   745   350     906        975      135       216
5     473      534       548    674    232  913   748   358     913        984      135       214
6     474      539       539    668    232  911   747   349     908        983      132       208
7     471      541       548    667    232  913   748   353     912        982      137       214
8     473      543       545    666    229  907   748   354     911        980      136       217
9     467      536       542    670    229  911   745   355     907        975      138       211
10    466      537       544    674    235  914   743   355     910        979      136       217
11    468      536       543    673    230  909   749   349     909        982      129       215
12    472      542       542    665    222  908   750   351     908        976      132       208
13    466      541       545    664    229  916   746   351     905        977      132       218
14    473      542       539    667    229  912   742   354     908        977      133       217
15    474      538       541    664    228  914   748   349     905        984      133       209
16    473      533       549    674    232  911   751   356     909        979      135       212
17    467      534       539    672    234  915

Table 9
Sample of data 2 for hypotheses 2 and 4.

Year  January  February  March  April  May   June  July  August  September  October  November  December
1     975      872       965    976    799   449   681   169     399        728      614       725
2     1024     921       1010   1029   845   500   733   219     455        779      669       772
3     1077     973       1061   1070   893   549   786   271     502        828      713       823
4     1123     1022      1119   1129   940   605   832   312     549        872      765       871
5     1174     1077      1160   1179   991   659   885   365     600        928      813       923
6     1224     1117      1210   1224   1040  701   930   421     658        974      870       975
7     1273     1176      1268   1275   1095  751   978   462     698        1030     913       1025
8     1326     1220      1309   1325   1139  808   1029  522     751        1073     963       1079
9     1381     1271      1359   1379   1197  854   1078  567     805        1130     1011      1131
10    1427     1321      1419   1421   1248  899   1128  616     857        1179     1063      1181
11    1481     1367      1468   1472   1299  950   1180  671     900        1230     1117      1225
12    1526     1419      1515   1526   1340  1006  1229  712     953        1278     1162      1278
13    1575     1469      1561   1569   1393  1058  1278  769     1005       1328     1215      1328
14    1629     1527      1618   1625   1447  1099  1337  821     1056       1371     1268      1380
15    1677     1575      1667   1670   1495  1155  1378  872     1102       1420     1320      1423
16    1723     1623      1714   1722   1540  1209  1429  933     1153       1469     1372      1471
17    1770     1671      1760   1776   1592  1258

Table 10
Sample of data 3 for hypotheses 3 and 4.

Year  January  February  March  April  May  June  July  August  September  October  November  December
1     r        r         r      r      r    r     155   129     643        313      r         r
2     r        r         r      r      r    r     257   233     746        409      r         r
3     r        r         r      r      r    r     355   333     848        518      r         r
4     r        r         r      r      r    r     452   429     944        610      r         r
5     r        r         r      r      r    r     558   525     1038       710      r         r
6     r        r         r      r      r    r     654   632     1141       813      r         r
7     r        r         r      r      r    r     752   734     1242       909      r         r
8     r        r         r      r      r    r     855   827     1344       1012     r         r
9     r        r         r      r      r    r     958   928     1445       1117     r         r
10    r        r         r      r      r    r     1053  1025    1542       1214     r         r
11    r        r         r      r      r    r     1160  1124    1643       1317     r         r
12    r        r         r      r      r    r     1253  1231    1743       1410     r         r
13    r        r         r      r      r    r     1354  1328    1839       1510     r         r
14    r        r         r      r      r    r     1450  1425    1943       1616     r         r
15    r        r         r      r      r    r     1553  1534    2044       1711     r         r
16    r        r         r      r      r    r     1656  1629    2143       1808     r         r
17    r        r         r      r      r    r

References
Choi, B.C., Shin, H.S., Lee, S.Y., Hur, T., 2006. Life cycle assessment of a personal computer and its effective recycling rate. Int. J. Life Cycle Assess. 11, 122–128.
Collet, P., Lardon, L., Steyer, J.P., Hélias, A., 2014. How to take time into account in the inventory step: a selective introduction based on sensitivity analysis. Int. J. Life Cycle Assess. 19, 320–330.
Finnveden, G., Hauschild, M.Z., Ekvall, T., Guinée, J., Heijungs, R., Hellweg, S., Koehler, A., Pennington, D., Suh, S., 2009. Recent developments in life cycle assessment. J. Environ. Manage. 91, 1–21.
Goedkoop, M., Spriensma, S., 2001. The Eco-Indicator 99: A Damage Oriented Method for Life Cycle Impact Assessment. Annex Report. Pré Consultants B.V., Amersfoort, The Netherlands. <http://www.pre-sustainability.com>.
Guinée, J., 2002. Handbook on Life Cycle Assessment: Operational Guide to the ISO Standards. Eco-Efficiency in Industry and Science. Springer.
Hyndman, R., Athanasopoulos, G., 2013. Forecasting: Principles and Practice. <http://otexts.org/fpp/> (accessed January 2014).
Hyndman, R.J., Khandakar, Y., 2008. Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 27, 1–22.
Hyndman, R., Koehler, A., Ord, J.K., Snyder, R., 2008. Forecasting with Exponential Smoothing: The State Space Approach. Springer-Verlag, Berlin, Heidelberg.
Hyndman, R.J., Ahmed, R.A., Athanasopoulos, G., Shang, H.L., 2011. Optimal combination forecasts for hierarchical time series. Comput. Stat. Data Anal. 55, 2579–2589.
Jackson, T., 2010. Analyzing seasonal time series with periodic low volumes. In: Proceedings of the International Symposium on Forecasting, San Diego, USA.
Keogh, E., Chu, S., Hart, D., Pazzani, M., 2004. Segmenting time series: a survey and novel approach. Data Min. Time Ser. Databases 57, 1–21.
Killick, R., Eckley, I.A., 2011. changepoint: An R Package for Changepoint Analysis. R Package Version 0.5.
Killick, R., Fearnhead, P., Eckley, I.A., 2012. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107, 1590–1598.
Kwak, M., 2012. Green Profit Design for Lifecycle. Ph.D. Thesis, University of Illinois at Urbana-Champaign.
Kwak, M., Kim, H., 2013. Economic and environmental impacts of product service lifetime: a life-cycle perspective. In: Meier, H. (Ed.), Product-Service Integration for Sustainable Solutions, Lecture Notes in Production Engineering. Springer, Berlin, Heidelberg, pp. 177–189.
Kwak, M., Kim, L., Sarvana, O., Kim, H.M., Finamore, P., Hazewinkel, H., 2012. Life cycle assessment of complex heavy duty equipment. In: ASME International Symposium on Flexible Automation (ISFA2012), St. Louis, USA.
Lee, J., Cho, H.J., Choi, B., Sung, J., Lee, S., Shin, M., 2000. Life cycle assessment of tractors. Int. J. Life Cycle Assess. 5, 205–208.
Levasseur, A., Lesage, P., Margni, M., Deschênes, L., Samson, R., 2010. Considering time in LCA: dynamic LCA and its application to global warming impact assessments. Environ. Sci. Technol. 44, 3169–3174.
Li, T., Liu, Z.C., Zhang, H.C., Jiang, Q.H., 2013. Environmental emissions and energy consumptions assessment of a diesel engine from the life cycle perspective. J. Cleaner Prod. 53, 7–12.
Ma, J., Kim, H.M., 2014. Continuous preference trend mining for optimal product design with multiple profit cycles. J. Mech. Des. 136, 061002.
Ma, J., Kwak, M., Kim, H.M., 2014. Demand trend mining for predictive life cycle design. J. Cleaner Prod. 68, 189–199.
Memary, R., Giurco, D., Mudd, G., Mason, L., 2012. Life cycle assessment: a time-series analysis of copper. J. Cleaner Prod. 33, 97–108.
Reap, J., Roman, F., Duncan, S., Bras, B., 2008a. A survey of unresolved problems in life cycle assessment. Part 1: Goal and scope and inventory analysis. Int. J. Life Cycle Assess. 13, 290–300.
Reap, J., Roman, F., Duncan, S., Bras, B., 2008b. A survey of unresolved problems in life cycle assessment. Part 2: Impact assessment and interpretation. Int. J. Life Cycle Assess. 13, 374–388.
Rebitzer, G., Ekvall, T., Frischknecht, R., Hunkeler, D., Norris, G., Rydberg, T., Schmidt, W.P., Suh, S., Weidema, B., Pennington, D., 2004. Life cycle assessment: Part 1: framework, goal and scope definition, inventory analysis, and applications. Environ. Int. 30, 701–720.
Shumway, R., Stoffer, D., 2011. Time Series Analysis and Its Applications: With R Examples. Springer Texts in Statistics. Springer.
Sullivan, J.L., Cobas-Flores, E., 2001. Full vehicle LCAs: a review. In: Proceedings of the 2001 Environmental Sustainability Conference and Exhibition, Graz, Austria, pp. 99–114.
Telenko, C., Seepersad, C.C., 2014. Probabilistic graphical modeling of use stage energy consumption: a lightweight vehicle example. J. Mech. Des. 136, 101403.
