Time Series
A time series is a collection of observations of well-defined data items obtained through repeated measurements over time.
A time series can be constructed from any data that is measured over time at evenly spaced intervals. Historical stock prices, earnings, GDP, or other sequences of financial or economic data can be analyzed as a time series.
A linear time series is one where each data point Xt can be viewed as a linear combination of past or future values or differences. Nonlinear time series are generated by nonlinear dynamic equations and have features that cannot be modelled by linear processes: time-changing variance, asymmetric cycles, higher-moment structures, thresholds and breaks. Here are some important considerations when working with linear and nonlinear time series data:
If a regression equation doesn't follow the rules for a linear model, then it must be a nonlinear model. Nonlinear regression can fit an enormous variety of curves. The defining characteristic of both types of models is their functional form.
Moving Averages
A moving average model leverages the average of the data points that exist in a specific overlapping subsection of the series. An average is taken from the first subset of the data, and then it is moved forward to the next data
point while dropping out the initial data point. A moving average can give you information about the current trends, and reduce the amount of noise in your data. Often, it is a preprocessing step for forecasting.
A simple moving average (SMA) is calculated by taking the subset of the data described above, adding together its data points, and then averaging over the subset. It can help identify the direction of trends in your data and identify levels of resistance, where in business or trading data there is a price ceiling that can't be broken through. For instance, if you're trying to identify the point past which you can't charge for a product, or why a stock can't move past a certain price point, you can identify that ceiling with a moving average.
Moving averages are used by investors and traders for analyzing short-term trends in stock market data, while SMAs are used in healthcare to better understand current trends in surgeries and even to analyze quality control among healthcare providers.
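As an illustration, here is a minimal sketch of computing a simple moving average with pandas; the series values are made up for the example.

import pandas as pd

# Hypothetical daily closing prices; any evenly spaced series works.
prices = pd.Series([10.0, 10.4, 10.2, 10.8, 11.1, 10.9, 11.5, 11.8])

# 3-point simple moving average: each value is the mean of the current
# point and the two before it (the first two values are NaN).
sma_3 = prices.rolling(window=3).mean()
print(sma_3)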
Level: When you read about the “level” or the “level index” of time series data, it’s referring to the mean of the series.
Noise: All time series data will have noise or randomness in the data points that aren’t correlated with any explained trends. Noise is unsystematic and is short term.
Seasonality: If there are regular and predictable fluctuations in the series that are correlated with the calendar – quarterly, weekly, or even days of the week – then the series includes a seasonality component. It's important to note that seasonality is domain specific; for example, real estate sales are usually higher in the summer months than in the winter months, while regular retail usually peaks toward the end of the year. Also, not all time series have a seasonal component; audio or video data, for example, typically does not.
Trend: When referring to the “trend” in time series data, it means that the data has a long term trajectory which can either be trending in the positive or negative direction. An example of a trend would be a long term
increase in a company’s sales data or network usage.
Cycle: Repeating periods that are not related to the calendar. This includes business cycles such as economic downturns or expansions or salmon run cycles, or even audio files which have cycles, but aren’t related to the
calendar in the weekly, monthly, or yearly sense.
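These components can be separated with a classical decomposition. Below is a minimal sketch using statsmodels' seasonal_decompose on a made-up monthly series; the numbers are illustrative, not taken from the dataset used later.

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Made-up monthly series with an upward trend and a summer peak.
idx = pd.date_range("1992-01-01", periods=48, freq="MS")
values = [1500 + 10 * i + 100 * ((i % 12) in (5, 6, 7)) for i in range(48)]
series = pd.Series(values, index=idx)

# Additive decomposition splits the series into trend, seasonal and
# residual (noise) components; period=12 because the data is monthly.
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))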
We have a dataset that gives information about liquor consumption over time:
Period: time period
Value: Amount of liquor consumed
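A minimal sketch of the pandas calls that would produce the outputs shown below; the file name liquor.csv is an assumption, since the actual source file is not given.

import pandas as pd

# Load the dataset and parse the Period column as dates.
df = pd.read_csv("liquor.csv", parse_dates=["Period"])

print(df.head())          # first five rows
df.info()                 # dtypes and non-null counts
print(df.describe())      # summary statistics for Value
print(df.isnull().sum())  # missing values per column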
Period Value
0 1992-01-01 1509
1 1992-02-01 1541
2 1992-03-01 1597
3 1992-04-01 1675
4 1992-05-01 1822
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 293 entries, 0 to 292
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Period 293 non-null datetime64[ns]
1 Value 293 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 4.7 KB
Value
count 293.000000
mean 2790.494881
std 861.360248
min 1501.000000
25% 2059.000000
50% 2638.000000
75% 3438.000000
max 5834.000000
Period 0
Value 0
dtype: int64
219 (size of the training set; the remaining 74 observations form the test set)
Period Value
0 1992-01-01 1509.0
1 1992-02-01 1541.0
2 1992-03-01 1597.0
3 1992-04-01 1675.0
4 1992-05-01 1822.0
[Line plots of Value against Period]
Seasonality
Seasonality is a characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year. Any predictable fluctuation or pattern that recurs or repeats over a one-year
period is said to be seasonal.
Seasonality in time-series data refers to a pattern that occurs at a regular interval. This is different from cyclic trends, such as the rise and fall of liquor prices, which recur regularly but don't have a fixed period.
There’s a lot of insight to be gained from understanding seasonality patterns in data and you can even use it as a baseline to compare your time-series machine learning models.
It looks like there is an upward trend in the data with recurring seasonality. Below is a function to test the stationarity of the data using the Dickey-Fuller test and to plot the rolling statistics. In the Dickey-Fuller test, if the p-value is less than 5%, the series is considered stationary.
Checking Stationarity
During the TSA model preparation workflow, we must assess whether the given dataset is stationary or not, using statistical tests and plots.
Augmented Dickey-Fuller (ADF) Test or Unit Root Test: The ADF test is the most popular statistical test, with the following hypotheses:
Null Hypothesis (H0): Series is non-stationary
Alternate Hypothesis (HA): Series is stationary
p-value > 0.05: Fail to reject H0
p-value <= 0.05: Reject H0 in favour of HA
The Augmented Dickey-Fuller (ADF) test is a type of statistical test called a unit root test. Unit roots are a cause of non-stationarity.
Null Hypothesis (H0): Time series has a unit root. (Time series is not stationary).
Alternate Hypothesis (H1): Time series has no unit root (Time series is stationary).
If the null hypothesis can be rejected, we can conclude that the time series is stationary.
The null hypothesis can be rejected if the p-value is below a set significance level; the default significance level is 5%.
p-value > significance level (default: 0.05): Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
p-value <= significance level (default: 0.05): Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
Alternatively, the null hypothesis can be rejected if the test statistic is less than the critical value.
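Here is a sketch of the stationarity-checking function described above, combining rolling statistics with the ADF test; the window of 12 assumes monthly data.

import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(series, window=12):
    # Rolling mean and standard deviation should be roughly constant
    # over time if the series is stationary.
    rolling_mean = series.rolling(window=window).mean()
    rolling_std = series.rolling(window=window).std()

    plt.plot(series, label="Original")
    plt.plot(rolling_mean, label="Rolling mean")
    plt.plot(rolling_std, label="Rolling std")
    plt.legend()
    plt.show()

    # Augmented Dickey-Fuller test: p-value <= 0.05 -> reject H0,
    # i.e. the series is considered stationary.
    result = adfuller(series.dropna())
    print("ADF statistic:", result[0])
    print("p-value:", result[1])
    print("Critical values:", result[4])

# Example call on the Value series loaded earlier:
# test_stationarity(df.set_index("Period")["Value"])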
ACF and PACF
A partial autocorrelation is a summary of the relationship between an observation in a time series with observations at prior time steps with the relationships of intervening observations removed.
The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
The autocorrelation function (ACF) is a statistical technique that we can use to identify how correlated the values in a time series are with each other. The ACF plots the correlation coefficient against the lag, which is measured in a number of periods or units; a lag of k compares each observation with the one k periods earlier.
The correlation coefficient can range from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). A coefficient of 0 means that there is no relationship between the variables. Also, most often, it is measured
either by Pearson’s correlation coefficient or by Spearman’s rank correlation coefficient.
The blue bars on an ACF plot are the error bands, and anything within these bars is not statistically significant; correlation values outside this area are very likely real correlations and not a statistical fluke. The confidence interval is set to 95% by default.
Notice that for a lag zero, ACF is always equal to one, which makes sense because the signal is always perfectly correlated with itself.
To summarize, autocorrelation is the correlation between a time series (signal) and a delayed version of itself, while the ACF plots the correlation coefficient against the lag, and it’s a visual representation of autocorrelation.
For an AR(p) time series, the ACF would be strong at recent lags and tail off gradually as the effect weakens. The PACF, on the other hand, describes the direct relationship between an observation and its lag; for an AR(p) process, it generally shows no correlation for lag values beyond p.
The ACF for the MA(q) process would show a strong correlation with recent values up to the lag of q, then an immediate decline to minimal or no correlation. For the PACF, the plot shows a strong relationship at the first lags and then tails off to no correlation afterwards. Above is the ACF & PACF plot for our stationary data.
To summarize, a partial autocorrelation function captures a “direct” correlation between time series and a lagged version of itself.
Partial autocorrelation is a statistical measure that captures the correlation between two variables after controlling for the effects of other variables.
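A sketch of how the ACF and PACF plots can be produced with statsmodels; the differencing step and variable names are assumptions, continuing from the df loaded earlier.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Illustrative: first-order differencing of the Value column to obtain
# a stationary series.
stationary = df["Value"].diff().dropna()

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(stationary, lags=40, ax=axes[0])   # shaded band = 95% confidence interval
plot_pacf(stationary, lags=40, ax=axes[1])
plt.show()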
ARIMA
ARIMA stands for AutoRegressive Integrated Moving Average. It is a generalization of the simpler AutoRegressive Moving Average and adds the notion of integration.
This acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:
AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
The parameters of the ARIMA model are:
p: The number of lag observations included in the model, also called the lag order.
d: The number of times that the raw observations are differenced, also called the degree of differencing.
q: The size of the moving average window, also called the order of moving average.
A linear regression model is constructed including the specified number and type of terms, and the data is prepared by a degree of differencing in order to make it stationary, i.e. to remove trend and seasonal structures that negatively affect the regression model.
A value of 0 can be used for a parameter, which indicates to not use that element of the model. This way, the ARIMA model can be configured to perform the function of an ARMA model, and even a simple AR, I, or MA model.
Adopting an ARIMA model for a time series assumes that the underlying process that generated the observations is an ARIMA process. This may seem obvious, but helps to motivate the need to confirm the assumptions of the
model in the raw observations and in the residual errors of forecasts from the model.
Order of Differencing
p is the order of the AR term, q is the order of the MA term, and d is the number of differencings required to make the time series stationary.
So, more formally, ARIMA(1, 1, 1) means an ARIMA model of order (1, 1, 1), where the AR specification is 1, the integration (differencing) order is 1, and the moving average specification is 1.
How to determine p, d, q: In our case, we see that first-order differencing makes the time series stationary, so d = 1.
An AR model might be investigated first, with the lag length selected from the PACF or via empirical investigation. In our case, the AR terms are clearly significant within the first lags, which means we can use AR = 2.
To avoid incorrectly specifying the MA order (for instance, trying an MA term first and then setting the MA order to 0), it often makes sense to extend the lag beyond the last significant term observed in the PACF.
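A sketch of fitting the model with the current statsmodels ARIMA API (the summary below appears to come from the older arima_model module); the order (2, 1, 1) follows the discussion above and should be treated as a starting point, not a rule.

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(p=2, d=1, q=1) on the Value series indexed by Period.
model = ARIMA(df.set_index("Period")["Value"], order=(2, 1, 1))
results = model.fit()
print(results.summary())                       # coefficient table
print(results.aic, results.bic, results.hqic)  # information criteria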
=================================================================================
coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const 8.2135 0.372 22.065 0.000 7.484 8.943
ma.L1.D.Value -1.0000 nan nan nan nan nan
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
MA.1 1.0000 +0.0000j 1.0000 0.0000
-----------------------------------------------------------------------------
C:\Users\shyam\anaconda3\lib\site-packages\statsmodels\base\model.py:547: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
warnings.warn('Inverting hessian failed, no bse or cov_params '
Akaike’s Information Criterion (AIC) helps determine the strength of the linear regression model. The AIC penalizes a model for adding parameters since adding more parameters will always increase the maximum
likelihood value.
Bayesian Information Criterion (BIC), like the AIC, also punishes a model for complexity, but it also incorporates the number of rows in the data.
Hannan-Quinn Information Criterion (HQIC), like AIC and BIC is another criterion for model selection; however, it’s not used as often in practice.
Residual plot
A residual value is a measure of how much a regression line vertically misses a data point. Regression lines are the best fit of a set of data. You can think of the lines as averages; a few data points will fit the line and
others will miss. A residual plot has the Residual Values on the vertical axis; the horizontal axis displays the independent variable.
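A sketch of producing the residual plot and summary shown below, assuming results is the fitted model from the earlier sketch.

import matplotlib.pyplot as plt

# Residuals = vertical distance between each observation and the fit.
residuals = results.resid
residuals.plot(title="Residuals")  # residual values on the vertical axis
plt.show()
print(residuals.describe())        # summary statistics of the residuals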
[Residual plot]
count 218.000000
mean -26.145275
std 350.263249
min -586.716837
25% -216.253124
50% -76.855526
75% 55.899427
max 1320.599770
From the above plot we can see that the prediction on the test data is fairly good.
74 (size of the test set)
MAPE Function
MAPE (Mean Absolute Percentage Error) is a statistical measure used to assess the accuracy of a machine learning algorithm on a particular dataset.
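A common way to implement it is sketched below; the function name is our own.

import numpy as np

def mape(actual, predicted):
    # Mean Absolute Percentage Error in percent; assumes no zeros in
    # `actual`. Lower values mean more accurate predictions.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

print(mape([100, 200], [110, 190]))  # -> 7.5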
Predicting Sales
From the above graph we can see that the predictions are fairly accurate.
SARIMAX
SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous factors) extends ARIMA by modelling the seasonal component of the series and, optionally, external regressors.
Why not ARIMA?
A problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle. ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.
Why SARIMAX?
Seasonal AutoRegressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.
How to Configure SARIMAX
Configuring a SARIMA model requires selecting hyperparameters for both the trend and seasonal elements of the series.
Trend Elements: There are three trend elements that require configuration. They are the same as in the ARIMA model:
p: Trend autoregression order.
d: Trend difference order.
q: Trend moving average order.
Seasonal Elements: There are four seasonal elements that are not part of ARIMA and must be configured:
P: Seasonal autoregressive order.
D: Seasonal difference order.
Q: Seasonal moving average order.
m: The number of time steps for a single seasonal period.
For example, an m of 12 for monthly data suggests a yearly seasonal cycle.
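A sketch of fitting such a model with statsmodels' SARIMAX; the orders are illustrative, with m = 12 for the monthly seasonality of this dataset.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Trend order (p, d, q) and seasonal order (P, D, Q, m) as defined above.
model = SARIMAX(df.set_index("Period")["Value"],
                order=(2, 1, 1),
                seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=False)
print(results.summary())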
We have used both ARIMA and SARIMAX.
For the ARIMA model we got values of:
RMSE (ARIMA) =
MAPE (ARIMA) =
For the SARIMAX model we got values of:
RMSE (SARIMAX) =
MAPE (SARIMAX) =
Based on both sets of values, we can see that SARIMAX turned out to be better than ARIMA.