Time Series
A time series is a collection of observations of well-defined data items obtained through repeated measurements over time.
A time series can be constructed from any data that is measured over time at evenly spaced intervals. Historical stock prices, earnings, GDP, or other sequences of financial or economic data can be analyzed as a time series.
A linear time series is one where each data point Xt can be viewed as a linear combination of past or future values or differences. Nonlinear time series are generated by nonlinear dynamic equations and have features that cannot be modelled by linear processes: time-changing variance, asymmetric cycles, higher-moment structures, thresholds and breaks. Here are some important considerations when working with linear and nonlinear time series data:
If a regression equation doesn't follow the rules for a linear model, then it must be a nonlinear model. Nonlinear regression can fit an enormous variety of curves. The defining characteristic of both types of models is their functional form.
Moving Averages
A moving average model leverages the average of the data points that exist in a specific overlapping subsection of the series. An average is taken from the first subset of the data, and then it is moved forward to the next data
point while dropping out the initial data point. A moving average can give you information about the current trends, and reduce the amount of noise in your data. Often, it is a preprocessing step for forecasting.
A simple moving average (SMA) is calculated by taking the subset of the data described above, adding together its data points, and then averaging over the subset. It can help identify the direction of trends in your data and identify levels of resistance, where in business or trading data there is a price ceiling that can't be broken through. For instance, if you're trying to identify the point past which you can't charge for a product, or why a stock can't move past a certain price point, you can identify that ceiling with a moving average.
Moving averages are used by investors and traders for analyzing short-term trends in stock market data, while SMAs are used in healthcare to better understand current trends in surgeries and even to analyze quality control among healthcare providers.
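As an illustration, here is a minimal sketch of computing a simple moving average with pandas; the series values are made up for the example.

import pandas as pd

# Hypothetical daily closing prices; any evenly spaced series works.
prices = pd.Series([10.0, 10.4, 10.2, 10.8, 11.1, 10.9, 11.5, 11.8])

# 3-point simple moving average: each value is the mean of the current
# point and the two before it (the first two values are NaN).
sma_3 = prices.rolling(window=3).mean()
print(sma_3)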
Level: When you read about the “level” or the “level index” of time series data, it’s referring to the mean of the series.
Noise: All time series data will have noise or randomness in the data points that aren’t correlated with any explained trends. Noise is unsystematic and is short term.
Seasonality: If there are regular and predictable fluctuations in the series that are correlated with the calendar – quarterly, weekly, or even days of the week – then the series includes a seasonality component. It's important to note that seasonality is domain specific; for example, real estate sales are usually higher in the summer months than in the winter months, while regular retail usually peaks toward the end of the year. Also, not all time series have a seasonal component; audio or video data, for example, typically does not.
Trend: When referring to the “trend” in time series data, it means that the data has a long term trajectory which can either be trending in the positive or negative direction. An example of a trend would be a long term
increase in a company’s sales data or network usage.
Cycle: Repeating periods that are not related to the calendar. This includes business cycles such as economic downturns or expansions or salmon run cycles, or even audio files which have cycles, but aren’t related to the
calendar in the weekly, monthly, or yearly sense.
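These components can be separated with a classical decomposition. Below is a minimal sketch using statsmodels' seasonal_decompose on a made-up monthly series; the numbers are illustrative, not taken from the dataset used later.

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Made-up monthly series with an upward trend and a summer peak.
idx = pd.date_range("1992-01-01", periods=48, freq="MS")
values = [1500 + 10 * i + 100 * ((i % 12) in (5, 6, 7)) for i in range(48)]
series = pd.Series(values, index=idx)

# Additive decomposition splits the series into trend, seasonal and
# residual (noise) components; period=12 because the data is monthly.
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))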
We have a dataset that gives information about liquor consumption over time:
Period: time period
Value: Amount of liquor consumed
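A minimal sketch of the pandas calls that would produce the outputs shown below; the file name liquor.csv is an assumption, since the actual source file is not given.

import pandas as pd

# Load the dataset and parse the Period column as dates.
df = pd.read_csv("liquor.csv", parse_dates=["Period"])

print(df.head())          # first five rows
df.info()                 # dtypes and non-null counts
print(df.describe())      # summary statistics for Value
print(df.isnull().sum())  # missing values per column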
Period Value
0 1992-01-01 1509
1 1992-02-01 1541
2 1992-03-01 1597
3 1992-04-01 1675
4 1992-05-01 1822
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 293 entries, 0 to 292
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Period 293 non-null datetime64[ns]
1 Value 293 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 4.7 KB
Value
count 293.000000
mean 2790.494881
std 861.360248
min 1501.000000
25% 2059.000000
50% 2638.000000
75% 3438.000000
max 5834.000000
Period 0
Value 0
dtype: int64
219 (size of the training set; the remaining 74 observations form the test set)
Period Value
0 1992-01-01 1509.0
1 1992-02-01 1541.0
2 1992-03-01 1597.0
3 1992-04-01 1675.0
4 1992-05-01 1822.0
[Line plots of Value against Period]
Seasonality
Seasonality is a characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year. Any predictable fluctuation or pattern that recurs or repeats over a one-year
period is said to be seasonal.
Seasonality in time-series data refers to a pattern that occurs at a regular interval. This is different from cyclic trends, such as the rise and fall of liquor prices, which recur regularly but don't have a fixed period.
There’s a lot of insight to be gained from understanding seasonality patterns in data and you can even use it as a baseline to compare your time-series machine learning models.
It looks like there is an upward trend in the data with recurring seasonality. Below is a function to test the stationarity of the data using the Dickey-Fuller test and to plot the rolling statistics. In the Dickey-Fuller test, if the p-value is less than 5%, the series is considered stationary.
Checking Stationarity
During the TSA model preparation workflow, we must assess whether the given dataset is stationary or not, using statistical tests and plots.
Augmented Dickey-Fuller (ADF) Test or Unit Root Test: The ADF test is the most popular statistical test, with the following hypotheses:
Null Hypothesis (H0): Series is non-stationary
Alternate Hypothesis (HA): Series is stationary
p-value > 0.05: Fail to reject H0
p-value <= 0.05: Reject H0 in favour of HA
The Augmented Dickey-Fuller (ADF) test is a type of statistical test called a unit root test. Unit roots are a cause of non-stationarity.
Null Hypothesis (H0): Time series has a unit root. (Time series is not stationary).
Alternate Hypothesis (H1): Time series has no unit root (Time series is stationary).
If the null hypothesis can be rejected, we can conclude that the time series is stationary.
The null hypothesis can be rejected if the p-value is below a set significance level; the default significance level is 5%.
p-value > significance level (default: 0.05): Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
p-value <= significance level (default: 0.05): Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
Alternatively, the null hypothesis can be rejected if the test statistic is less than the critical value.
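Here is a sketch of the stationarity-checking function described above, combining rolling statistics with the ADF test; the window of 12 assumes monthly data.

import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(series, window=12):
    # Rolling mean and standard deviation should be roughly constant
    # over time if the series is stationary.
    rolling_mean = series.rolling(window=window).mean()
    rolling_std = series.rolling(window=window).std()

    plt.plot(series, label="Original")
    plt.plot(rolling_mean, label="Rolling mean")
    plt.plot(rolling_std, label="Rolling std")
    plt.legend()
    plt.show()

    # Augmented Dickey-Fuller test: p-value <= 0.05 -> reject H0,
    # i.e. the series is considered stationary.
    result = adfuller(series.dropna())
    print("ADF statistic:", result[0])
    print("p-value:", result[1])
    print("Critical values:", result[4])

# Example call on the Value series loaded earlier:
# test_stationarity(df.set_index("Period")["Value"])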
ACF and PACF
A partial autocorrelation is a summary of the relationship between an observation in a time series with observations at prior time steps with the relationships of intervening observations removed.
The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
The autocorrelation function (ACF) is a statistical technique that we can use to identify how correlated the values in a time series are with each other. The ACF plots the correlation coefficient against the lag, which is measured in a number of periods or units; a lag of k compares each observation with the one k periods earlier.
The correlation coefficient can range from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). A coefficient of 0 means that there is no relationship between the variables. Also, most often, it is measured
either by Pearson’s correlation coefficient or by Spearman’s rank correlation coefficient.
The blue bars on an ACF plot are the error bands, and anything within these bars is not statistically significant; correlation values outside this area are very likely real correlations and not a statistical fluke. The confidence interval is set to 95% by default.
Notice that for a lag zero, ACF is always equal to one, which makes sense because the signal is always perfectly correlated with itself.
To summarize, autocorrelation is the correlation between a time series (signal) and a delayed version of itself, while the ACF plots the correlation coefficient against the lag, and it’s a visual representation of autocorrelation.
For an AR(p) time series, the ACF would be strong at recent lags and tail off gradually as the effect weakens. The PACF, on the other hand, describes the direct relationship between an observation and its lag; for an AR(p) process, it generally shows no correlation for lag values beyond p.
The ACF for the MA(q) process would show a strong correlation with recent values up to the lag of q, then an immediate decline to minimal or no correlation. For the PACF, the plot shows a strong relationship at the first lags and then tails off to no correlation afterwards. Above is the ACF & PACF plot for our stationary data.
To summarize, a partial autocorrelation function captures a “direct” correlation between time series and a lagged version of itself.
Partial autocorrelation is a statistical measure that captures the correlation between two variables after controlling for the effects of other variables.
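A sketch of how the ACF and PACF plots can be produced with statsmodels; the differencing step and variable names are assumptions, continuing from the df loaded earlier.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Illustrative: first-order differencing of the Value column to obtain
# a stationary series.
stationary = df["Value"].diff().dropna()

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(stationary, lags=40, ax=axes[0])   # shaded band = 95% confidence interval
plot_pacf(stationary, lags=40, ax=axes[1])
plt.show()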
ARIMA
ARIMA stands for AutoRegressive Integrated Moving Average. It is a generalization of the simpler AutoRegressive Moving Average and adds the notion of integration.
This acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:
AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
The parameters of the ARIMA model are:
p: The number of lag observations included in the model, also called the lag order.
d: The number of times that the raw observations are differenced, also called the degree of differencing.
q: The size of the moving average window, also called the order of moving average.
A linear regression model is constructed including the specified number and type of terms, and the data is prepared by a degree of differencing in order to make it stationary, i.e. to remove trend and seasonal structures that negatively affect the regression model.
A value of 0 can be used for a parameter, which indicates to not use that element of the model. This way, the ARIMA model can be configured to perform the function of an ARMA model, and even a simple AR, I, or MA model.
Adopting an ARIMA model for a time series assumes that the underlying process that generated the observations is an ARIMA process. This may seem obvious, but helps to motivate the need to confirm the assumptions of the
model in the raw observations and in the residual errors of forecasts from the model.
Order of Differencing
p is the order of the AR term, q is the order of the MA term, and d is the number of differencings required to make the time series stationary.
So, more formally, ARIMA(1, 1, 1) means an ARIMA model of order (1, 1, 1), where the AR specification is 1, the integration (differencing) order is 1, and the moving average specification is 1.
How to determine p, d, q: In our case, we see that first-order differencing makes the time series stationary, so d = 1.
An AR model might be investigated first, with the lag length selected from the PACF or via empirical investigation. In our case, the AR terms are clearly significant within the first lags, which means we can use AR = 2.
To avoid incorrectly specifying the MA order (for instance, trying an MA term first and then setting the MA order to 0), it often makes sense to extend the lag beyond the last significant term observed in the PACF.
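A sketch of fitting the model with the current statsmodels ARIMA API (the summary below appears to come from the older arima_model module); the order (2, 1, 1) follows the discussion above and should be treated as a starting point, not a rule.

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(p=2, d=1, q=1) on the Value series indexed by Period.
model = ARIMA(df.set_index("Period")["Value"], order=(2, 1, 1))
results = model.fit()
print(results.summary())                       # coefficient table
print(results.aic, results.bic, results.hqic)  # information criteria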
=================================================================================
coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const 8.2135 0.372 22.065 0.000 7.484 8.943
ma.L1.D.Value -1.0000 nan nan nan nan nan
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
MA.1 1.0000 +0.0000j 1.0000 0.0000
-----------------------------------------------------------------------------
C:\Users\shyam\anaconda3\lib\site-packages\statsmodels\base\model.py:547: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
warnings.warn('Inverting hessian failed, no bse or cov_params '
Akaike’s Information Criterion (AIC) helps determine the strength of the linear regression model. The AIC penalizes a model for adding parameters since adding more parameters will always increase the maximum
likelihood value.
Bayesian Information Criterion (BIC), like the AIC, also punishes a model for complexity, but it also incorporates the number of rows in the data.
Hannan-Quinn Information Criterion (HQIC), like AIC and BIC is another criterion for model selection; however, it’s not used as often in practice.
Residual plot
A residual value is a measure of how much a regression line vertically misses a data point. Regression lines are the best fit of a set of data. You can think of the lines as averages; a few data points will fit the line and
others will miss. A residual plot has the Residual Values on the vertical axis; the horizontal axis displays the independent variable.
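A sketch of producing the residual plot and summary shown below, assuming results is the fitted model from the earlier sketch.

import matplotlib.pyplot as plt

# Residuals = vertical distance between each observation and the fit.
residuals = results.resid
residuals.plot(title="Residuals")  # residual values on the vertical axis
plt.show()
print(residuals.describe())        # summary statistics of the residuals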
[Residual plot]
count 218.000000
mean -26.145275
std 350.263249
min -586.716837
25% -216.253124
50% -76.855526
75% 55.899427
max 1320.599770
From the above plot we can see that the prediction on the test data is fairly good.
74 (size of the test set)
MAPE Function
MAPE (Mean Absolute Percentage Error) is a statistical measure used to assess the accuracy of a machine learning algorithm on a particular dataset.
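A common way to implement it is sketched below; the function name is our own.

import numpy as np

def mape(actual, predicted):
    # Mean Absolute Percentage Error in percent; assumes no zeros in
    # `actual`. Lower values mean more accurate predictions.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

print(mape([100, 200], [110, 190]))  # -> 7.5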
Predicting Sales
From the above graph we can see that the predictions are fairly accurate.
SARIMAX
SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous factors) extends ARIMA by modelling the seasonal component of the series and, optionally, external regressors.
Why not ARIMA?
A problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle. ARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.
Why SARIMAX?
Seasonal AutoRegressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.
How to Configure SARIMAX
Configuring a SARIMA model requires selecting hyperparameters for both the trend and seasonal elements of the series.
Trend Elements: There are three trend elements that require configuration. They are the same as in the ARIMA model:
p: Trend autoregression order.
d: Trend difference order.
q: Trend moving average order.
Seasonal Elements: There are four seasonal elements that are not part of ARIMA and must be configured:
P: Seasonal autoregressive order.
D: Seasonal difference order.
Q: Seasonal moving average order.
m: The number of time steps for a single seasonal period.
For example, an m of 12 for monthly data suggests a yearly seasonal cycle.
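A sketch of fitting such a model with statsmodels' SARIMAX; the orders are illustrative, with m = 12 for the monthly seasonality of this dataset.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Trend order (p, d, q) and seasonal order (P, D, Q, m) as defined above.
model = SARIMAX(df.set_index("Period")["Value"],
                order=(2, 1, 1),
                seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=False)
print(results.summary())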
We have used both ARIMA and SARIMAX.
For the ARIMA model we got values of:
RMSE (ARIMA) =
MAPE (ARIMA) =
For the SARIMAX model we got values of:
RMSE (SARIMAX) =
MAPE (SARIMAX) =
Based on both sets of values, we can see that SARIMAX turned out to be better than ARIMA.