Gebrekiros's Journal


© 2020 IJRAR February 2020, Volume 7, Issue 1 www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138)

Modeling and Forecasting Temperature Using Multiple Linear Regression Approach with Dummy
Variables: in Case of Adigrat Town, Tigray Region, Ethiopia

Gebrekiros Alemu Tareke (MSc.)


Department of Mathematics, College of Natural and Computational Sciences, Adigrat University

Abstract

Global warming has been a serious problem that is worsening the living conditions of many people in the world. Since
temperature fluctuation is one of the risk factors, the researcher intended to contribute a forecasting model for a
specific area that gives insight into the future. In this paper, the researcher introduces a multiple linear regression
model for forecasting the monthly temperature of Adigrat town. The model was developed using monthly temperature
data for 1991-2016 (312 months) obtained from the World Bank website (www.sdwebx.world bank.org). During
formulation of the model, the seasonality of the data was accounted for through dummy variables. The effectiveness
of the model was evaluated through standard statistical error measures: the coefficient of determination (R²), mean
absolute percentage error (MAPE) and root mean square error (RMSE). The observed and forecasted values were
also compared graphically in Excel. The results indicate that the multiple linear regression model with dummy
variables provides accurate forecasts of the monthly temperature of Adigrat town.

Key words: forecasting, seasonality, trend lines, Dummy variables

1. Introduction
Nowadays, temperature fluctuation has been a major cause of global warming, which is worsening the living
conditions of many people in the world, especially in developing countries like Ethiopia. Temperature fluctuation
poses many risks to the agricultural economy, the energy sector and tourism.

A recent mapping of vulnerability and poverty in Africa (Orindi et al. 2006; Stige et al. 2006) put Ethiopia as one of
the countries most vulnerable to climate change, with the least capacity to respond. Ethiopia has suffered from
periodic extreme climate events, manifested in the form of frequent drought
(1965, 1974, 1983, 1987, 1990, 1991, 1999, 2000, 2002, 2011) and occasional flooding (1997 and 2006).

Because of the indeterminate nature and complementary behavior of temperature fluctuation, many scholars are
focusing on producing accurate time series forecasting models. A forecast is simply a prediction of what will happen
in the future. If the historical data are restricted to past values of the variable to be forecast, the forecasting
procedure is called a time series method and the historical data are referred to as a time series. By treating time as
the independent variable and the time series variable as the dependent variable, regression analysis can be used as a
time series method.

IJRAR2001484 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 490

Afsar et al. (2013) examined temperature and precipitation instability in the Gilgit-Baltistan region. They used
regression and stochastic models to anticipate temperature and rainfall. They observed that precipitation was
prolonged with rising temperature. From 2007 to 2011, a reduction in the amount of precipitation was observed with
a rise in the monthly average maximum temperature. They found the autoregressive model of order one (AR(1)) to be
the most suitable for forecasting temperature.

In this paper, the researcher tests the ability and accuracy of multiple linear regression models for forecasting
the temperature of Adigrat town. Studies such as Katerina et al. (2018) and M. Ben et al. (2014) indicate that
fitting data with seasonality and trend as they are, using multiple linear regression, may lead to wrong conclusions.
For this reason, the author uses dummy variables to handle seasonality and trend.

2. Methodology

i. Study area
Adigrat is a city in Tigray, Ethiopia. It is located at latitude 14.28 and longitude 39.46, at an elevation of
2451 meters above sea level, and it is the second-biggest city in Tigray. It operates on the BEAT time zone,
which means that it follows the same time zone as Mekele.

Figure 1: map of Adigrat town

Figure source: https://www.worldatlas.com/af/et/ti/where-is-adigrat.html


ii. Data Source


In this paper, secondary data on the average monthly temperature covering 26 years, 1991-2016 (312 months),
as shown in Table 1, were obtained from the World Bank website (www.sdwebx.world bank.org). The data were
deseasonalised and detrended using dummy variables and were then used to forecast the average monthly
temperature of Adigrat town.

Table 1. Monthly temperature record of Adigrat town, 1991-2016

Average monthly temperature (Celsius)
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1991 16.1 17.5 17.6 19.7 20 20.2 18.1 17.6 17.7 16.8 16.1 14.5
1992 14.1 13.7 17.4 19.1 20.1 19.8 18.2 17.4 17.1 16.9 15.7 14.9
1993 15.2 15.5 18.3 18.3 19.8 20.3 18 18 17.8 16.6 17.1 14.1
1994 16 16.7 16.2 20 20.2 19.7 18 17.8 16.8 17 16 15.4
1995 15.8 15.7 17.6 18.7 19.8 19 17.7 17.5 17.5 17.3 15.8 15.5
1996 15.2 17.6 18.3 19.4 19.2 19.1 18.3 17.6 18 17.2 15.4 15.8
1997 16.1 16 18.2 18.9 19.5 20 18.4 17.7 18.3 17.2 16.5 16
1998 15.9 17 18.2 20.4 20.8 21.4 18.4 17.5 17.3 17 16.4 16
1999 16 18.4 17.6 19.6 20.2 20 17.5 17 17.3 15.9 15.9 16
2000 15.8 17.2 17.6 19.8 20.2 20.5 18.7 17.9 17.6 16.5 16.5 15.3
2001 15.6 16.6 18.5 20.7 20.1 19.6 18 17.3 17.8 17.2 16.2 16.4
2002 15.1 17.9 18.7 20.4 21 20.7 19.4 18.7 17.9 17.7 17.3 15.6
2003 16.6 18.1 19.2 20.1 21.1 20 17.7 17.7 17.7 17.2 16.6 15.5
2004 16.1 16.8 18.7 19.5 20.7 19.4 18.4 18.1 17.9 17.1 16.9 15.6
2005 15.6 19.1 19.1 20.3 20.1 19.9 17.9 18 17.9 17.3 16.5 16.1
2006 17.1 18.2 17.7 19.3 20 20.1 19.2 17.4 17.4 17.4 16.1 14.2
2007 14.9 17.1 18.7 19.5 20.6 19.7 17.2 17.4 17.7 17 16.8 15.8
2008 16.2 16.6 18.8 19.4 19.9 19.7 18.3 17.8 18.1 17.1 15.9 16.3
2009 16.5 18.4 18.6 21 21.1 21.4 18.6 18.5 19.1 17.8 17 15.9
2010 17.2 18.2 19 21.3 21.2 20.8 18.2 17.5 18 17.9 17.6 16.4
2011 15.6 18.2 18 20.1 20.3 21.2 19.6 18.2 18.2 16.9 17.1 15.8
2012 16 18.8 18.7 20 21.2 20.3 18.6 17.9 18.2 18 16.8 16
2013 17 18.4 19.5 20 21.1 20.5 19.3 17.6 18.4 17.6 17 14.8
2014 16.1 16.9 19.4 19.7 19.8 20.8 18.7 17.6 17.7 17.2 16.8 16.2
2015 15.8 18.4 20.5 19.7 20.7 21 20.1 18.6 18.9 18.4 17.3 15.4
2016 15.9 18 21.3 20.5 20.1 20.5 18.7 18.3 18.4 17.8 16.6 16.6

Source: World Bank website (www.sdwebx.world bank.org)

iii. Model Formulation


During formulation of the model, it is advisable to plot the time series data, as shown in Figure 2.

[Plot: monthly temperature (Celsius) against month index]

Figure 2: Time series plot of monthly temperature in Adigrat 1991-2016


The time series plot of the source data shows both a seasonal pattern and an upward linear trend. Forecasting with
both seasonality and a trend is more difficult than forecasting with a trend or seasonality alone, because
compensating for both is more difficult than compensating for either one.

For time series data, a linear trend can be modeled simply by using time t as the predictor:

𝑦(𝑡) = 𝑎0 + 𝑎1 𝑡 + 𝑒𝑡 (1)

where 𝑒𝑡 is the error term and time t = 1, 2, 3, …, T.

Unfortunately, as we will see, we cannot simply throw all the data into a linear regression and see what comes out.
Linear regression finds a line of best fit by minimizing the sum of squared errors. With seasonal data, some points
will be far from the trend line, which can skew the fit. The trend line may then give a low coefficient of
determination (R²), leaving us with little confidence in the values obtained from the linear regression.
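To illustrate the point, the following is a minimal sketch that fits equation (1) to a synthetic seasonal series. The series, its amplitude and its noise level are assumptions made for illustration only; they are not the Adigrat data.

```python
import numpy as np

# Synthetic monthly series (an assumption for illustration, not the Adigrat data):
# a mild upward trend plus a strong 12-month seasonal cycle and small noise.
rng = np.random.default_rng(0)
t = np.arange(1, 313)                       # 312 months, t = 1..312
seasonal = 2.5 * np.sin(2 * np.pi * (t - 1) / 12)
y = 16.0 + 0.004 * t + seasonal + rng.normal(0.0, 0.3, t.size)

# Fit y(t) = a0 + a1*t by ordinary least squares, as in equation (1).
a1, a0 = np.polyfit(t, y, deg=1)            # polyfit returns the highest degree first
y_hat = a0 + a1 * t

# Share of the variance explained by the trend line alone.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_trend_only = 1.0 - ss_res / ss_tot
print(f"slope={a1:.4f}, intercept={a0:.3f}, R^2={r2_trend_only:.3f}")
```

In this sketch the seasonal swing dominates the variance, so the trend line by itself explains only a small share of it, which is the situation the dummy variables are meant to address.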

Dummy variables are independent variables that take the numerical value 0 or 1 and are used in regression analysis
to represent subgroups of the sample in a study. They are useful because they enable us to use a single
regression equation to represent multiple groups. When we model a seasonal pattern by treating seasons as a
categorical variable, if the categorical variable has 𝑘 labels, 𝑘 − 1 dummy variables are required[best]. For this
reason, to model the seasonal effect of the monthly temperature, there must be 12 − 1 = 11 dummy variables. The
dummy variables can be coded as

$$M_i = \begin{cases} 1 & \text{if month } i \\ 0 & \text{otherwise} \end{cases}$$

where i = 1, 2, 3, …, 12 represents January, February, March, …, December respectively. Only 𝑀1 to 𝑀11 enter the
model, so December serves as the reference month.
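The coding described above can be sketched as follows. The function name `month_dummies` is hypothetical, and the sketch assumes that period t = 1 is a January and that December is the omitted reference month.

```python
import numpy as np

def month_dummies(t):
    """Return an (n, 11) 0/1 matrix with columns M1..M11 for periods t = 1, 2, ...

    Assumes period t = 1 is a January. December (month 12) is the omitted
    reference month, so its row is all zeros.
    """
    t = np.asarray(t)
    month = (t - 1) % 12 + 1                 # 1 = January, ..., 12 = December
    return (month[:, None] == np.arange(1, 12)[None, :]).astype(int)

M = month_dummies(np.arange(1, 25))          # two years of monthly periods
print(M[0])    # row for a January
print(M[11])   # row for a December
```

Each row contains at most a single 1, so every month is identified by exactly one dummy, and the December rows are identified by the absence of any.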

Using 𝑦𝑠 to denote forecasted values of monthly data with only seasonal effect, the general form of the estimated
multiple linear regression equation is

𝑦𝑠 = 𝑎0 + 𝑎1 𝑀1 + 𝑎2 𝑀2 + 𝑎3 𝑀3 + ⋯ + 𝑎11 𝑀11 (2)

However, the source data have both a seasonal effect and a trend. Both are handled by combining the dummy
variable approach for seasonality with the time series regression approach for a linear trend. Hence, the general
form of the estimated multiple linear regression equation for modeling the monthly seasonal effect and linear trend
of the time series data is

𝑦(𝑡) = 𝑏0 + 𝑏1 𝑀1 + 𝑏2 𝑀2 + 𝑏3 𝑀3 + ⋯ + 𝑏11 𝑀11 + 𝑏12 𝑡 (3)

where 𝑏0 is the intercept, 𝑏1, 𝑏2, …, 𝑏12 are coefficients, and 𝑀1, 𝑀2, 𝑀3, …, 𝑀11 are dummy variables.
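As a rough sketch of how equation (3) can be estimated outside a spreadsheet, the block below builds the design matrix [1, M1, …, M11, t] and solves it by ordinary least squares with NumPy. The data are synthetic and the "true" parameters are hypothetical values chosen for illustration; the paper itself uses Excel's regression tool on the Table 1 data.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 313)                              # 312 monthly periods
month = (t - 1) % 12 + 1                           # 1 = Jan, ..., 12 = Dec
M = (month[:, None] == np.arange(1, 12)).astype(float)   # dummy columns M1..M11

# Hypothetical "true" parameters, used only to generate synthetic data.
b0_true, trend_true = 15.0, 0.004
season_true = np.array([0.3, 1.8, 2.9, 4.2, 4.7, 4.6, 2.8, 2.2, 2.3, 1.6, 0.9])
y = b0_true + M @ season_true + trend_true * t + rng.normal(0.0, 0.5, t.size)

# Design matrix for equation (3): intercept column, 11 dummies, then time.
X = np.column_stack([np.ones(t.size), M, t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_season, b_trend = coef[0], coef[1:12], coef[12]
print("intercept:", round(float(b0), 3), "trend:", round(float(b_trend), 5))
```

Because December has no dummy column, the fitted intercept plays the role of the December level, and each seasonal coefficient is read as a difference from December.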

iv. Model evaluation


The accuracy of any forecasting model can be determined by choosing appropriate error measures. In this paper, the
effectiveness of the developed model is measured by standard statistical error measures: the coefficient of
determination (R²), mean absolute percentage error (MAPE) and root mean square error (RMSE).


The mathematical expressions of these measures are defined as follows:

$$R^2 = \frac{\sum (\hat{y}_t - \bar{y})^2}{\sum (y_t - \bar{y})^2}, \qquad 0 \le R^2 \le 1 \qquad (4)$$

where $y_t$, $\hat{y}_t$ and $\bar{y}$ denote the observed values, the forecasted values and the mean of the observed
values, respectively. However, adding any variable tends to increase the value of $R^2$ even if the variable is
irrelevant. For this reason, $R^2$ alone is not a good measure for comparing models. An alternative designed to
overcome this problem is the adjusted coefficient of determination $\bar{R}^2$:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{T-1}{T-k-1}, \qquad \bar{R}^2 \le 1 \qquad (5)$$

where T is the number of observations and k is the number of predictors. Unlike $R^2$, $\bar{R}^2$ can be negative for a
very poor fit. Using this measure, the best model will be the one with the largest value of $\bar{R}^2$.
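As a worked check of equations (4) and (5), the sketch below recomputes both measures from the regression and total sums of squares reported in Table 2, with T = 312 observations and k = 12 predictors. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted coefficient of determination, equation (5)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Sums of squares taken from the ANOVA block of Table 2.
ss_regression = 753.4902186
ss_total = 865.2122393

r2 = ss_regression / ss_total                      # equation (4)
adj = adjusted_r2(r2, n_obs=312, n_predictors=12)  # equation (5)
print(round(r2, 6), round(adj, 6))
```

Both values agree with the "R Square" and "Adjusted R Square" rows of Table 2 to the precision shown there.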

The root mean square error of the observed and forecasted values is defined as

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2} \qquad (6)$$

where n is the number of observations.

The mean absolute percentage error can be defined as

$$MAPE = \frac{1}{n}\sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{y_t} \times 100 \qquad (7)$$
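The two error measures can be sketched in a few lines of NumPy. The function names `rmse` and `mape` are hypothetical. The small example uses the observed January and February 1991 values from Table 1 together with the two forecasts worked out in Section 3, so it illustrates the formulas only and does not reproduce the full-sample figures.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, equation (6)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent, equation (7).

    Assumes no observed value is zero.
    """
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)) * 100.0)

observed = [16.1, 17.5]            # Jan and Feb 1991, Table 1
forecast = [15.30421, 16.746515]   # y(1) and y(2) from equation (8)
print(rmse(observed, forecast), mape(observed, forecast))
```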

3. Result and discussion


From equation (3) we have 𝑦(𝑡) = 𝑏0 + 𝑏1 𝑀1 + 𝑏2 𝑀2 + 𝑏3 𝑀3 + ⋯ + 𝑏11 𝑀11 + 𝑏12 𝑡, where 𝑏0 is the intercept,
𝑏1, 𝑏2, …, 𝑏12 are coefficients, and 𝑀1, 𝑀2, 𝑀3, …, 𝑀11 are dummy variables. Since equation (3) is a general
multiple linear regression equation for data with seasonality and trend, the intercept and coefficients were
determined in Excel from the data in Table 1, as follows.

Figure 3: Screenshot of time series temperature data with dummy variables and period. Note that rows 31-313 are
hidden.

Using the data in Figure 3 and Excel's regression tool, we obtained the output shown in Table 2.

Table 2. Regression tool output for time series temperature data


SUMMARY OUTPUT

Regression Statistics
Multiple R 0.93320591
R Square 0.87087328
Adjusted R Square 0.86569093
Standard Error 0.61127101
Observations 312

ANOVA
df SS MS F Significance F
Regression 12 753.4902186 62.79085155 168.04623 4.3242E-125
Residual 299 111.7220207 0.373652243
Total 311 865.2122393

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 14.9716156 0.13509802 110.8203927 7.56E-245 14.7057522 15.237479 14.705752 15.23747901
M1 0.32858933 0.169588827 1.937564716 0.0536178 -0.005149557 0.66232821 -0.00515 0.662328213
M2 1.76689932 0.169579673 10.41928724 6.856E-22 1.433178452 2.10062019 1.4331785 2.100620191
M3 2.93213274 0.16957139 17.29143545 6.449E-47 2.598428173 3.26583731 2.5984282 3.265837312
M4 4.23582728 0.169563978 24.98070237 3.637E-75 3.902137294 4.56951726 3.9021373 4.569517262
M5 4.74721447 0.169557439 27.99767743 1.503E-85 4.413537358 5.08089159 4.4135374 5.080891586
M6 4.62013975 0.169551771 27.2491389 5.11E-83 4.286473788 4.95380571 4.2864738 4.953805708
M7 2.83152675 0.169546974 16.70054427 1.085E-44 2.497870231 3.16518327 2.4978702 3.165183273
M8 2.1890675 0.16954305 12.91157318 1.3E-30 1.855418701 2.5227163 1.8554187 2.522716298
M9 2.26583919 0.169539998 13.36462911 2.923E-32 1.9321964 2.59948198 1.9321964 2.599481984
M10 1.61953388 0.169537818 9.552640839 4.8E-19 1.28589538 1.95317238 1.2858954 1.953172383
M11 0.91938239 0.16953651 5.422916851 1.211E-07 0.585746468 1.25301832 0.5857465 1.253018322
Period 0.00399763 0.000384516 10.396524 8.171E-22 0.003240926 0.00475433 0.0032409 0.004754326
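The entries of Table 2 can be cross-checked against each other with simple arithmetic. The sketch below recomputes the mean squares, the F statistic and the standard error from the reported sums of squares and degrees of freedom. It also computes the RMSE of equation (6), which divides the residual sum of squares by n = 312 rather than by the 299 residual degrees of freedom. This appears to explain why the RMSE reported in Table 3 is slightly smaller than the standard error in Table 2.

```python
import math

# Values copied from the ANOVA block of Table 2.
ss_reg, ss_res = 753.4902186, 111.7220207
df_reg, df_res = 12, 299
n_obs = 312

ms_reg = ss_reg / df_reg            # mean square, regression
ms_res = ss_res / df_res            # mean square, residual
f_stat = ms_reg / ms_res            # F statistic
std_error = math.sqrt(ms_res)       # "Standard Error" in the regression statistics
rmse = math.sqrt(ss_res / n_obs)    # equation (6): divides by n, not by df_res

print(round(f_stat, 3), round(std_error, 5), round(rmse, 4))
```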


After rounding the intercept and coefficients in Table 2, the estimated multiple linear regression equation is

𝑦(𝑡) = 14.97162 + 0.328589𝑀1 + 1.766899𝑀2 + 2.932133𝑀3 + 4.235827𝑀4 + 4.747214𝑀5 + 4.62014𝑀6 +
2.831527𝑀7 + 2.189067𝑀8 + 2.265839𝑀9 + 1.619534𝑀10 + 0.919382𝑀11 + 0.003998𝑡 (8)

Thus, using equation (8), the monthly temperature of Adigrat town at any time period t can be forecasted as follows.

For time period t = 1 (January), the dummy variable 𝑀1 = 1 and 𝑀2, 𝑀3, 𝑀4, …, 𝑀11 are 0, so equation (8) reduces to

𝑦(1) = 14.97162 + 0.328589(1) + 0.003998(1) = 15.30421

For time period t = 2 (February), the dummy variable 𝑀2 = 1 and 𝑀1, 𝑀3, 𝑀4, …, 𝑀11 are 0, so equation (8) reduces to

𝑦(2) = 14.97162 + 1.766899(1) + 0.003998(2) = 16.746515
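The same calculation can be sketched as a small function that evaluates equation (8) for any period t. The function name `forecast` is hypothetical, and the sketch assumes that t = 1 corresponds to January 1991 and that December is the reference month with no dummy term.

```python
# Rounded coefficients from equation (8).
INTERCEPT = 14.97162
MONTH_COEF = [0.328589, 1.766899, 2.932133, 4.235827, 4.747214, 4.62014,
              2.831527, 2.189067, 2.265839, 1.619534, 0.919382]   # M1..M11
TREND = 0.003998

def forecast(t):
    """Forecast the monthly temperature (Celsius) for period t = 1, 2, 3, ..."""
    month = (t - 1) % 12 + 1                      # 1 = January, ..., 12 = December
    seasonal = MONTH_COEF[month - 1] if month <= 11 else 0.0
    return INTERCEPT + seasonal + TREND * t

print(forecast(1), forecast(2))    # the two periods worked out by hand above
print(forecast(432))               # last forecast period, December 2026
```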

In a similar manner, the forecasted values of the monthly temperature for 1991-2026 (432 months) were computed
using Excel and are highlighted in Figure 4, which is a screenshot of the computer output.


Figure 4: Time series forecast of monthly temperature from 1991-2026. Note that rows 31-433 are hidden.

[Plot: observed temperature (°C) and forecasted values against month index]

Finally, in order to visualize the accuracy of the forecasting model, the observed values and forecasted values of the
time series data are plotted as shown in Figure 5.

Figure 5: Time series plot of observed and forecasted values of temperature

The chart suggests that the forecasting model performs well: the lines of observed and forecasted values follow the
same pattern and overlap in places. Finally, the researcher checks the accuracy of the developed model using the
standard error measures, which are summarized in Table 3 below.

Table 3. Standard error measure values

Error measures   Adjusted R²   MAPE       RMSE
Values           0.86569093    2.57244%   0.5984

Table 3 shows a large value of adjusted R², close to 1, and small values of MAPE and RMSE. This finding supports
the conclusion that the multiple linear regression model with dummy variables, fitted to the source data, provides
accurate forecasts of the monthly temperature of Adigrat town.

Conclusion
In this paper, a multiple linear regression model for forecasting the monthly temperature of Adigrat town was
developed using monthly temperature data obtained from the World Bank website (www.sdwebx.world bank.org). The
accuracy of the model was measured through adjusted R², MAPE and RMSE, with results of 0.86569093, 2.57244%
and 0.5984 respectively. From the graphical interpretations and calculated error measures, we observed that the
model worked well for forecasting the monthly temperature of the specified site. Since the methodology used in
this paper can be applied easily to other parts of the world, the model can provide insight for the meteorological
sector on the dynamics of temperature fluctuation and on producing mechanisms to control global warming.

References
[1] Afsar, S., Abbas, N., & Jan, B. (2013). Comparative study of temperature and rainfall fluctuation in
Hunza-nagar District. Journal of Basic and Applied Sciences, 9, 151-156.
[2] Stige, L. C., Stave, J., & Chan, K. (2006). The effect of climate variation on agro-pastoral production in Africa.
Proc Natl Acad Sci, 103, 3049-3053.
[3] Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice, 2nd edition. OTexts:
Melbourne, Australia.
[4] Garavaglia, Susan; Sharma, Asha (2003). "A Smart Guide to Dummy Variables: Four Applications and a
Macro". Archived from the original (PDF).
[5] Emmanuel Ekpenyong (2019). A Comparison of the Forecasting Models of Rainfall Data of Umudike, Abia State
Nigeria. International Journal of Basic Science and Technology, 5, 49-57.
[6] V. Prema and K. Uma Rao (2015). Time series decomposition model for accurate wind speed forecast. Springer,
DOI: 10.1186/s40807-015-0018-9
[7] Madan Kumar Jha (2013). Groundwater-level prediction using multiple linear regression and artificial neural
network techniques: A comparative assessment. Hydrogeology Journal, DOI: 10.1007/s10040-013-1029-5
[8] Katerina T., Antonios M. and Stelios K. (2018). Artificial Neural Network and Multiple Linear Regression for
Flood Prediction in Mohawk River, New York. www.mdpi.com/journal/water
[9] M. Ben, O. Zegaoui, and A. Abdallaoui (2014). Development of Mathematical Models To Forecasting The Monthly
Precipitation. American Journal of Engineering Research (AJER), 3, pp. 38-45.
