It is worth remarking that we consider the global TA data series, since ABF is a firm headquartered in the U.S. with operations all around the world. For more localized companies, however, it would be more accurate to consider the TA series of the corresponding region.
As we can observe, an increase in TA has a negative impact on the stock value of ABF. In the next section, we model the time series of TA through SLR and forecast its future values to predict the impact of an increase in TA on ABF stock values.
2.1. Temperature Anomaly Modeling with SLR
In this section, we model the time series of TA through SLR, which is also employed in Mudelsee (2019), where the author estimates a unique breakpoint by moving block bootstrap resampling. Segmented regression makes sense when there are meaningful breakpoints in TA, allowing us to analyze changes in trends (and slopes, in the case of segmented linear regression) as well as to handle structural breaks in non-stationary data. We estimate the breakpoints using Haar wavelet theory. Nonparametric regression with wavelets is explained in Section 3.1 of Abramovich et al. (2000). The sample size must be a power of 2, although this restriction can be overcome by considering some modifications (see Abramovich et al. (2000) and the references therein). In our case, the largest power-of-two subsample has length 2048, corresponding to the period February 1853–September 2023. For the sake of completeness, we provide some details on regression via Haar wavelets (for more details, refer to Abramovich et al. (2000)).
We consider the nonparametric regression model
$$y_i = g(t_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$
where the $\varepsilon_i$ are independent random variables with mean zero and variance $\sigma^2$. We aim at recovering the unknown function $g$ from the data $(t_1, y_1), \ldots, (t_n, y_n)$ without assuming any particular a priori parametric structure for $g$.
A classical approach in regression analysis involves expanding the function $g$ as a Fourier series and estimating the Fourier coefficients from the given data. The selection of the basis for the expansion is a crucial step. Ideally, the basis should be parsimonious in the sense that a wide set of potential response functions can be well approximated with only a few terms of the expansion. Wavelet series have remarkable approximation properties and allow a parsimonious expansion for a wide variety of functions. The simplest wavelet basis for $L^2[0,1]$ is the Haar basis, given by the following:
$$\phi(t) = \mathbf{1}_{[0,1]}(t), \qquad \psi(t) = \begin{cases} 1, & 0 \le t < 1/2, \\ -1, & 1/2 \le t < 1, \end{cases}$$
where $\phi$ is called the father wavelet or scaling function, and $\psi$ is called the mother wavelet. The other wavelets in the basis are generated by dilations and translations of $\psi$, that is,
$$\psi_{jk}(t) = 2^{j/2}\,\psi(2^j t - k), \qquad j = 0, 1, \ldots, \quad k = 0, \ldots, 2^j - 1,$$
called wavelet functions. This is an orthonormal wavelet basis for functions in $L^2[0,1]$. We plot in Figure 2 the scaling function $\phi$ and the mother wavelet $\psi$.
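To make these definitions concrete, the following Python sketch (our own illustration; the function names are not from the paper) implements the Haar father and mother wavelets and their dilations/translations, and checks orthonormality of two level-1 wavelets via a Riemann sum on a fine grid:

```python
def phi(t):
    """Father wavelet (scaling function): 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def psi(t):
    """Mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def psi_jk(j, k, t):
    """Dilated and translated Haar wavelet psi_{jk}(t) = 2^(j/2) psi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * psi((2.0 ** j) * t - k)

# Discrete check of orthonormality (Riemann sums over a fine dyadic grid):
# <psi_{10}, psi_{11}> should be 0 (disjoint supports) and ||psi_{10}||^2 = 1.
n = 4096
grid = [i / n for i in range(n)]
inner = sum(psi_jk(1, 0, t) * psi_jk(1, 1, t) for t in grid) / n
norm = sum(psi_jk(1, 0, t) ** 2 for t in grid) / n
```

The wavelets at a fixed level have disjoint supports, which is why the inner product vanishes exactly even on a discrete grid.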
From now on, and without loss of generality, we assume that the design points $t_i = i/n$ are equally spaced within the unit interval and that the sample size $n$ is a power of 2, that is, $n = 2^J$ for some positive integer $J$.
In this work, we consider the so-called linear wavelet estimators of $g$ (in contrast, nonlinear estimators can be used, as explained in Section 3.1.2 of Abramovich et al. (2000)). Suppose that $g$ is expanded as a wavelet series on the interval $[0,1]$,
$$g(t) = c_0\,\phi(t) + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j-1} d_{jk}\,\psi_{jk}(t), \qquad (2)$$
with $c_0 = \langle g, \phi \rangle$ and $d_{jk} = \langle g, \psi_{jk} \rangle$, where $\langle \cdot\,, \cdot \rangle$ represents the $L^2[0,1]$ inner product. Then, $c_0$ is called the scaling coefficient, while the $d_{jk}$ denote the wavelet coefficients. Since we cannot estimate an infinite set of coefficients $d_{jk}$, we assume that $g$ is well approximated by a finite set of basis functions,
$$g(t) \approx c_0\,\phi(t) + \sum_{j=0}^{M-1} \sum_{k=0}^{2^j-1} d_{jk}\,\psi_{jk}(t),$$
for some $M \le J$, called the level of approximation. The corresponding wavelet estimator $\hat{g}_M$ is of the following form:
$$\hat{g}_M(t) = \hat{c}_0\,\phi(t) + \sum_{j=0}^{M-1} \sum_{k=0}^{2^j-1} \hat{d}_{jk}\,\psi_{jk}(t),$$
where the sample estimates of the scaling coefficient and the wavelet coefficients are given by the following:
$$\hat{c}_0 = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad \hat{d}_{jk} = \frac{1}{n} \sum_{i=1}^{n} y_i\,\psi_{jk}(t_i).$$
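As a concrete check of these sample estimates, the following Python sketch (our own illustration, not code from the paper) computes the scaling coefficient and the level-0 wavelet coefficient for a noise-free step series on the dyadic grid $t_i = i/n$:

```python
def haar_psi_jk(j, k, t):
    """Haar wavelet psi_{jk}(t) = 2^(j/2) psi(2^j t - k)."""
    u = (2 ** j) * t - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def wavelet_coefficients(y, M):
    """Empirical scaling coefficient c0_hat and wavelet coefficients d_hat[j, k]
    up to level M, using equally spaced design points t_i = i/n."""
    n = len(y)
    t = [i / n for i in range(n)]
    c0 = sum(y) / n
    d = {(j, k): sum(yi * haar_psi_jk(j, k, ti) for yi, ti in zip(y, t)) / n
         for j in range(M) for k in range(2 ** j)}
    return c0, d

# Noise-free step series: 0 on the first half of [0, 1), 1 on the second half.
y = [0.0] * 8 + [1.0] * 8
c0, d = wavelet_coefficients(y, M=1)
```

For this step series, $\hat{c}_0$ is the sample mean 0.5, and $\hat{d}_{00}$ equals $-0.5$ because the second (higher) half of the series enters the sum with a negative weight.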
As pointed out by Abramovich et al. (2000), the performance of the estimator $\hat{g}_M$ relies on the appropriate choice of the level $M$. Intuitively, the optimal choice of $M$ is related to the regularity of the response function $g$. A small value of $M$ is associated with an over-smoothed estimator, while $M = J$ would simply reproduce the data.
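This trade-off is easy to see numerically. In the Python sketch below (our own illustration; the helper names are ours), the estimator with the smallest level collapses to the global mean, while taking the level equal to $J$ returns the observations themselves:

```python
def haar_phi(t):
    """Father wavelet: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def haar_psi_jk(j, k, t):
    """Haar wavelet psi_{jk}(t) = 2^(j/2) psi(2^j t - k)."""
    u = (2 ** j) * t - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def fitted_values(y, M):
    """Evaluate the linear Haar wavelet estimator with level M at the
    design points t_i = i/n (n must be a power of 2)."""
    n = len(y)
    t = [i / n for i in range(n)]
    c0 = sum(y) / n
    d = {(j, k): sum(yi * haar_psi_jk(j, k, ti) for yi, ti in zip(y, t)) / n
         for j in range(M) for k in range(2 ** j)}
    return [c0 * haar_phi(ti)
            + sum(d[j, k] * haar_psi_jk(j, k, ti)
                  for j in range(M) for k in range(2 ** j))
            for ti in t]

y = [3.0, 1.0, 4.0, 1.0]          # n = 4 = 2^J with J = 2
smooth = fitted_values(y, M=0)    # over-smoothed: just the global mean
exact = fitted_values(y, M=2)     # M = J reproduces the data exactly
```

The exact reproduction at $M = J$ reflects that the $n$ discrete Haar basis vectors span the whole space of sequences of length $n$.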
We aim to estimate the breakpoints of the TA series by computing the wavelet coefficients of expression (2). A change in the size of these coefficients, measured in absolute value, indicates the presence of a jump in the time series. The intuition behind this fact is that, since the wavelet $\psi_{jk}$ is supported on the interval $[k 2^{-j}, (k+1) 2^{-j}]$, each coefficient $\hat{d}_{jk}$ of expression (2) is a weighted average of the values $y_i$ with $t_i$ in that interval. These values are weighted by $2^{j/2}$ when $k 2^{-j} \le t_i < (k + 1/2)\, 2^{-j}$, and they are weighted by $-2^{j/2}$ when $(k + 1/2)\, 2^{-j} \le t_i < (k+1)\, 2^{-j}$.
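This intuition can be checked numerically. In the following Python sketch (an illustration under the equally spaced design described earlier, not code from the paper), a series with a single jump at $t = 0.75$ produces a large level-1 coefficient exactly on the half of the interval containing the jump, while the coefficient on the constant half vanishes:

```python
def haar_psi_jk(j, k, t):
    """Haar wavelet psi_{jk}(t) = 2^(j/2) psi(2^j t - k)."""
    u = (2 ** j) * t - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def level_coeffs(y, j):
    """Empirical Haar coefficients d_hat[j, k] for k = 0, ..., 2^j - 1."""
    n = len(y)
    t = [i / n for i in range(n)]
    return [sum(yi * haar_psi_jk(j, k, ti) for yi, ti in zip(y, t)) / n
            for k in range(2 ** j)]

# Series with a single jump at t = 0.75: constant 0, then constant 1.
y = [0.0] * 12 + [1.0] * 4
coeffs = level_coeffs(y, 1)  # two coefficients at level j = 1
largest = max(range(2), key=lambda k: abs(coeffs[k]))
```

The largest coefficient is the one with $k = 1$, whose support $[1/2, 1)$ contains the jump; comparing coefficient magnitudes across levels localizes the jump on finer and finer dyadic intervals.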
We illustrate this fact in Figure 3 with the TA series of sample size $n = 2048$, where $n = 2^J$ and $J = 11$. Observe that at level $j = 0$, there is only one coefficient, while at level $j = 10$, the number of coefficients is 1024. In our case, we can identify four breakpoints by observing the largest coefficients within the first four approximation levels, and they correspond to the years 1938, 1981, 2002, and 2013. Some milestones stated in Weart (2008) confirm the trend changes during the following periods:
In the 1930s, a global warming trend was reported, which started in the late nineteenth century.
In 1938, it was argued that CO2 greenhouse global warming was underway, which revived interest in the question.
Since the mid-1970s, strong global warming was reported, with 1981 being the warmest year on record.
In 2001, warming was observed in ocean basins; the match with computer models gave a clear signal of the effect of greenhouse warming.
We carried out segmented linear regressions considering different numbers of breakpoints; the corresponding plots are shown in Figure 4. For comparison, we added plots with linear regression (LR) and quadratic regression (QR).
We measure the difference between the true and predicted values through the root mean square error (RMSE), that is,
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},$$
and we report these errors in Table 2. Results given by the Haar wavelet (HW) method are compared with the cross-entropy (CE) method (see Priyadarshana and Sofronov 2015). The experiments corresponding to the CE method were performed with the R package breakpoint. The breakpoint obtained with the two methods differs in the case of only two segments, while regressions with 1, 2, and 3 breakpoints give similar segmentations. In all cases, the RMSE is close when comparing the HW and CE methods. As a reference, we provide the RMSE values of LR and QR (0.2218 and 0.1453, respectively). We observe that SLR with 2, 3, and 4 breakpoints outperforms QR in terms of RMSE. As pointed out in Priyadarshana and Sofronov (2015), the CE method for breakpoint detection is an iterative stochastic optimization method that starts from a parametrized distribution, from which a random sample is generated with respect to the number of breakpoints. The authors state that the overall processing time increases significantly with the sample size. In contrast, HW is a nonparametric method that simply relies on the computation of the wavelet coefficients $\hat{d}_{jk}$ given by expression (2); therefore, it is not based on simulation. Further, since breakpoints are associated with coefficients at different levels, the estimated breakpoints persist when the number of segments is increased (with the CE method, the breakpoints change depending on the number of segments considered).
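The segmented fit and the RMSE comparison above can be sketched in a few lines of Python. The sketch below is our own illustration (the paper's CE experiments use the R package breakpoint; the helper names here are hypothetical): it fits an independent least squares line on each segment delimited by given breakpoint indices and compares the RMSE with a single global line on data containing a structural break.

```python
import math

def rmse(y_true, y_fit):
    """Root mean square error between observed and fitted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_fit)) / n)

def ols_fit(x, y):
    """Intercept and slope of the least squares line for one segment."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def segmented_predict(x, y, breakpoints):
    """Fit an independent OLS line on each segment delimited by the given
    breakpoint indices and return the fitted values for the whole series."""
    cuts = [0] + list(breakpoints) + [len(x)]
    fitted = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        c, m = ols_fit(x[a:b], y[a:b])
        fitted.extend(c + m * xi for xi in x[a:b])
    return fitted

# Piecewise-linear data with a structural break at index 5:
# slope +1 on the first half, slope -1 on the second half.
x = [float(i) for i in range(10)]
y = [xi if xi < 5 else 10 - xi for xi in x]

c, m = ols_fit(x, y)
rmse_lr = rmse(y, [c + m * xi for xi in x])       # single global line
rmse_slr = rmse(y, segmented_predict(x, y, [5]))  # two segments
```

On such data, the segmented fit is exact on each piece while the single line cannot follow the trend change, mirroring the RMSE advantage of SLR over LR reported in Table 2.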
We compute the coefficient of determination $R^2$ corresponding to the last segment of each SLR. We note that the last segment will be used in Section 3 to forecast TA values. The results obtained are shown in Table 3. We observe that the HW method gives, in general, better results than the CE method. The $R^2$ of the last segment, when considering four breakpoints, is much smaller for CE than for the HW method (this last segment only contains 10 years of monthly data).
Finally, in Table 4, we present the RMSE corresponding to the HW method for the last segment of SLR with four breakpoints, together with the RMSE of the quadratic regression on the same segment. We also present the minimum and maximum values of TA forecasted for the next 12 months (that is, from October 2023 until September 2024).
The most recent values of the TA series (from October 2023 to July 2024) are 1.37, 1.42, 1.38, 1.29, 1.41, 1.36, 1.29, 1.18, 1.23, and 1.21, which are more aligned with the forecast given by the SLR method with four breakpoints.
In this section, we show that segmented linear regression outperforms linear and quadratic regression when it comes to handling structural changes in the data. The next section is devoted to investigating whether quantile regression offers a better model fit than ordinary least squares regression when it comes to modeling TA time series.
2.2. Temperature Anomaly Modeling with Quantile Regression
Quantile regression consists of modeling the relation between two variables in the tails of the distribution. While classical linear regression minimizes the residuals in the least squares sense, quantile regression minimizes the loss function given by the following:
$$\sum_{i=1}^{n} \rho_\tau \left( y_i - \beta_0 - \beta_1 t_i \right),$$
where $\rho_\tau$ is defined as follows:
$$\rho_\tau(u) = u \left( \tau - \mathbf{1}_{\{u < 0\}} \right),$$
and $\tau \in (0,1)$ indicates the level of the quantile. We carried out the regressions corresponding to several quantile levels and plot them in Figure 5. The slopes increase as the quantile level grows. This can be clearly observed in Figure 6, where we can also see that the quantile regression estimates are not within the bounds of the linear regression estimates, suggesting a statistically significant difference between the two models.
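The behavior of the check loss $\rho_\tau$ can be illustrated with a minimal Python sketch (our own, not code from the paper): minimizing the total check loss over a constant recovers the empirical $\tau$-quantile of a sample, which is the intercept-only special case of the regression above.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_constant_quantile(y, tau, grid):
    """Return the candidate constant minimizing the total check loss;
    the minimizer approximates the empirical tau-quantile of the sample."""
    return min(grid, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
med = fit_constant_quantile(y, tau=0.5, grid=y)   # median
q70 = fit_constant_quantile(y, tau=0.7, grid=y)   # 0.7-quantile
```

With $\tau = 0.5$ the loss is half the absolute error, so the minimizer is the median; larger $\tau$ penalizes under-prediction more heavily and pushes the fit toward the upper tail, which is exactly how the quantile regressions of Figure 5 differ across levels.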
For this reason, we perform a model comparison.
Table 5 shows the AIC values corresponding to all quantiles represented in Figure 6. The AIC value for the ordinary least squares regression is −350.0208.
Based on the AIC values, quantile regression at level 0.4 provides a better fit than the other quantiles. However, the AIC value for linear regression is even lower, suggesting a better fit than quantile regression. Further, computing the mean absolute error for these two models gives 0.1858 for quantile regression and 0.1815 for linear regression.
Finally, we conclude that the SLR put forward in this work provides a better model fit than linear and quantile regressions for TA data.