1. Introduction
Consider a component with strength Y subjected to a stress X. The component fails if the stress X exceeds its strength Y; otherwise, it works properly. When X and Y are independent, the stress–strength reliability (SSR) R, also referred to as stress–strength probability, is given by:
$$R = P(X < Y) = \int_{-\infty}^{\infty} F_X(y)\, f_Y(y)\,\mathrm{d}y,$$
where $F_X$ and $f_Y$ denote, respectively, the cumulative distribution function (CDF) of X and the probability density function (PDF) of Y.
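As a quick numerical check of this definition, the sketch below approximates the integral with composite Simpson's rule. It uses the Gumbel distribution (the GEV case with zero shape parameter) because, when X and Y share the same scale, X − Y is logistic and R has a simple closed form to compare against. The helper names are ours and the integration window is an assumption that must cover essentially all of the probability mass.

```python
import math


def simpson(func, lower, upper, n=6000):
    """Composite Simpson's rule on [lower, upper] with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3


def reliability(cdf_x, pdf_y, lower, upper, n=6000):
    """Approximate R = P(X < Y) = integral of F_X(y) f_Y(y) dy over [lower, upper]."""
    return simpson(lambda y: cdf_x(y) * pdf_y(y), lower, upper, n)


def gumbel_cdf(x, mu, sigma):
    return math.exp(-math.exp(-(x - mu) / sigma))


def gumbel_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-z - math.exp(-z)) / sigma


# Stress X ~ Gumbel(0, 1), strength Y ~ Gumbel(1, 1).
r_numeric = reliability(lambda t: gumbel_cdf(t, 0.0, 1.0),
                        lambda t: gumbel_pdf(t, 1.0, 1.0), -20.0, 40.0)
# With a common scale, X - Y is logistic, so R = 1 / (1 + exp((mu_X - mu_Y) / sigma)).
r_exact = 1.0 / (1.0 + math.exp((0.0 - 1.0) / 1.0))
print(round(r_numeric, 6), round(r_exact, 6))
```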
Although R was initially applied in engineering, interest in this metric has spread to several areas, such as household financial fragility [1], stock market modeling [2] and asset selection [3], among others. We refer the reader to [4] for further details on stress–strength models.
The choice of an appropriate distribution to model both X and Y directly influences the calculation and estimation of R. In finance, there is strong evidence that asset returns are better modeled either by α-stable processes (a heavy-tailed alternative to Brownian motion [5]) or by heavy-tailed time series models [6,7]. Extreme-Value Theory (EVT) provides a body of knowledge on heavy-tailed distributions, such as the extreme-value distributions [8] and their variations, which can serve as proxies for various fat-tailed distributions. Several studies have successfully applied EVT to model financial data [3,9,10,11,12], showing that EVT-based models may support adequate risk management strategies. Therefore, generalizations of extreme-value distributions may further improve model quality, and providing such a generalization is the general contribution of the present work.
In this paper, we are interested in further exploring
R calculations in the context of extreme-value distributions. The estimation of
R, when
X and
Y are independent random variables following extreme-value distributions, has been extensively studied. For example, Ref. [13] derived the expression of R for the Gumbel, Fréchet and Weibull extreme-value distributions, Ref. [14] considered a Bayesian analysis of the Fréchet stress–strength model, Ref. [15] discussed Bayesian estimation of R for the Weibull distribution with arbitrary parameters, and Ref. [16] improved the estimation of R for Weibull models by avoiding data transformations and eliminating constraints on the parameters. Closed-form expressions for R when X and Y follow generalized extreme-value (GEV) distributions were obtained in [3], where an estimation procedure for R requiring no data transformations and as few parameter restrictions as possible was also proposed.
Several generalizations of the extreme-value distribution have been proposed, but in the present paper, the so-called transmuted generalized extreme-value (TGEV) distribution shall be considered. The TGEV distribution, initially proposed by [
17], has since been extensively studied and applied in various modeling scenarios. Significant contributions to its application and understanding have been made by [
18,
19]. Essentially, the TGEV distribution is a modification of the generalized extreme-value distribution (GEV), whose CDF is given by:
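$$G(x;\xi,\mu,\sigma)=\begin{cases}\exp\left\{-\left[1+\xi\,\dfrac{x-\mu}{\sigma}\right]^{-1/\xi}\right\}, & \xi\neq 0,\quad 1+\xi\,\dfrac{x-\mu}{\sigma}>0,\\[2mm] \exp\left\{-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right\}, & \xi=0,\quad x\in\mathbb{R},\end{cases}$$
where $\xi\in\mathbb{R}$ is the shape parameter, $\mu\in\mathbb{R}$ is the location parameter and $\sigma>0$ is the scale parameter. Then, the TGEV distribution is obtained as follows: given the GEV distribution $G$, the transmuted distribution function F is given by:
$$F(x;\xi,\mu,\sigma,\lambda)=(1+\lambda)\,G(x;\xi,\mu,\sigma)-\lambda\,\left[G(x;\xi,\mu,\sigma)\right]^{2},\qquad |\lambda|\le 1.$$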
Properties such as moments, quantiles, tail behavior and order statistics, among others, were studied in [19], where the applicability of the TGEV distribution to modeling log-returns of stock prices was also shown.
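For readers who want to evaluate these functions, here is a minimal Python sketch. It assumes the standard GEV parameterization (shape ξ, location μ, scale σ) and the quadratic transmutation F = (1 + λ)G − λG² with |λ| ≤ 1; the function names are ours, not from [19]. The PDF follows by differentiation, f = g(1 + λ − 2λG), where g is the GEV density. The code works with log t(x), where G(x) = exp{−t(x)}, to avoid overflow far in the tails.

```python
import math


def gev_log_t(x, xi, mu, sigma):
    """log t(x), where G(x) = exp(-t(x)); returns None outside the GEV support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return -z
    s = 1.0 + xi * z
    if s <= 0.0:
        return None
    return -math.log(s) / xi


def gev_cdf(x, xi, mu, sigma):
    log_t = gev_log_t(x, xi, mu, sigma)
    if log_t is None:
        # Below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return 0.0 if log_t > 700.0 else math.exp(-math.exp(log_t))


def gev_pdf(x, xi, mu, sigma):
    log_t = gev_log_t(x, xi, mu, sigma)
    if log_t is None or log_t > 700.0:
        return 0.0
    # g(x) = t^(xi + 1) exp(-t) / sigma, evaluated on the log scale.
    return math.exp((xi + 1.0) * log_t - math.exp(log_t)) / sigma


def tgev_cdf(x, xi, mu, sigma, lam):
    """Transmuted GEV CDF: F = (1 + lam) G - lam G^2, with |lam| <= 1."""
    g = gev_cdf(x, xi, mu, sigma)
    return (1.0 + lam) * g - lam * g * g


def tgev_pdf(x, xi, mu, sigma, lam):
    """Derivative of F: f = g (1 + lam - 2 lam G)."""
    g_cdf = gev_cdf(x, xi, mu, sigma)
    return gev_pdf(x, xi, mu, sigma) * (1.0 + lam - 2.0 * lam * g_cdf)


# Central-difference check of the PDF against the CDF at one point.
h = 1e-5
numeric = (tgev_cdf(0.5 + h, 0.2, 0.0, 1.0, 0.5) - tgev_cdf(0.5 - h, 0.2, 0.0, 1.0, 0.5)) / (2 * h)
print(numeric, tgev_pdf(0.5, 0.2, 0.0, 1.0, 0.5))
```

A derivative check like the one at the end is a cheap way to catch sign errors in the transmutation term.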
In [19], the TGEV parameters were estimated by maximum likelihood. In contrast, this work proposes a two-step estimation procedure: first, a GEV model is fitted to estimate the shape, location and scale parameters; then, a refinement step estimates the transmutation parameter, in an attempt to improve the first-step fit while reducing the overall computational effort required to estimate the TGEV parameters.
In this paper, we consider the problem of estimating the stress–strength parameter R when X and Y are independent TGEV random variables. In addition, our framework does not require transformations in the data and, to the best of our knowledge, allows for the fewest parameter restrictions.
Our main contributions are:
- to derive R analytically in terms of special functions;
- to derive closed-form expressions for multicomponent system reliability;
- to propose an estimation procedure for R and validate it via a simulation study; and
- to apply the theoretical results to asset selection problems in finance.
The paper is organized as follows: in
Section 2, we define the extreme-value
-function and the
H-function, and we explicitly present the CDF and PDF of the TGEV distribution.
Section 3 deals with the derivation of
R when
X and
Y are independent TGEV random variables. The maximum likelihood estimation for
R is presented in
Section 4. In
Section 5, we discuss a simulation study and a stock price modeling application for asset selection. The last section presents the conclusions.
3. Main Results
In this section, the reliability of two independent TGEV random variables is derived in terms of -functions. In addition, with suitable parameter restrictions, simpler expressions in terms of the H-function are also obtained. First, we consider the case of two independent TGEV with .
Theorem 1. Let X and Y be independent random variables, respectively, with distribution and , , , , , . Then
provided that . When : Proof. Set
. Then
where
and
.
Note that
where
,
. We have four cases to consider:
and
- (a)
;
- (b)
;
and
- (a)
;
- (b)
.
Let us consider case 1(a). Substituting
, it follows from (13) that
Therefore, (8) follows from (4) and (14). For case 1(b), it suffices to notice that
and apply the result in (8) with the subscripts interchanged. For cases 2(a) and 2(b), the same rationale applies, noting that in these cases x takes mostly negative values. □
Remark 1. Note that if both transmutation parameters are set to zero, X and Y follow GEV distributions; hence, Theorem 1 generalizes Theorem 3.1 in [3].
Remark 2. In practice, the parameter estimates must be obtained first. Then, if , the conditions or must be verified and the corresponding expression for R should be used.
Remark 3. It follows from (5) that if and , then (8) can be written in terms of H-function as:In particular, by using a special case of the H-function as seen in [21], if , and , we have: Lastly, we consider the cases of two independent TGEV distributions with .
Theorem 2. Let and be independent random variables with , , , . Then In particular, if we take
, we obtain the explicit form
Proof. Denote
and
, respectively, the CDF of
X and the PDF of
Y. Then
Substituting
, we can rewrite (18) as
Hence, (17) follows from (4) and (19). □
Remark 4. It follows from (5) and (6) that (17) can be rewritten in terms of the H-function as
3.1. Multicomponent System Reliability
Let
be independent and identically distributed random variables with distribution
and
Y be an independent random variable with distribution
. Set
. Then,
and we have
In a broader context, consider independent random variables
with
The reliability in a multicomponent stress–strength model is given by
Using a binomial expansion, we obtain
Note that the integral terms in (22) are the same as (21) when
. Therefore,
Closed-form expressions for (21) are presented below.
Theorem 3. Let be independent and identically distributed random variables with distribution and Y be an independent random variable with distribution . Then
whereandprovided that . When , (
23)
holds provided that . When :
Proof. For notational simplicity, denote
and
It follows from (3) and (21) that
By the binomial expansion,
Observe that
, which implies
If
, it follows from (3.15) in [3] that
provided that
. If
, (3.17) in [3] implies (27) since
.
Observe that the integration range can be simplified using the results for the intersection of the supports of
and
, such that:
where
and
. Then, if
and
(case
and
is analogous), we have that
is given by
Substituting
, we obtain
Hence, (23) follows from (25), (26), (27) and (28). On the other hand, when
, the proof follows the same rationale as that of Theorem 2, now incorporating the binomial expansion; the details are omitted for brevity. □
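To see how a binomial expansion reduces a multicomponent reliability to integrals of the single form ∫ F_X^p f_Y, the sketch below evaluates both sides numerically. It assumes the classical s-out-of-k formulation (the system works when at least s of the k iid variables X_1, …, X_k exceed the common Y), which may assign the roles of stress and strength differently from (21) and (22), and it uses Gumbel distributions only as a stand-in. When X and Y are identically distributed the result reduces to (k − s + 1)/(k + 1), which serves as a check. All function names are ours.

```python
import math


def simpson(func, lower, upper, n=6000):
    """Composite Simpson's rule on [lower, upper] with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3


def power_integral(cdf_x, pdf_y, p, lower, upper):
    """I(p) = integral of F_X(y)^p f_Y(y) dy, i.e. P(max(X_1, ..., X_p) < Y)."""
    return simpson(lambda y: cdf_x(y) ** p * pdf_y(y), lower, upper)


def multicomponent_direct(cdf_x, pdf_y, s, k, lower, upper):
    """P(at least s of X_1..X_k exceed Y)
    = sum_{i=s}^{k} C(k, i) * integral of (1 - F_X)^i F_X^(k - i) f_Y."""
    total = 0.0
    for i in range(s, k + 1):
        def integrand(y, i=i):
            return (1.0 - cdf_x(y)) ** i * cdf_x(y) ** (k - i) * pdf_y(y)
        total += math.comb(k, i) * simpson(integrand, lower, upper)
    return total


def multicomponent_expanded(cdf_x, pdf_y, s, k, lower, upper):
    """Same quantity after expanding (1 - F_X)^i = sum_j C(i, j) (-1)^j F_X^j,
    so that only integrals of the form I(p) are needed."""
    total = 0.0
    for i in range(s, k + 1):
        for j in range(i + 1):
            coef = math.comb(k, i) * math.comb(i, j) * (-1) ** j
            total += coef * power_integral(cdf_x, pdf_y, k - i + j, lower, upper)
    return total


def gumbel_cdf(t):
    return math.exp(-math.exp(-t))


def gumbel_pdf(t):
    return math.exp(-t - math.exp(-t))


# X_i and Y identically distributed: the exact value is (k - s + 1) / (k + 1).
r_direct = multicomponent_direct(gumbel_cdf, gumbel_pdf, 2, 4, -20.0, 40.0)
r_expanded = multicomponent_expanded(gumbel_cdf, gumbel_pdf, 2, 4, -20.0, 40.0)
print(r_direct, r_expanded)
```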
4. Estimation
This section deals with parameter estimation for
given two independent TGEV random variables. Maximum likelihood estimators (MLEs) for
R available in the literature rely on explicit forms of
R obtained under severe parameter restrictions on extreme-value distributions (see, e.g., [14,15,22]). These approaches require the parameters of the two samples to be estimated jointly and would require a series of transformations to be applicable to TGEV components. For the TGEV distribution, we have two cases to consider:
and
. The first case requires
or
(Theorem 1). On the other hand, if
, no restrictions on the parameters are needed, since a single formula can be used to obtain
R in terms of
functions (Theorem 2).
4.1. MLE for R
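Let X and Y be independent TGEV random variables satisfying the assumptions of Theorem 1. Theorem 1 expresses R as a function of the parameters of X and Y; collecting these parameters in a vector $\boldsymbol{\theta}$, we write $R=R(\boldsymbol{\theta})$. Let $x_1,\dots,x_n$ be a random sample of X, let $y_1,\dots,y_m$ be an independent random sample of Y, and let $\hat{\boldsymbol{\theta}}$ denote the maximum likelihood estimate of $\boldsymbol{\theta}$. By the invariance property of the MLE, R can then be estimated simply as $\hat R=R(\hat{\boldsymbol{\theta}})$; moreover, since Theorem 1 describes R in terms of integrals that depend continuously on $\boldsymbol{\theta}$, $\hat R$ inherits the consistency of $\hat{\boldsymbol{\theta}}$.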
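Because R depends on θ only through integrals of the TGEV CDF and PDF, a plug-in value R(θ̂) can also be cross-checked numerically. The sketch below is not a replacement for the closed forms of Theorems 1 and 2: it assumes the quadratic transmutation F = (1 + λ)G − λG², a parameter layout (ξ, μ, σ, λ) of our choosing, and a finite integration window wide enough to hold essentially all of the mass. The "estimates" in the usage lines are hypothetical numbers.

```python
import math


def gev_log_t(x, xi, mu, sigma):
    """log t(x), where G(x) = exp(-t(x)); returns None outside the GEV support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return -z
    s = 1.0 + xi * z
    return None if s <= 0.0 else -math.log(s) / xi


def tgev_cdf(x, xi, mu, sigma, lam):
    log_t = gev_log_t(x, xi, mu, sigma)
    if log_t is None:
        g = 0.0 if xi > 0 else 1.0  # below lower / above upper endpoint
    else:
        g = 0.0 if log_t > 700.0 else math.exp(-math.exp(log_t))
    return (1.0 + lam) * g - lam * g * g


def tgev_pdf(x, xi, mu, sigma, lam):
    log_t = gev_log_t(x, xi, mu, sigma)
    if log_t is None or log_t > 700.0:
        return 0.0
    t = math.exp(log_t)
    gev_density = math.exp((xi + 1.0) * log_t - t) / sigma
    return gev_density * (1.0 + lam - 2.0 * lam * math.exp(-t))


def simpson(func, lower, upper, n=6000):
    """Composite Simpson's rule on [lower, upper] with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3


def reliability_tgev(theta_x, theta_y, lower=-30.0, upper=60.0, n=6000):
    """R(theta) = integral of F_X(y) f_Y(y) dy for TGEV X and Y.
    Passing estimates theta_hat instead of theta gives the plug-in R_hat."""
    return simpson(lambda y: tgev_cdf(y, *theta_x) * tgev_pdf(y, *theta_y), lower, upper, n)


theta_hat_x = (0.1, 0.0, 1.0, 0.3)   # hypothetical estimates for X
theta_hat_y = (0.1, 0.5, 1.2, -0.2)  # hypothetical estimates for Y
print(round(reliability_tgev(theta_hat_x, theta_hat_y), 4))
```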
4.2. Parameters Estimation of TGEV Samples
Consider the PDF
defined in (7). Take
and
independent random samples of sizes
n and
m, respectively. The likelihood function is given by:
When
, the support of
f does not depend on unknown parameters, and Theorem 2 does not require parameter restrictions. The log-likelihood function is given by
The MLE can then be obtained by setting the gradient of the log-likelihood function (30) to zero and solving for its critical points.
When
, the support of
f depends on the unknown parameters. Consequently, the MLE cannot be obtained explicitly, and an additional numerical procedure is required to maximize the likelihood, as with the GEV distribution (see [6] for a more detailed discussion). The likelihood function becomes:
where
and
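. Note that the likelihood is positive if and only if every observation lies in the support of its distribution, that is, $1+\xi(x_i-\mu)/\sigma>0$ for all $i=1,\dots,n$, where $(\xi,\mu,\sigma)$ are the GEV-type parameters of X. A similar restriction must hold for the sample of Y. Since no explicit expression for the MLE is available, numerical procedures must be applied.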
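A numerical maximizer needs a log-likelihood that signals when the support constraint is violated. The sketch below (our own helper names; it assumes the quadratic transmutation F = (1 + λ)G − λG², so that f = g(1 + λ − 2λG)) returns −∞ for invalid parameters or for any observation outside the support, which most optimizers handle as a rejected step. Since the two samples are independent, the joint log-likelihood is the sum of the two parts.

```python
import math


def tgev_loglik(sample, xi, mu, sigma, lam):
    """TGEV log-likelihood, assuming F = (1 + lam) G - lam G^2.

    Returns -inf when the parameters are invalid or when any observation falls
    outside the support (for xi != 0 this means 1 + xi (x - mu) / sigma <= 0)."""
    if sigma <= 0.0 or abs(lam) > 1.0:
        return -math.inf
    total = 0.0
    for x in sample:
        z = (x - mu) / sigma
        if xi == 0.0:
            log_t = -z
        else:
            s = 1.0 + xi * z
            if s <= 0.0:
                return -math.inf  # observation outside the support
            log_t = -math.log(s) / xi
        if log_t > 700.0:
            return -math.inf  # density underflows to zero
        t = math.exp(log_t)
        factor = 1.0 + lam - 2.0 * lam * math.exp(-t)  # 1 + lam - 2 lam G(x)
        if factor <= 0.0:
            return -math.inf
        # log f = (xi + 1) log t - t - log sigma + log(1 + lam - 2 lam G)
        total += (xi + 1.0) * log_t - t - math.log(sigma) + math.log(factor)
    return total


def joint_loglik(x_sample, y_sample, theta_x, theta_y):
    """Independent samples: the joint log-likelihood is the sum of both parts."""
    return tgev_loglik(x_sample, *theta_x) + tgev_loglik(y_sample, *theta_y)


print(tgev_loglik([0.5, 1.0, -0.2], 0.2, 0.0, 1.0, 0.3))
print(tgev_loglik([-10.0], 0.2, 0.0, 1.0, 0.3))  # outside the support
```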
4.3. A Two-Step Estimation and Confidence Intervals
We introduce an alternative method for estimating the TGEV parameters through the two-step process outlined below.
Example 1. Given the samples of X and Y,
Step 1 Estimate the GEV parameters (shape, location and scale) of X and of Y by maximum likelihood. This estimation can be carried out with the extRemes package in the R software, version 4.3.3 [23].
Step 2 Estimate the TGEV parameters by maximizing the likelihood in (29), using the parameter estimates from Step 1 as initial guesses.
To choose between Theorems 1 and 2 to obtain , we need to verify if:
or,
- (b)
or .
Despite this additional verification, the two-step procedure of Example 1 is expected to require less computational time than directly maximizing (29) and (30).
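The sketch below implements Step 2 under one reading of Example 1: the GEV parameters from Step 1 are held fixed and only the transmutation parameter λ is optimized. With (ξ, μ, σ) fixed, the λ-dependent part of the log-likelihood is Σ log(1 + λ − 2λG(x_i)), a sum of logarithms of functions affine in λ and therefore concave, so a golden-section search over [−1, 1] is enough. The function names, the golden-section choice and the simulated data are ours. In the usage lines the true GEV parameters are plugged in for simplicity, whereas in practice they would come from a GEV fit such as `fevd` in extRemes.

```python
import math
import random


def gev_cdf(x, xi, mu, sigma):
    """GEV CDF; 0 below the lower endpoint (xi > 0), 1 above the upper endpoint (xi < 0)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-s ** (-1.0 / xi))


def lambda_loglik(lam, g_values):
    """Terms of the TGEV log-likelihood that involve lam, with (xi, mu, sigma) held fixed."""
    total = 0.0
    for g in g_values:
        factor = 1.0 + lam - 2.0 * lam * g
        if factor <= 0.0:
            return -math.inf
        total += math.log(factor)
    return total


def estimate_lambda(sample, xi, mu, sigma, tol=1e-6):
    """Golden-section search for lam in [-1, 1]; the objective is concave in lam."""
    g_values = [gev_cdf(x, xi, mu, sigma) for x in sample]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = -1.0, 1.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if lambda_loglik(c, g_values) >= lambda_loglik(d, g_values):
            b = d  # maximizer lies in [a, d]
        else:
            a = c  # maximizer lies in [c, b]
    return 0.5 * (a + b)


def tgev_sample(n, xi, mu, sigma, lam, rng):
    """Inversion sampling: solve (1 + lam) G - lam G^2 = u for G, then invert the GEV CDF."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u <= 0.0:
            continue
        g = 2.0 * u / ((1.0 + lam) + math.sqrt(max((1.0 + lam) ** 2 - 4.0 * lam * u, 0.0)))
        if not 0.0 < g < 1.0:
            continue  # skip the rare draws that round to an endpoint
        w = -math.log(g)
        out.append(mu - sigma * math.log(w) if xi == 0.0 else mu + sigma * (w ** (-xi) - 1.0) / xi)
    return out


rng = random.Random(2024)
data = tgev_sample(5000, 0.1, 0.0, 1.0, 0.5, rng)
lam_hat = estimate_lambda(data, 0.1, 0.0, 1.0)  # GEV parameters assumed known here
print(round(lam_hat, 3))
```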
Example 2 describes the approach used in
Section 5 to obtain confidence intervals (CIs) for the estimates of
R.
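Example 2. Let x and y be the observed samples of X and Y, and let M be a positive integer denoting the number of bootstrap repetitions.
Step 1 Generate bootstrap samples from x and y.
Step 2 Compute the parameter estimates based on the bootstrap samples. In this case, the parameters of each bootstrap sample are estimated individually using Example 1.
Step 3 Obtain the bootstrap estimate $\hat R^{*}$ using Theorem 1 or 2.
Step 4 Repeat Steps 1 to 3 M times.
Step 5 The approximate $100(1-\alpha)\%$ confidence interval of R is given by $\left(\hat G^{-1}(\alpha/2),\ \hat G^{-1}(1-\alpha/2)\right)$, where $0<\alpha<1$ and $\hat G$ is the empirical cumulative distribution function of the M bootstrap estimates $\hat R^{*}$.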
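The steps above follow the usual percentile bootstrap. A minimal sketch is given below, under stated assumptions: resampling is nonparametric (with replacement) and independent for the two samples, and `estimator` is any function mapping two samples to an estimate of R. In the procedure of Example 2 that function would fit the TGEV models via Example 1 and evaluate Theorem 1 or 2; here a simple empirical stand-in is used so the example runs on its own.

```python
import math
import random


def percentile_bootstrap_ci(x, y, estimator, m=1000, alpha=0.05, seed=None):
    """Percentile bootstrap CI for R (Steps 1-5), assuming resampling with replacement."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(m):                                # Step 4: repeat M times
        x_star = rng.choices(x, k=len(x))             # Step 1: bootstrap samples
        y_star = rng.choices(y, k=len(y))
        replicates.append(estimator(x_star, y_star))  # Steps 2-3: re-estimate R
    replicates.sort()
    # Step 5: empirical alpha/2 and 1 - alpha/2 quantiles of the M replicates.
    k = int(math.floor(alpha / 2.0 * m))
    return replicates[k], replicates[m - 1 - k]


def empirical_r(xs, ys):
    """Stand-in estimator: fraction of all pairs (x, y) with x < y."""
    return sum(1 for a in xs for b in ys if a < b) / (len(xs) * len(ys))


data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(60)]
y = [data_rng.gauss(0.5, 1.0) for _ in range(60)]
ci = percentile_bootstrap_ci(x, y, empirical_r, m=300, seed=7)
print(ci)
```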
In asset selection based on stress–strength reliability, only a single time series of observed returns is available for each asset, which makes the estimation procedure above essential. To illustrate the suitability of the closed-form expressions derived here, a simulation study is carried out in the next section. In a simulation, many samples of size n can be drawn from each random variable, and the estimation of R can be repeated for each of them.
5. Applications
In this section, we present a Monte-Carlo simulation study to assess the performance of the estimator of R. Additionally, we apply the stress–strength reliability model discussed in the preceding sections to real data.
5.1. Simulation Study
To evaluate the performance of the estimator of R, we fix several values of the TGEV parameters of X and Y and then generate N Monte-Carlo samples, each of size n, of the random variables X and Y. We analyze the mean of the estimates of R, the bias and the root mean squared error (RMSE).
As described by [
19], random samples from the TGEV distribution can be generated by the inversion method using the quantile function
where
U is a uniform random variable on (0, 1).
The simulation then proceeds as follows:
- (1) for each Monte-Carlo sample, the estimate of R is computed;
- (2) the mean estimate is obtained by averaging the N Monte-Carlo estimates;
- (3) the bias is computed as the difference between the theoretical value of R and the mean estimate, and the root mean squared error is likewise computed with respect to the theoretical value of R obtained from the closed-form expressions.
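The inversion method mentioned above needs the TGEV quantile function. Assuming F = (1 + λ)G − λG², solving the quadratic for G gives G = 2u / [(1 + λ) + √((1 + λ)² − 4λu)], a cancellation-free form of the root in [0, 1] that also covers λ = 0. Inverting the GEV CDF then gives x = μ + σ[(−log G)^(−ξ) − 1]/ξ for ξ ≠ 0 and x = μ − σ log(−log G) for ξ = 0. This is our own derivation and may be written differently in [19]; the function names are ours.

```python
import math
import random


def tgev_quantile(u, xi, mu, sigma, lam):
    """Quantile of the TGEV distribution, assuming F = (1 + lam) G - lam G^2."""
    disc = max((1.0 + lam) ** 2 - 4.0 * lam * u, 0.0)
    g = 2.0 * u / ((1.0 + lam) + math.sqrt(disc))  # root of (1+lam)G - lam G^2 = u in [0, 1]
    w = -math.log(g)                                # w = t(x) = -log G(x)
    if xi == 0.0:
        return mu - sigma * math.log(w)
    return mu + sigma * (w ** (-xi) - 1.0) / xi


def tgev_cdf(x, xi, mu, sigma, lam):
    z = (x - mu) / sigma
    if xi == 0.0:
        g = math.exp(-math.exp(-z))
    else:
        s = 1.0 + xi * z
        g = (0.0 if xi > 0 else 1.0) if s <= 0.0 else math.exp(-s ** (-1.0 / xi))
    return (1.0 + lam) * g - lam * g * g


def tgev_sample(n, xi, mu, sigma, lam, rng=random):
    """Inversion method: x = Q(U) with U uniform on (0, 1)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 1e-12 < u < 1.0 - 1e-12:  # discard draws numerically indistinguishable from 0 or 1
            out.append(tgev_quantile(u, xi, mu, sigma, lam))
    return out


sample = tgev_sample(5, -0.2, 0.0, 1.0, 0.4, random.Random(0))
print([round(v, 3) for v in sample])
```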
The TGEV distribution with negative shape parameters is treated in
Table 2,
Table 4 and
Table 6 (for
, 1000 and 10,000, respectively), while
Table 3 and
Table 5 deal with positive shape parameters. In both cases, the estimator performs well, with small bias and low root mean squared error, and the same conclusions hold as the number of replications
N increases.
5.2. Real Data Set Application
We evaluate the proposed framework on an asset selection problem. To guide the selection of financial assets when managing a portfolio, we adopt metrics of the type .
We start by modeling stock price log-returns with TGEV distributions and then compare log-returns of tickers (companies) from different economic sectors traded on BOVESPA (São Paulo Stock Exchange): BBAS3.SA (banking: Banco do Brasil S.A.), ITUB4.SA (banking: Itaú Unibanco Holding S.A.), VALE3.SA (mining: Vale S.A.) and VIIA3.SA (retail: Via Varejo S.A.). From now on, we omit the “.SA” suffix of the tickers under analysis. The time series for each ticker consists of daily closing prices in Brazilian currency (R$, BRL) from 1 January 2022 to 30 April 2023. The analyzed data comprise a total of 331 daily prices.
Figure 2 presents the stock prices for each ticker, highlighting their distinct value scales and volatility. Subsequently, we aim to compare the returns using the expression
.
These data sets were previously analyzed in [3]; here we show that the TGEV distribution fits the log-returns better than the GEV distribution according to information criteria. The daily closing prices were imported directly into R with the following command (shown for BBAS3.SA):
```r
ticker = "BBAS3.SA"
quantmod::getSymbols(ticker, src = "yahoo", auto.assign = FALSE,
                     from = "2022-01-01", to = "2023-04-30", return.class = "xts")
```
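The command above returns closing prices, whereas the analysis works with log-returns computed from consecutive closing prices, r_t = log(P_t / P_{t−1}); a series of P prices therefore yields P − 1 log-returns. A minimal Python sketch of this transformation follows (our own helper, independent of quantmod; the prices in the usage line are hypothetical).

```python
import math


def log_returns(prices):
    """Daily log-returns r_t = log(P_t / P_{t-1}) from consecutive closing prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical prices, for illustration only (not the BOVESPA data).
print(log_returns([10.0, 10.5, 10.2, 10.8]))
```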
We assume that the returns are independent. To support this assumption, we measured log-return correlations using the Pearson, Kendall and Spearman coefficients and compared only pairs of stocks that are not correlated. The pairs selected for comparison, VALE3-BBAS3, VALE3-ITUB4 and VALE3-VIIA3, all exhibited correlation measures equal to or below
, as detailed in the Appendix of [
3].
Figure 3 displays the autocorrelation function of log-returns, indicating an absence of discernible temporal correlations among the returns.
Descriptive statistics for the four data sets are presented in
Table 7; the log-returns are roughly symmetric around zero, and VIIA3 shows the greatest variability. The sample size was
and each observation is the daily log-return of the closing stock price. Extreme values are present in all data sets, which is consistent with the nature of financial data.
Quintino et al. [
3] showed that the GEV distribution adequately fits the data. Our interest lies in determining if the addition of the
parameter provided by the TGEV distribution improves the model fit. To this end, we employed the two-step estimation method described in Example 1. Since the GEV and TGEV models have different numbers of parameters, we compared them using the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Efficient Determination Criterion (EDC). Parameter estimates for the stock price log-returns are presented in
Table 8, while
Table 9 shows that all criteria indicate a better fit for the TGEV distribution than for the GEV distribution.
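All three criteria trade the maximized log-likelihood against the number of parameters k (3 for the GEV, 4 for the TGEV), and smaller values indicate a better model. The sketch below uses AIC = −2ℓ + 2k and BIC = −2ℓ + k log n; for EDC = −2ℓ + k c_n, the penalty sequence c_n = 0.2√n is an assumption on our part (a choice seen in applied work), since the sequence used for Table 9 is not stated here. The log-likelihood values in the usage lines are made up and are not the values behind Table 9.

```python
import math


def information_criteria(loglik, k, n, c_n=None):
    """AIC, BIC and EDC for a model with maximized log-likelihood `loglik`,
    k free parameters and sample size n (smaller is better).

    EDC = -2 loglik + k * c_n, with c_n = 0.2 * sqrt(n) assumed by default."""
    if c_n is None:
        c_n = 0.2 * math.sqrt(n)
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        "EDC": -2.0 * loglik + k * c_n,
    }


# Hypothetical log-likelihoods, NOT the values behind Table 9.
gev = information_criteria(loglik=800.0, k=3, n=300)
tgev = information_criteria(loglik=805.0, k=4, n=300)
print({name: tgev[name] < gev[name] for name in gev})
```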
The adequacy of the fitted TGEV distributions can be assessed through graphical evaluation methods. This includes plotting the theoretical PDF over the histogram (
Figure 4), comparing the theoretical CDF against the empirical CDF (ECDF) (
Figure 5), and examining the Normal Quantile-Quantile plots of the residuals (
Figure 6). Although the Kolmogorov–Smirnov test rejects the TGEV fit for the BBAS3 data, visual inspection of the histogram and ECDF does not clearly discredit the suitability of the TGEV distribution. Moreover, the Kolmogorov–Smirnov test tends to be very sensitive for medium to large sample sizes, flagging even minor deviations, which might account for this discrepancy.
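The sensitivity argument can be made concrete. The Kolmogorov–Smirnov statistic is the largest gap between the ECDF and the fitted CDF, and its large-sample 5% critical value is approximately 1.358/√n, so ever smaller discrepancies become significant as n grows. These critical values assume a fully specified CDF; when the parameters are estimated from the same data, the standard p-values are only approximate. A minimal sketch with our own helper names:

```python
import math


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x_(i); check both sides of the jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def ks_critical_value_5pct(n):
    """Large-sample 5% critical value, approximately 1.358 / sqrt(n)."""
    return 1.358 / math.sqrt(n)


for n in (50, 300, 1000):
    print(n, round(ks_critical_value_5pct(n), 4))
```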
Reliability measures, denoted as
, play a central role in an investor’s decision-making process. Put simply, when
X and
Y represent profits measured by log-returns and
, the investor tends to favor selecting the financial asset corresponding to
X. Conversely, if
, the investor leans toward the opposite choice. However, when
, the decision becomes inconclusive. In this sense,
Table 10 presents the estimates of
and the
bootstrap confidence intervals, obtained with
and Example 2.
Utilizing the GEV distribution, reliability estimates
of 0.54, 0.54, and 0.43 for the VALE3-BBAS3, VALE3-ITUB4, and VALE3-VIIA3 pairs were obtained in [
3], respectively. These values are close to those reported in
Table 10 for the TGEV distribution. Regarding the confidence intervals, the TGEV-based interval is narrower for the last pair, whereas the interval widths are similar for the other pairs.
Point estimates can also be compared with an empirical estimator that depends neither on parameter estimation nor on the choice of a probabilistic model. Consider the estimator:
where
denotes the indicator function on the set
A and
n is the sample size. The resulting estimates are 0.55, 0.55 and 0.43 for the pairs presented in
Table 10, respectively, which are close to the parametric estimates.
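Since the displayed estimator is not reproduced here, the sketch below shows two common nonparametric variants of an empirical P(X < Y): an all-pairs (Mann–Whitney type) version and a version that compares observations with the same index, such as returns of two assets on the same trading day. Which variant matches the formula above, and whether it targets P(X < Y) or P(Y < X), is an assumption the reader should check; the function names and toy data are ours.

```python
def empirical_r_all_pairs(x, y):
    """(1 / (n m)) * sum_i sum_j 1{x_i < y_j}: compares every x with every y."""
    return sum(1 for a in x for b in y if a < b) / (len(x) * len(y))


def empirical_r_paired(x, y):
    """(1 / n) * sum_i 1{x_i < y_i}: compares observations with the same index,
    e.g. returns of two assets on the same trading day."""
    if len(x) != len(y):
        raise ValueError("paired estimator needs samples of equal size")
    return sum(1 for a, b in zip(x, y) if a < b) / len(x)


x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 4.0]
print(empirical_r_all_pairs(x, y), empirical_r_paired(x, y))
```

The two variants can differ noticeably on small samples, as the toy data show, but for long independent series they estimate the same probability.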
In
Table 10, all the confidence intervals contain the
threshold; however, the reliability estimate for the pair VALE3xVIIA3 provides some evidence that VALE3 should be selected.
6. Conclusions
In this paper, we studied the stress–strength reliability when X and Y follow independent TGEV distributions. Exact expressions for R were obtained in terms of the extreme-value -function with minimal parameter restrictions, and, under additional restrictions, R was shown to be expressible in terms of H-functions.
The present work evaluated the advantages of adding a parameter to the GEV distribution and modeling data sets with the TGEV distribution. The added parameter made the analytical derivation of the reliability more complex and was expected to increase the computational effort required to estimate it. To reduce this computational burden, we proposed a two-step estimation in which a GEV model is fitted first and the TGEV transmutation parameter is estimated afterwards. Notwithstanding the complexities of an added parameter, information criteria indicated the superiority of the TGEV models over the GEV ones. This advantage is also reflected in the estimation of stress–strength probabilities.
Monte-Carlo simulations attested to the performance of the analytical closed-form expressions derived here. By applying our methodology to real-world financial data, we were able to guide a stock selection procedure by estimating the reliability when both X and Y represent stock returns. In summary, when X and Y represent the return of the stock prices and , the investor should choose the variable X. If , the opposite occurs. The case is inconclusive.
The framework explored in this work can be a starting point for studying stress–strength probabilities for recently proposed extreme-value distributions, such as the bimodal Gumbel, bimodal Weibull and bimodal GEV distributions, as well as bivariate extreme-value models.