1. Introduction
The Pareto distribution is named after the Italian economist Vilfredo Pareto (1848–1923). It has gained considerable attention for modeling heavy-tailed phenomena such as income distribution, earthquakes, forest fire areas, and disk drive sector errors [1,2]. The Pareto IV family is a general family of distributions that includes the Pareto I, Pareto II, and Pareto III distributions as special cases. The Burr family can also be regarded as a special case of Pareto IV (see [3,4]). Several studies in the literature generalize the Pareto distribution to make it richer and more flexible for modeling data. These include the generalized Pareto [5], beta-Pareto [6], beta-generalized Pareto [7], Weibull–Pareto [8], gamma-Pareto [9,10], Kumaraswamy exponentiated Pareto [11], and exponentiated Weibull–Pareto [12] distributions.
Recent work shows that adding new parameters to an existing distribution, or generating a new distribution from it by other methods, can make the resulting distribution more appropriate and efficient for modeling lifetime data. Many distributions have been generalized in this way. These include the logit of the Kumaraswamy distribution [13], the generalized beta-generated distribution [14], the Weibull-G family of distributions [15], the gamma-exponentiated exponential distribution [16], and the transmuted Weibull–Pareto distribution [17]. Very recently, several new odd distributions were proposed in the literature, such as the odd Birnbaum–Saunders distribution [18], the odd Burr-III family of distributions [19], the odds exponential-log logistic distribution [20], the odd log-logistic-Fréchet distribution [21], the odd log-logistic-Burr XII distribution [22], the odd exponentiated half-logistic Burr XII distribution [23], the odd Lomax-G family of distributions [24], the odd Dagum-G family of distributions [25], and the odd log-logistic Lindley-exponential distribution [26].
This article uses the transformed-transformer (T-X) family of Alzaatreh et al. [27] to introduce an odds exponential-Pareto IV distribution. The cumulative distribution function (CDF) of the T-X family is defined by
$$G(x) = \int_{a}^{W(F(x))} r(t)\,dt, \tag{1}$$
where r(t) is the probability density function (PDF) of a random variable $T \in [a, b]$, such that $-\infty \le a < b \le \infty$, and W(F(x)) is a function of any CDF $F(x)$ that can take different forms; see Alzaatreh et al. [27]. In this study, we consider the odds function form, $W(F(x)) = F(x)/(1 - F(x))$. That is, the CDF will be
$$G(x) = \int_{0}^{F(x)/(1 - F(x))} r(t)\,dt, \tag{2}$$
and we considered the exponential distribution for
and
is the Pareto IV distribution with parameters
in Equation (2). The resulting distribution offers more flexibility in accommodating different shapes of the hazard function. It should also be better suited to modeling and fitting a variety of real-life data.
Therefore, we now define the odds exponential-Pareto IV (OEPIV) distribution with CDF given by
The PDF of OEPIV is
where
,
are the shape parameters,
is the scale parameter, and
is the inequality parameter.
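The construction described above (an exponential generator applied to the odds of a Pareto IV baseline) can be sketched numerically as follows. This is only an illustration under stated assumptions, not the parameterization confirmed by the text: we assume the exponential distribution has rate `lam`, and that the Pareto IV baseline has location zero with CDF $F(x) = 1 - [1 + (x/\theta)^{1/\gamma}]^{-\alpha}$. The symbol names `alpha`, `lam`, `theta`, and `gamma` are hypothetical labels for the two shape parameters, the scale parameter, and the inequality parameter. Under these assumptions the odds are $(1 + (x/\theta)^{1/\gamma})^{\alpha} - 1$, so the CDF is $1 - \exp\{-\lambda[(1 + (x/\theta)^{1/\gamma})^{\alpha} - 1]\}$.

```python
import math

# Assumed parameterization (hypothetical symbol names, not confirmed by the text):
#   exponential generator with rate lam, Pareto IV baseline with location 0,
#   F(x) = 1 - (1 + (x/theta)**(1/gamma))**(-alpha),  x > 0.
# The odds F/(1 - F) then equal (1 + (x/theta)**(1/gamma))**alpha - 1.

def _odds(x, alpha, theta, gamma):
    """Odds F(x)/(1 - F(x)) of the assumed Pareto IV baseline."""
    u = (x / theta) ** (1.0 / gamma)
    return math.expm1(alpha * math.log1p(u))

def oepiv_cdf(x, alpha, lam, theta, gamma):
    """CDF G(x) = 1 - exp(-lam * odds(x)) under the assumptions above."""
    if x <= 0.0:
        return 0.0
    return -math.expm1(-lam * _odds(x, alpha, theta, gamma))

def oepiv_pdf(x, alpha, lam, theta, gamma):
    """PDF obtained by differentiating the CDF above with respect to x."""
    if x <= 0.0:
        return 0.0
    u = (x / theta) ** (1.0 / gamma)
    # Derivative of the odds: alpha * (1 + u)**(alpha - 1) * du/dx, with du/dx = u / (gamma * x).
    d_odds = alpha * (1.0 + u) ** (alpha - 1.0) * u / (gamma * x)
    return lam * d_odds * math.exp(-lam * _odds(x, alpha, theta, gamma))

def oepiv_quantile(q, alpha, lam, theta, gamma):
    """Inverse of the CDF above for 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    odds = -math.log1p(-q) / lam              # solve 1 - exp(-lam * odds) = q
    u = math.expm1(math.log1p(odds) / alpha)  # solve (1 + u)**alpha - 1 = odds
    return theta * u ** gamma
```

Because the quantile function is available in closed form under these assumptions, samples could be drawn by inverse-transform sampling, that is, by evaluating `oepiv_quantile` at uniform draws in (0, 1).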
Recently, there has been a great deal of interest in investigating the relationship between survival time and covariates such as sex, weight, and blood pressure. In a number of applications, parametric regression models have been used to estimate the effect of covariates on survival time, including the log-location-scale regression model. This model stands out because it is commonly used in clinical trials and in many other fields of application. It is also widely used in engineering models where failure is accelerated by voltage, temperature, or other stress factors [28]. Several studies have applied the log-location-scale regression model based on different distributions, such as the log-modified Weibull [29], the log-Weibull extended [30], the log-exponentiated Weibull [31], the log-Burr XII [32], the log-beta Weibull [33], the log-beta log-logistic [34], the log-Fréchet [35], the log-exponentiated Fréchet [36], and the log-gamma-logistic [37]. Recent studies have built log-location-scale regression models from the logarithm of odd-type distributions, for instance the odd log-logistic-Weibull [38], the odd log-logistic generalized half-normal [39], and the odd Weibull [40].
This article is organized as follows. In Section 2, we define the survival and hazard functions of the OEPIV distribution and give some graphical representations. We derive some properties of the OEPIV distribution in Section 3. In Section 4, we explain maximum likelihood estimation of the parameters of the odds exponential-Pareto IV distribution. Simulation studies illustrating the performance of the OEPIV distribution are provided in Section 5. In Section 6, we address the log odds exponential-Pareto IV (LOEPIV) distribution along with some of its statistical properties, introduce a log-location regression model based on the LOEPIV distribution, and discuss its parameter estimates via the maximum likelihood and jackknife methods. In Section 7, three applications are analyzed to demonstrate the performance of the new distribution and its regression model. Finally, we report our conclusions in Section 8.
6. The Log Odds Exponential-Pareto IV Regression Model
If X is a random variable from the OEPIV distribution, as given in Equation (4), then $Y = \log(X)$ is a random variable that has a LOEPIV distribution with the transformation parameter
and
. Therefore, the PDF and CDF of the LOEPIV distribution are as follows:
where
is the scale parameter,
,
are the shape parameters, and
is the location parameter. The LOEPIV model becomes the log exponential-Pareto (LEP) distribution for
. The PDF (for
) of the LEP distribution with parameters
,
and
, is
The SF and HF are given by
The following are the properties for the LOEPIV distribution:
The quantile of the LOEPIV distribution
The mode of the LOEPIV distribution
Then, the mode can be obtained by solving Equation (27) numerically.
The median of the LOEPIV distribution
The mgf of LOEPIV distribution
Substituting
will reduce the above integration to
Then, using the binomial expansion
can be rewritten as
Using the gamma function. Thus, the mgf of LOEPIV distribution is as follows
The standardized random variable for y in Equation (22) is defined as $z = (y - \mu)/\sigma$; then z has the following PDF
with SF given as
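Hence, a linear location-scale regression model with response variable $y_i$ and explanatory vector $\mathbf{x}_i^\top = (x_{i1}, \ldots, x_{ip})$ can be defined as
$$y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \sigma z_i, \quad i = 1, \ldots, n, \tag{31}$$
where $z_i$ is the random error with PDF in Equation (24), $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^\top$, and $\boldsymbol{\beta}$, $\sigma > 0$, and the two shape parameters are the unknown parameters. $\mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ is the location of $y_i$, and the location vector $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^\top$ can be represented as a linear model $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$, in which $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$ is the known model matrix. Therefore, the SF of $Y_i \mid \mathbf{x}_i$ is expressed as: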
6.1. Estimation of the LOEPIV Regression Model
6.1.1. ML Method
For the right-censored lifetime data, we have
, where
is the lifetime and
is the censoring time, then, we have
for the
individual
. If we have a random sample with n observations
,...,
, where
, and assuming the censoring and lifetimes are independent and random. Then, the likelihood function for the regression model in (
31) with
assuming right censoring is as follows:
where
and
are given by Equations (
17) and (
19) of
, respectively. The
ℓ for
reduces to
where
represents the uncensored data, and
. The ML estimate for the parameter vector
can be obtained using a numerical optimization algorithm that maximizes Equation (32).
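To make the structure of the right-censored likelihood concrete, the following sketch evaluates a log-likelihood in which uncensored observations contribute the log-PDF and censored observations contribute the log-SF. It is an illustration under assumptions rather than the exact expression in Equation (32). We assume the standardized error has SF $\exp\{-\lambda[(1 + e^{z})^{\alpha} - 1]\}$, which follows from taking the logarithm of an odds exponential-Pareto IV variable with exponential rate $\lambda$ and a zero-location Pareto IV baseline. The names `alpha`, `lam`, `beta`, and `sigma`, and the censoring indicator `delta` (1 for an observed failure, 0 for a censored time), are hypothetical labels.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_sf(y, mu, sigma, alpha, lam):
    """log S(y) = -lam * ((1 + e^z)^alpha - 1), z = (y - mu) / sigma (assumed form)."""
    z = (y - mu) / sigma
    return -lam * math.expm1(alpha * softplus(z))

def log_pdf(y, mu, sigma, alpha, lam):
    """log f(y), obtained by differentiating the assumed SF with respect to y."""
    z = (y - mu) / sigma
    sp = softplus(z)
    return (math.log(lam) + math.log(alpha) - math.log(sigma)
            + z + (alpha - 1.0) * sp - lam * math.expm1(alpha * sp))

def censored_loglik(beta, sigma, alpha, lam, y, X, delta):
    """Right-censored log-likelihood for the location model mu_i = x_i' beta.

    y     : log observed times
    X     : list of covariate rows (include a leading 1.0 for an intercept)
    delta : 1 if the i-th time is an observed failure, 0 if it is censored
    """
    total = 0.0
    for y_i, x_i, d_i in zip(y, X, delta):
        mu_i = sum(b * x for b, x in zip(beta, x_i))
        if d_i == 1:
            total += log_pdf(y_i, mu_i, sigma, alpha, lam)  # failure: density
        else:
            total += log_sf(y_i, mu_i, sigma, alpha, lam)   # censored: survival
    return total
```

A general-purpose numerical optimizer (not shown here) could then be applied to the negative of `censored_loglik`, with the positivity constraints on `sigma`, `alpha`, and `lam` handled, for example, by optimizing over their logarithms.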
6.1.2. Jackknife Method
The jackknife technique was developed by Quenouille (1949) to estimate the bias of an estimator. It is an alternative method to estimate the LOEPIV parameters based on “leaving one out”.
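Suppose that $\hat{\theta}$ is the parameter estimate from the whole sample and $\hat{\theta}_{(i)}$ is the parameter estimate when the $i$th observation is dropped from the data. The pseudo-value of the $i$th observation is then obtained as
$$\tilde{\theta}_i = n\hat{\theta} - (n - 1)\hat{\theta}_{(i)}, \quad i = 1, \ldots, n.$$
Then, the jackknife estimate of $\theta$, denoted $\hat{\theta}_{J}$, is the mean of the pseudo-values,
$$\hat{\theta}_{J} = \frac{1}{n}\sum_{i=1}^{n} \tilde{\theta}_i.$$
For more details, see [42,43,44].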
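The leave-one-out procedure can be sketched for a generic scalar estimator as follows. The function name `jackknife` and the toy estimator `plug_in_variance` are illustrative choices, not part of the original study; in the regression setting, `estimator` would be replaced by a routine that refits the model and returns one parameter estimate.

```python
def jackknife(data, estimator):
    """Return the jackknife estimate and the list of pseudo-values.

    data      : list of observations
    estimator : function mapping a list of observations to a scalar estimate
    """
    n = len(data)
    full = estimator(data)                        # estimate from the whole sample
    pseudo = []
    for i in range(n):
        leave_one_out = data[:i] + data[i + 1:]   # drop the i-th observation
        pseudo.append(n * full - (n - 1) * estimator(leave_one_out))
    return sum(pseudo) / n, pseudo

def plug_in_variance(data):
    """Biased variance estimator (divides by n), used here as a toy example."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)
```

For example, `jackknife([1.0, 2.0, 3.0, 4.0], plug_in_variance)` returns a jackknife estimate equal to the usual sample variance with divisor n - 1, which illustrates the bias-reducing effect of averaging the pseudo-values.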
6.2. Sensitivity Analysis: Global Influence
Global influence, introduced by [
45], is used to conduct a diagnostic sensitivity analysis based on case deletion. Case deletion measures the impact of dropping the $i$th observation from the data set on the parameter estimates. That is, this method compares $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\theta}}_{(i)}$, where $\hat{\boldsymbol{\theta}}_{(i)}$ is the vector of estimated parameters when the $i$th observation is dropped from the data. If $\hat{\boldsymbol{\theta}}_{(i)}$ is distant from $\hat{\boldsymbol{\theta}}$, then this case is considered influential. The case deletion model for the LOEPIV regression Model (31) is
$$y_l = \mathbf{x}_l^\top \boldsymbol{\beta} + \sigma z_l, \quad l = 1, \ldots, n, \; l \neq i.$$
We denote the ML estimate of $\boldsymbol{\theta}$ when the $i$th observation is dropped by $\hat{\boldsymbol{\theta}}_{(i)}$. We describe two measures of global influence below.
6.2.1. Generalized Cook Distance
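Generalized Cook distance (GD) is the first measure of global influence and is defined as
$$GD_i(\boldsymbol{\theta}) = (\hat{\boldsymbol{\theta}}_{(i)} - \hat{\boldsymbol{\theta}})^\top \left[-\ddot{\mathbf{L}}(\hat{\boldsymbol{\theta}})\right] (\hat{\boldsymbol{\theta}}_{(i)} - \hat{\boldsymbol{\theta}}),$$
where $-\ddot{\mathbf{L}}(\hat{\boldsymbol{\theta}})$ denotes the observed information matrix.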
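As a small illustration, the generalized Cook distance is a quadratic form in the difference between the case-deleted and full-sample estimates, weighted by the observed information matrix. The sketch below assumes that both estimates and the information matrix have already been computed; the function and argument names are hypothetical.

```python
def generalized_cook_distance(theta_hat, theta_drop, obs_info):
    """Quadratic form d' * obs_info * d with d = theta_drop - theta_hat.

    theta_hat  : full-sample ML estimate (list of floats)
    theta_drop : ML estimate with one observation deleted (list of floats)
    obs_info   : observed information matrix (list of rows), evaluated at theta_hat
    """
    d = [a - b for a, b in zip(theta_drop, theta_hat)]
    k = len(d)
    return sum(d[r] * obs_info[r][c] * d[c] for r in range(k) for c in range(k))
```

Observations with a GD value that is large relative to the rest of the sample would be flagged for closer inspection.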
6.2.2. Likelihood Distance
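Likelihood distance (LD) measures the difference between $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\theta}}_{(i)}$ on the log-likelihood scale, and is given by
$$LD_i(\boldsymbol{\theta}) = 2\left\{\ell(\hat{\boldsymbol{\theta}}) - \ell(\hat{\boldsymbol{\theta}}_{(i)})\right\},$$
where $\ell(\hat{\boldsymbol{\theta}}_{(i)})$ is the log-likelihood function evaluated at $\hat{\boldsymbol{\theta}}_{(i)}$, the estimate obtained when the $i$th observation is dropped from the data.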
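The case-deletion loop behind the likelihood distance can be sketched as follows. To keep the example self-contained we use a toy model (a normal mean with unit variance, with constants dropped from the log-likelihood) in place of the LOEPIV regression model; `fit` and `loglik` are hypothetical placeholders for the model-fitting routine and the full-data log-likelihood.

```python
def case_deletion_ld(data, fit, loglik):
    """Likelihood distance 2 * (l(theta_hat) - l(theta_drop_i)) for every case i.

    fit(data)          : returns the ML estimate for the given data
    loglik(theta, data): log-likelihood of theta for the given data
    """
    theta_hat = fit(data)
    full = loglik(theta_hat, data)
    distances = []
    for i in range(len(data)):
        theta_drop = fit(data[:i] + data[i + 1:])   # refit without case i
        # Both log-likelihoods are evaluated on the full data set.
        distances.append(2.0 * (full - loglik(theta_drop, data)))
    return distances

# Toy model used only for illustration: normal mean with unit variance.
def normal_mean_fit(data):
    return sum(data) / len(data)

def normal_mean_loglik(mu, data):
    return -0.5 * sum((x - mu) ** 2 for x in data)
```

In this toy setting, `case_deletion_ld([1.0, 2.0, 3.0, 10.0], normal_mean_fit, normal_mean_loglik)` assigns by far the largest distance to the last observation, which is the kind of pattern the LD plot is meant to reveal.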
6.3. Residual Analysis
In regression modeling, checking the assumptions and appropriateness of the fitted model is an essential step. Therefore, we use residual analysis to check the assumptions and detect outlying observations. In this study, we consider the following types of residuals.
6.3.1. Martingale Residual
Barlow and Prentice [46] proposed the martingale residual as
$$r_{M_i} = \delta_i + \log S(y_i),$$
where $\delta_i$ denotes the censoring indicator, with $\delta_i = 0$ if the $i$th observation is censored and $\delta_i = 1$ if the $i$th observation is not censored, and $S(y_i)$ denotes the SF for the regression model. Therefore, the martingale residual of the LOEPIV regression model is
where $r_{M_i}$ ranges between $-\infty$ and 1 and is skewed. Thus, a transformation of $r_{M_i}$ will be used to reduce the skewness.
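A minimal sketch of the martingale residual, together with the standard deviance transformation that is commonly applied to reduce its skewness, is given below. It takes the fitted log-survival value $\log S(y_i)$ as input, so it does not depend on a particular parameterization of the LOEPIV model; the function names are hypothetical.

```python
import math

def martingale_residual(delta, log_sf_value):
    """r_M = delta + log S(y); delta is 1 for a failure and 0 for a censored case."""
    return delta + log_sf_value

def deviance_residual(delta, r_m):
    """sign(r_M) * sqrt(-2 * [r_M + delta * log(delta - r_M)])."""
    if delta == 1:
        # delta - r_M = -log S(y) must be positive for the logarithm to exist.
        inner = r_m + math.log1p(-r_m)
    else:
        inner = r_m
    # inner is non-positive in exact arithmetic; guard against rounding error.
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), r_m)
```

For an uncensored case with $\log S(y_i) = -1$, the martingale residual is 0 and so is the deviance residual; more negative values of $\log S(y_i)$ give negative residuals of increasing magnitude.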
6.3.2. Deviance Residual
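The deviance residual is a further improvement of the martingale residual that reduces the skewness and makes the residuals more symmetric around zero. It can be expressed as
$$r_{D_i} = \operatorname{sign}(r_{M_i}) \left\{-2\left[r_{M_i} + \delta_i \log(\delta_i - r_{M_i})\right]\right\}^{1/2},$$
where $r_{M_i}$ is defined in Equation (36), and the deviance for the LOEPIV regression model is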