1 s2.0 S1018364721002767 Main
Original article
Article info

Article history:
Received 10 September 2020
Revised 20 July 2021
Accepted 16 September 2021
Available online 13 October 2021

Keywords:
Bayesian analysis
Non-homogeneous Poisson process
SARS-CoV-2
Markov chain
COVID-19 pandemic
Kuwait

Abstract

The coronavirus disease spread rapidly in China and then across the whole world. Kuwait is one of the countries affected by this pandemic. Objective: The current study aims to provide an appropriate and novel framework for the analysis of the counts of Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) infected patients and the rate of change in these counts with respect to time. Therefore, we considered the number of SARS-CoV-2 patients, i.e., confirmed cases, deaths, and recoveries for Kuwait, ranging from the 24th of February 2020 to the 25th of August 2020. Method: Here, we used Markov Chain Monte Carlo (MCMC) simulation methods for the data analysis of SARS-CoV-2 to develop the Bayesian analysis of the Non-Homogeneous Poisson Process (NHPP). For this purpose, we used two models of the NHPP: the linear intensity function and the power law process. The discrimination methods are also discussed to select a better model for the daily data of confirmed cases, deaths, and recoveries of SARS-CoV-2 patients. The appropriate model is selected based on the Deviance Information Criterion (DIC). Results: The value of DIC indicates that the power-law process performs better than the linear intensity function for estimating and presenting all the study variables. The current study explored the usefulness and significance of the proposed research framework to analyze the SARS-CoV-2 new confirmed cases, recoveries, and deaths in a specific area. Conclusion: The findings of the study will help health organizations or authorities to develop approaches based on the current resources and the situation caused by the pandemic. The provided framework could be beneficial in analyzing the second and third layers of COVID-19 in the area. Analyzing the counts for each study variable, and comparing all three layers for each variable, is the aim of our future study.
© 2021 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jksus.2021.101614
1018-3647/© 2021 The Author(s). Published by Elsevier B.V. on behalf of King Saud University.
A. Al-Dousari, A. Ellahi and I. Hussain Journal of King Saud University – Science 33 (2021) 101614
approaches for appropriately presenting, analyzing, and forecasting the COVID-19 pandemic trends worldwide to reduce drastic effects at the early stages of each layer/wave. Several models have recently been used and proposed for the analysis of time series and point pattern data. Researchers have developed various statistical techniques for analyzing and forecasting the COVID-19 situation in the most affected regions and countries by utilizing time-series count data (see Giuliani et al., 2020; Fanelli, 2020; Yang et al., 2020).

From the previous literature, it is generally noted that the Homogeneous and Non-Homogeneous Poisson processes (HPP and NHPP) play an essential role in point pattern data analysis and in analyzing the rate of change. The HPP assumes that the counts of events or patients in disjoint regions are independent with a homogeneous intensity, which is rarely satisfied in real-life data. On the other hand, for the NHPP, the expected number of event occurrences or patients is assumed to vary with time. Therefore, analysts have mostly avoided the HPP and preferred the NHPP in point pattern data analysis for both implementation time data and calendar time data (Lai and Garg, 2012). The NHPP provides a framework for explaining what is measured as point pattern data (Diggle, 2013). Rodrigues et al. (2015) used the NHPP to analyze exceedance events in air quality standards. Iervolino et al. (2014) used it to explore the intensities of earthquake ground motion. Achcar et al. (2016) and Ellahi et al. (2020) used the NHPP to analyze drought periods.

The main point to consider in the analysis of the number of events or patients is to determine the appropriate mean value function $m(t)$ or intensity function $\gamma(t) = \partial m(t)/\partial t$ for estimating the expected number of events or patients, because the NHPP admits several mean value functions with different assumptions. Fur-

2.1. Data and study area

For our study, daily data on new confirmed cases, recoveries, and deaths over approximately six months (the 24th of February 2020 to the 25th of August 2020) are used. This data set is provided by the Central Agency for Information Technology, Kuwait, and was collected from https://corona.e.gov.kw/en/Home/CasesByDate. The data on SARS-CoV-2 infected patients in Kuwait provided by the Central Agency for Information Technology, Kuwait, are considered highly reliable and accurate.

2.2. Mathematical description of Non-Homogeneous Poisson processes

In this section, we discuss the mathematical structure of the NHPP models. Let $N(t)$ be the accumulated number of new confirmed cases, recoveries, or deaths observed during the time interval $(0, T)$, where $T > 0$. Let $K$ denote the total number of new confirmed cases, recoveries, or deaths in the time interval $(0, T)$. We suppose that $\{N(t)\}$ follows an NHPP with mean value function $m(t; \theta)$ and intensity function $\gamma(t) = \partial m(t)/\partial t$, where $t \geq 0$ and $\theta$ is the vector of model parameters. Here we consider two particular cases of the NHPP for our comparative study. These two forms of $m(t)$ or $\gamma(t)$, i.e., the Power Law Process (PLP) and the Linear Intensity Function (LIF), are frequently used in the analysis of drought events, reliability, and operating safety policies. The mean value function, $m(t)$, and intensity function, $\gamma(t)$, for the PLP are

$$m(t \mid \theta)_{PLP} = \left(\frac{t}{\beta}\right)^{\alpha} \quad \text{and} \quad \gamma(t \mid \theta)_{PLP} = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha - 1}, \quad \text{where } \theta = (\alpha, \beta) \in (0, \infty).$$

Similarly, the functions $m(t)$ and $\gamma(t)$ for the LIF are

$$m(t \mid \theta)_{LIF} = t\lambda_0 + \frac{\alpha t^2}{2} \quad \text{and} \quad \gamma(t \mid \theta)_{LIF} = \lambda_0 + \alpha t, \quad \text{where } \theta = (\lambda_0, \alpha) \in (0, \infty).$$

The likelihood function from the time-truncated model, with $\theta$ standing for $\theta_{LIF}$ or $\theta_{PLP}$, is given by

$$L(\theta; S_t) = \left(\prod_{j=1}^{K} \gamma(t_j \mid \theta)\right) \exp\left(-m(T \mid \theta)\right)$$
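To make the two NHPP models of Section 2.2 concrete, the following Python sketch evaluates the PLP and LIF mean value and intensity functions and the logarithm of the time-truncated likelihood (the sum of the log-intensities at the event times minus the mean value function at T). It is only an illustration: the function names, the event times, and the parameter values are hypothetical and are not taken from the Kuwait data, and the study itself was carried out in R.

```python
import math

def plp_mean(t, alpha, beta):
    """PLP mean value function m(t) = (t / beta) ** alpha."""
    return (t / beta) ** alpha

def plp_intensity(t, alpha, beta):
    """PLP intensity (alpha / beta) * (t / beta) ** (alpha - 1)."""
    return (alpha / beta) * (t / beta) ** (alpha - 1.0)

def lif_mean(t, lambda0, alpha):
    """LIF mean value function m(t) = lambda0 * t + alpha * t**2 / 2."""
    return lambda0 * t + alpha * t ** 2 / 2.0

def lif_intensity(t, lambda0, alpha):
    """LIF intensity lambda0 + alpha * t."""
    return lambda0 + alpha * t

def log_likelihood(event_times, T, intensity, mean, params):
    """Log of the time-truncated likelihood: sum_j log intensity(t_j) - m(T)."""
    log_intensities = sum(math.log(intensity(t, *params)) for t in event_times)
    return log_intensities - mean(T, *params)

# Hypothetical event times on (0, T); not data from the study.
event_times = [2.0, 5.0, 6.5, 8.0, 9.0, 9.5]
T = 10.0

ll_plp = log_likelihood(event_times, T, plp_intensity, plp_mean, (2.0, 4.0))
ll_lif = log_likelihood(event_times, T, lif_intensity, lif_mean, (0.1, 0.1))
print(f"log-likelihood PLP: {ll_plp:.3f}, LIF: {ll_lif:.3f}")
```

Working on the log scale avoids numerical underflow when the product runs over thousands of events, as it would for the confirmed-case counts.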
Here, $\gamma(t_j \mid \theta)$ and $m(T \mid \theta)$ are the intensity function and mean value function of the LIF or PLP. The simulated samples from the posterior $P(\theta \mid S_t)$ and the posterior summaries of interest for $\theta_{LIF}$ and $\theta_{PLP}$ are obtained with a standard MCMC algorithm, i.e., Gibbs sampling. In MCMC, Gibbs sampling is initiated from an arbitrary value, which can be assumed or calculated from the priors, and the chain then gradually converges to its target. Several techniques can be used to check this convergence; trace plots and some useful summary statistics are frequently used. These plots and results provide clear indications of stabilized simulations, so convergence can be evaluated with a reasonable degree of assurance. For this purpose, Su and Yajima (2012) introduced the R software library "R2jags," which provides considerable simplification. Based on the well-known Bayesian adequacy measure called the Deviance Information Criterion (DIC), we choose the suitable NHPP model for each study variable, i.e., the number of new confirmed cases, the number of recoveries, and the number of deaths, over the time interval T = 184 days. For further details of the intensity functions and the Bayesian approach, see Guarnaccia et al. (2015), Achcar et al. (2016), Meng et al. (2017), and Ellahi et al. (2020).

Fig. 1. Bar plots of the daily counts of new confirmed cases, recoveries, and deaths.

Fig. 2. Accumulated numbers of new confirmed cases, recoveries, and deaths of COVID-19 patients.

4. Results

Fig. 1 presents the bar plots of the daily counts of new confirmed cases, recovered cases, and deaths. Fig. 1a indicates that the number of new confirmed cases increases after the 1st of April 2020. It reaches its peak in mid-May, and after the 1st of June, the daily number of new confirmed cases decreased moderately. Fig. 1b provides detailed information about the recoveries of SARS-CoV-2 infected patients. Its bar plot shows that at the start of April the number of recoveries was very low relative to the new confirmed cases, but by the 1st of June 2020 the number of recoveries was much higher than the number of new confirmed cases, i.e., the number of recoveries was approximately 1500, and the number of new cases was around 1050. However, from mid-June, the daily number of new confirmed cases exceeded the number of recovered patients. Fig. 1c shows that the number of deaths increases after mid-April, and that in May and June the number of deaths is higher than in the other months. However, from just before the start of July, the daily number of deaths decreased until the 25th of August.

Fig. 2 shows the accumulated numbers of new confirmed cases, recoveries, and deaths with respect to time (in days) for the period from the 24th of February to the 25th of August 2020. These plots present the cumulative totals of new confirmed cases, recoveries, and deaths for each day versus the days of each study month. To model the accumulated numbers of new confirmed cases, recoveries, and deaths, we considered the two NHPP models, i.e., PLP and LIF, and the parameters for each model and each study variable are estimated using the Bayesian approach under MCMC simulations as briefly explained in the
methodology section. For the prior distributions of the required parameters, we assumed that in the case of $\theta_{LIF} = (\lambda_0, \alpha)$, $\lambda_0 \sim U[0, 100]$ and $\alpha \sim U[0, 100]$, and in the case of $\theta_{PLP} = (\alpha, \beta)$, $\alpha \sim U[0, 100]$ and $\beta \sim U[1, 100]$. There is no hard and fast rule for the values of the parameters of the prior distributions, i.e., for the hyperparameters. Hyperparameter values can be chosen so that the distributions have minimum variance, or they can be based on the researcher's personal experience with the problem concerned. In the Bayesian analysis of each NHPP model, the non-informative uniform priors discussed above were used for each study variable. A single Markov chain was set up to simulate samples from the joint posterior distribution of the LIF and PLP parameters and was run for 35,000 iterations. The first 5000 iterations were treated as burn-in samples to eliminate or minimize the effect of the initial values used in the iteration process. The number of burn-in samples was chosen based on the history plots, at the point where the distributions reach equilibrium (see Fig. 3), and the remaining 30,000 were used for the convergence checks and summarization of the results. The trace plots in Fig. 4 indicate the convergence of the MCMC, which can be verified by the summary statistics for each parameter provided in Table 1. The marginal posterior density plots of all the parameters of each model for the individual study variables are presented in Fig. 5. Similarly, the autocorrelation plots are shown in Fig. 6; they show the pattern of serial correlation in the chain, where consecutive draws of the parameters from the conditional distributions were correlated.

The Monte Carlo estimates of the posterior summaries of interest and the Monte Carlo errors, based on the 30,000 simulated samples and taking every 150th simulated value, are given in Table 1. Similarly, the DIC values of each model and each study variable are presented in Table 2. In the case of
Table 1
Simulation results for the marginal posterior distribution of parameters and their properties.
Fig. 5. Marginal posterior density plots of each parameter for PLP and LIF.
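The sampling settings reported in the Results (a single chain of 35,000 iterations, 5000 burn-in iterations, every 150th draw kept, uniform priors) and the DIC comparison can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: it swaps the JAGS Gibbs sampler used through R2jags for a plain random-walk Metropolis sampler, it runs on synthetic event times simulated from a PLP rather than on the Kuwait data, and all function names, step sizes, starting values, and "true" parameters are hypothetical. The DIC is computed with the classical definition, the posterior mean deviance plus the effective number of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik_plp(theta, times, T):
    """Time-truncated NHPP log-likelihood under the power law process."""
    alpha, beta = theta
    log_intensity = np.log(alpha / beta) + (alpha - 1.0) * np.log(times / beta)
    return log_intensity.sum() - (T / beta) ** alpha

def loglik_lif(theta, times, T):
    """Time-truncated NHPP log-likelihood under the linear intensity function."""
    lambda0, alpha = theta
    return np.log(lambda0 + alpha * times).sum() - (lambda0 * T + alpha * T**2 / 2.0)

def metropolis(loglik, start, lower, upper, step, times, T, n_iter=35_000):
    """Random-walk Metropolis with independent uniform priors on (lower, upper)."""
    lower, upper, step = (np.asarray(v, dtype=float) for v in (lower, upper, step))
    current = np.asarray(start, dtype=float)
    current_ll = loglik(current, times, T)
    chain = np.empty((n_iter, current.size))
    for i in range(n_iter):
        proposal = current + step * rng.normal(size=current.size)
        # Under uniform priors, proposals outside the support are always rejected.
        if np.all(proposal > lower) and np.all(proposal < upper):
            proposal_ll = loglik(proposal, times, T)
            if np.log(rng.uniform()) < proposal_ll - current_ll:
                current, current_ll = proposal, proposal_ll
        chain[i] = current
    return chain

def dic(loglik, samples, times, T):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at posterior mean."""
    deviance = np.array([-2.0 * loglik(s, times, T) for s in samples])
    d_bar = deviance.mean()
    d_at_mean = -2.0 * loglik(samples.mean(axis=0), times, T)
    return d_bar + (d_bar - d_at_mean)

# Synthetic event times from a PLP on (0, T]; the parameter values are assumptions.
T, true_alpha, true_beta = 184.0, 3.0, 35.0
n_events = rng.poisson((T / true_beta) ** true_alpha)
times = np.sort(T * (1.0 - rng.uniform(size=n_events)) ** (1.0 / true_alpha))

chain_plp = metropolis(loglik_plp, (1.0, 10.0), (0.0, 1.0), (100.0, 100.0), (0.1, 1.0), times, T)
chain_lif = metropolis(loglik_lif, (0.5, 0.01), (0.0, 0.0), (100.0, 100.0), (0.05, 0.002), times, T)

# Discard the first 5000 draws as burn-in, then keep every 150th draw.
kept_plp = chain_plp[5_000::150]
kept_lif = chain_lif[5_000::150]

dic_plp = dic(loglik_plp, kept_plp, times, T)
dic_lif = dic(loglik_lif, kept_lif, times, T)
print("posterior means (PLP alpha, beta):", kept_plp.mean(axis=0))
print("DIC PLP:", dic_plp, "DIC LIF:", dic_lif)
```

With this thinning, 200 draws per model remain for the posterior summaries. The model with the smaller DIC would be preferred; which model wins on a synthetic data set depends on how the data were generated, so the comparison here says nothing about the Kuwait results.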
Fig. 6. Autocorrelation plots (autocorrelation against lag) of alpha and beta for PLP and of alpha and lambdano for LIF, with panels for Deaths and Recovered.
Table 2
DIC values of each model for each study variable.
Study variable      Power Law Process    Linear Intensity Function
Confirmed cases     279,248              338,338
Deaths              4024.110             4608.70
Recoveries          227,896              239,798

Table 3
Root mean squared errors of estimated models.
Variable            RMSE
Confirmed cases     5489.937
Deaths              54.4633
Recoveries          5513.525

LIF, the DIC values for the new confirmed cases, deaths, and recoveries are 338,338, 4608.70, and 239,798. Similarly, the DIC values in the case of PLP for the new confirmed cases, deaths, and recoveries are 279,248, 4024.110, and 227,896. The DIC values of the PLP for each study variable are much smaller than the DIC values of the LIF. Therefore, we selected the NHPP model based on PLP and used it for the analysis of the new confirmed cases, deaths, and recoveries regarding COVID-19. We use the marginal posterior means of the PLP parameters from Table 1 to model each study variable and estimate the accumulated counts of these study variables with respect to time. The RMSEs of the calculated results for new confirmed cases, deaths, and recoveries are 5489.937, 54.4633, and 5513.525, which are also provided in Table 3. The estimated and observed numbers of new confirmed cases, deaths, and recoveries are presented in the plots of Fig. 7. The RMSEs in Table 3 and the plots of Fig. 7 clearly show that the PLP performs very well in the estimations of COVID-19 pandemic
Fig. 7. Estimated and observed counts of daily new confirmed cases, recoveries, and deaths.
study variables. The red line in the plots presents the estimated accumulated means at each day. The number of new confirmed cases, deaths, and recoveries of SARS-CoV-2 infected patients was reported using the estimated parameter values. Here the term accumulated denotes the mean value function evaluated at each day's counts of our study variables.

5. Discussion

The WHO indicates that the COVID-19 viral infection keeps on developing and presents a significant issue for public health and the world economy. On the 31st of January 2020, WHO announced a worldwide emergency, and regardless of strict control, it has now turned into a worldwide pandemic. Kuwait is also one of the countries most affected by this pandemic. The numbers of confirmed cases and deaths are still increasing worldwide, including in Kuwait. Because the level of infection varies from state to state, the implementation of control measures and guidelines also varies according to national circumstances. Consequently, statistical tools are of great importance for predicting the pandemic patterns of this infection around the world.

This paper uses multiple statistical tools for the descriptive and inferential analysis of COVID-19 patients in Kuwait. At the initial stage, bar plots and time-series plots were used for the descriptive analysis of the numbers of confirmed cases, deaths, and recovered SARS-CoV-2 infected patients. Fig. 1 shows the bar plots based on the daily counts of new cases, deaths, and recoveries from the 24th of February 2020 to the 25th of August 2020. These bar plots present the fluctuations in the counts and provide a detailed daily summary of the study variables over the six-month period. Fig. 2 shows the time series plots for the accumulated counts of new confirmed cases, deaths, and recoveries. These plots present the rate of change of the counts with respect to time. Fig. 2 (a, b, and c) indicates that the rate of change was moderately increasing from May 2020 and became very high in June 2020 for each study variable. However, the rate of change became moderate and remained the same from the end of June until the 25th of August.

A common problem with count data in statistical inference is the selection of an appropriate model. In our case, we observed that appropriate NHPP models could be of great use. Other intensity functions can also be used in the same way as the LIF and PLP considered in our study. We obtain the posterior distribution summaries of the quantities of interest by using MCMC techniques under Gibbs sampling. The R library R2jags was very helpful in simulating samples from the posterior distributions of the intensity parameters. These samples are then used to obtain the empirical summaries of the statistics for the selection of appropriate NHPP models and to draw inferences on the parameters of interest (i.e., about the actual values of the model parameters). The convergence of the Gibbs sampler was monitored by MC errors (provided in Table 1) and by the history plots, trace plots, posterior density plots, and autocorrelation plots shown in Figs. 3, 4, 5, and 6. The best models for our study variables were selected based on an existing Bayesian adequacy criterion, the DIC (the approximation of Bayes factor); the lower the value of DIC, the better the model. The PLP models for the counts of new confirmed cases, deaths, and recoveries were better than the LIF, as indicated by the values of DIC in Table 2. Therefore, for our further analysis, i.e., for estimating the counts of new cases, deaths, and recoveries and their behavior over the observed time interval, we use the PLP models.

Considering the PLP models (which are the best-fitted models), we observed from the results of Table 1 that the MC error for both parameters (alpha and beta) is acceptably low. The marginal posterior standard deviation is also small, which indicates that the parameter values in the generated samples were concentrated around the marginal posterior means of $\alpha$ and $\beta$ in each case. The intensity function of the PLP has a flexible behavior governed by the value of $\alpha$: $\gamma(t \mid \theta)_{PLP}$ is decreasing for $\alpha$ less than 1, increasing for $\alpha$ greater than 1, and constant for $\alpha = 1$ (i.e., the NHPP reduces to an HPP). In our study, the value of $\alpha$ is 2.514 for the new confirmed cases, 2.412 for the counts of deaths, and 3.223 for the counts of recovered patients, all of which are greater than 1. So, in our study, the intensity function is increasing. By using the values of $\alpha$ and $\beta$ provided in Table 1, we could find the estimated average counts of new confirmed cases, deaths, and recoveries for each specified value of time.

Fig. 7 (a, b, and c) presents the estimated accumulated counts (using the results of Table 1 for PLP) for each day together with the observed accumulated counts versus the days of all studied months. These plots and their respective RMSEs provided in Table 3 clearly show the better performance of the PLP model in each case.

6. Conclusion

This paper presents a framework to analyze the accumulated counts of new confirmed cases, deaths, and recoveries of SARS-CoV-2 infected patients in Kuwait from the 24th of February 2020 to the 25th of August 2020, i.e., the first layer of the COVID-19 pandemic. The descriptive analysis of the counts summarizes the data efficiently and effectively. The NHPP models with the linear intensity function and the power-law process were used for a comparative study of the behavior of the accumulated counts of the COVID-19 pandemic. The parameters of the NHPP models were computed by using a Bayesian approach with Gibbs sampling under the MCMC algorithm, which performs very well in the simulation and estimation of model parameter values. The results and graphs indicate that the NHPP model under PLP performs much better than under LIF. The presented data clearly showed that during the first layer of the COVID-19 pandemic in Kuwait, the intensity varied with time and reached a high level in the middle of our study period. These fluctuations in the intensities were efficiently estimated by the appropriate estimated intensity function in the NHPP. The current study explored the usefulness and significance of the presented research framework to analyze the SARS-CoV-2 new confirmed cases, recoveries, and deaths in an area. Similarly, the proposed framework may be utilized for other layers of the COVID-19 pandemic in Kuwait and other countries or regions. The outcomes of the study will support health organizations or authorities in developing approaches or strategies to overcome the effects of the COVID-19 pandemic, depending on the current resources and circumstances. It is essential to point out that different intensity or mean value functions of NHPP models can also be utilized in the comparative study to improve the results. The results obtained from the proposed framework may be improved by a comparative analysis of other suitable intensity functions or by proposing an efficient and sufficient intensity function for the NHPP model in such particular case studies. Furthermore, the proposed framework doesn't consider abrupt changes in the process, which is its drawback. These abrupt changes could be considered, and the errors in the study may be reduced, by assuming a specified parametric form consisting of some additional parameters for these particular changes in the process. Addressing these limitations and conducting comparative studies of all COVID-19 pandemic layers in Kuwait is the aim of our future studies.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Jacobsen, K.H., 2020. Will COVID-19 generate global preparedness? Lancet 395 (10229), 1013–1014.
Giuliani, D., Dickson, M.M., Espa, G., Santi, F., 2020. Modelling and predicting the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy. https://doi.org/10.2139/ssrn.3559569.
Fanelli, D.P.F., 2020. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Solitons Fractals 134, 109761. https://doi.org/10.1016/j.chaos.2020.109761.
Yang, Z., Zeng, Z., Wang, K.e., Wong, S.-S., Liang, W., Zanin, M., Liu, P., Cao, X., Gao, Z., Mai, Z., Liang, J., Liu, X., Li, S., Li, Y., Ye, F., Guan, W., Yang, Y., Li, F., Luo, S., Xie, Y., Liu, B., Wang, Z., Zhang, S., Wang, Y., Zhong, N., He, J., 2020. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. 12 (3), 165–174. https://doi.org/10.21037/jtd.2020.02.64.
Lai, R., Garg, M., 2012. A detailed study of NHPP software reliability models. J. Softw. 7 (6), 1296–1306.
Rodrigues, E.R., Gamerman, D., Tarumoto, M.H., Tzintzun, G., 2015. A non-homogeneous Poisson model with spatial anisotropy applied to ozone data from Mexico City. Environ. Ecol. Stat. 22 (2), 393–422.
Iervolino, I., Giorgio, M., Polidoro, B., 2014. Sequence-based probabilistic seismic hazard analysis. Bull. Seismol. Soc. Am. 104 (2), 1006–1012.
Achcar, J.A., Coelho-Barros, E.A., de Souza, R.M., 2016. Use of non-homogeneous Poisson process (NHPP) in presence of change-points to analyze drought periods: a case study in Brazil. Environ. Ecol. Stat. 23 (3), 405–419.
Ellahi, A., Almanjahie, I.M., Hussain, T., Hashmi, M.Z., Faisal, S., Hussain, I., 2020. Analysis of agricultural and hydrological drought periods by using non-homogeneous Poisson models: Linear intensity function. J. Atmosph. Solar-Terrest. Phys. 198, 105190. https://doi.org/10.1016/j.jastp.2020.105190.
Guarnaccia, C., Quartieri, J., Tepedino, C., Rodrigues, E.R., 2015. An analysis of airport noise data using a non-homogeneous Poisson model with a change-point. Applied Acoustics 91, 33–39.
Meng, Q., Qian, Y., Li, L., Wang, L., 2017. Data analysis on incomplete failure of ship electromechanical system based on bayesian method. In: 2017 Prognostics and System Health Management Conference (PHM-Harbin). IEEE, pp. 1–7.
Vicini, L., Hotta, L.K., Achcar, J.A., 2012. Non-homogeneous Poisson processes applied to count data: a bayesian approach considering different prior distributions. J. Environ. Protect. 03 (10), 1336–1345.
WHO, 2020. Accessed on August 25, 2020. https://www.who.int/emergencies/diseases/novel-coronavirus-2019?gclid=CjwKCAjwrvv3BRAJEiwAhwOdMzEtuvWQBQPQNS8H8aLpDrfCdkilUEKk__M4leN9ezfIgMgv04cRqhoCA9wQAvD_BwE.