Building A Platform For Data-Driven Pandemic Prediction: From Data Modelling To Visualisation - The Covidlp Project 1St Edition Dani Gamerman
Building a Platform for
Data-Driven Pandemic
Prediction
From Data Modelling to
Visualisation - The CovidLP
Project
Edited by
Dani Gamerman
Marcos O. Prates
Thaís Paiva
Vinícius D. Mayrink
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
© 2022 selection and editorial matter, Dani Gamerman, Marcos O. Prates, Thaís Paiva, Vinícius D.
Mayrink; individual chapters, the contributors
Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermis-
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.
DOI: 10.1201/9781003148883
Typeset in [font]
by KnowledgeWorks Global Ltd.
To Science
Contents
Preface
Contributors
I Introduction
1 Overview of the book
Dani Gamerman, Thaís Paiva, Guido A. Moreira, and Juliana Freitas
1.1 Objective of the book
1.1.1 Data-driven vs model-driven
1.1.2 Real-time prediction
1.1.3 Building platforms
1.2 Outline of the book
1.2.1 How to read this book
1.2.2 Notation
2 Pandemic data
Dani Gamerman, Vinícius D. Mayrink, and Leonardo S. Bastos
2.1 Basic definitions
2.2 Occurrence and notification times
2.3 Other relevant pandemic data
2.4 Data reconstruction
II Modelling
3 Basic epidemiological features
Dani Gamerman, Juliana Freitas, and Leonardo Nascimento
3.1 Introduction and main ideas
3.2 Model extensions
3.3 Properties of epidemiological models
3.4 Are these models appropriate?
4 Data distributions
Guido A. Moreira, Juliana Freitas, and Dani Gamerman
4.1 Introduction
4.2 The Poisson distribution
4.3 Overdispersion
IV Implementation
9 Data extraction/ETL
Marcos O. Prates, Ricardo C. Pedroso, and Thaís Paiva
9.1 Data sources
9.2 Data preparation
9.3 Additional reading
V Monitoring
12 Daily evaluation of the updated data
Vinícius D. Mayrink, Juliana Freitas, Ana Julia A. Câmara, Gabriel O.
Assunção, and Jonathan S. Matias
12.1 The importance of monitoring the data
12.2 Atypical observations
VI Software
15 PandemicLP package: Basic functionalities
Marcos O. Prates, Guido A. Moreira, Marta Cristina C. Bianchi, Débora
F. Magalhães, and Thais P. Menezes
15.1 Introduction
15.2 Installation
15.2.1 Installing from the GitHub repository
15.3 Functionalities
15.3.1 COVID-19 data extraction and loading: load_covid
15.3.2 Visualising the data: plot.pandemicData
15.3.3 Model fitting: pandemic_model
15.3.4 Predictive distribution: posterior_predict.pandemicEstimated
15.3.5 Calculating relevant statistics: pandemic_stats
15.3.6 Plotting the results: plot.pandemicPredicted
15.4 Modelling with the PandemicLP
15.4.1 Generalised logistic model
15.4.2 Generalised logistic model with seasonal effect
15.4.3 Two-wave model
15.5 Sum of regions
15.6 Working with user data
Index
Preface
The project that led to this book started in March 2020. One of us was just
starting his graduate course classes on Dynamic models when the COVID-19
pandemic started in Brazil, forcing the suspension of in-person classes. During
the first months of the pandemic, universities in Brazil were unsure of how to
proceed. Our university recommended that faculty should not continue classes
even online and should instead set only challenges and basic exercises for
the students.
About that time, an intense virtual debate started among groups of faculty.
This pattern was observed in our Institute of Exact Sciences, which consists of
the departments of Statistics, Mathematics, Computing, Physics and Chem-
istry. New messages containing solutions to various problems relating to the
pandemic appeared every single day from members of all the departments.
One of the messages contained a data-driven proposal based on the lo-
gistic curve, with parameter estimation and prediction of the counts of new
cases until the pandemic ends. The message was written by a physicist and
contained an abridged version of a paper. This manuscript was the spark we
needed to decide how to keep the students engaged. After all, statisticians
should be able to handle at least the task presented by our Physics colleague.
The project was presented to the graduate students, who were very keen
on embracing the exercise and started working straight away on the project.
Their results started to appear and problems started to emerge. They were
discussed at regular meetings held at class time. After all, this was a challenge
for the students, and the university rules allowed it as a replacement for the
missing classes!
Our results were regularly passed on informally to our departmental col-
leagues. They always pointed to the need to inform the general public about
what the project was providing. This issue led to the need to build an appro-
priate platform for the release of the project information. One of us was drawn
to the project at this point. The preparation of the platform for releasing the
results led to the CovidLP app.
Around the same time, the Ministry of Education opened an urgent
call for proposals on different research aspects of the pandemic. Our
project was submitted and subsequently approved: 2 post-doc grants and 2
workstations were dedicated to the project, giving it the methodological
and computational breadth, respectively, that it so badly needed. Two
other faculty and a former graduate student were also included. Our project
gained the scalability it required. The CovidLP project was created. The name
was chosen to emphasise the interest of the project in long-term prediction.
A number of issues of all sorts appeared and were dealt with in the best
possible manner. These issues ranged from installing the workstations man-
ually and remotely in a deserted campus to addressing the methodological
difficulties and testing the proposed solutions. Many hours were spent study-
ing the literature and testing different approaches.
The CovidLP project gained national visibility after a series of work-
shop and seminar presentations, media interviews and news releases, and our
methodology was adopted by a few health officials at different administrative
levels in Brazil. By then, the project also included a website and a blog to inform
the general public about the changes that were being introduced and discuss
them. It became obvious that another stream worth pursuing was software
for reproducing the analyses in a more general setting, suited for more expe-
rienced data analysts. It also became clear that the project was not restricted
to the COVID-19 pandemic. Thus the software was named PandemicLP, to
signal this change in scope.
After a few months, the project was mature. The participants were organ-
ised in focal groups and the production of information became more effective.
The project was getting ready for scientific publication. It became clear to us
that what we had developed thus far was worth reporting.
But it was clear to us that ours was not a story about Statistics. There are
better books already available to describe statistical aspects of epidemics. It
was also clear that ours was not a story about Computing. Again, there are better
books about the capabilities now widely available in software. The story of
the CovidLP project is a tale about the inseparable roles played by Statistics
and Computing for building such platforms for daily release of the results of
statistical analyses. This is the story that we want to tell.
By that time, CRC released a call to the scientific community for proposals
in general, including books, about the pandemic. It seemed to us the perfect
match, and after revising the proposal to include very thoughtful comments
from reviewers, whom we thank, the proposal was accepted. A major con-
tribution from this revision process was the recommendation for inclusion of
more epidemiological background on data and models. This was accepted and
this addition was provided by colleagues outside (but aware of) our project,
providing an important complement to our work. Our final proposal contains
the key elements of our task: data description, statistical modelling and mon-
itoring, computational implementation and software.
We worked very hard during the 9 months that elapsed between then and
now and are very pleased with our end result. Of course, this is way too short.
The main intention of the book is to provide data analysts with the tools
required for building platforms for statistical analyses and predictions in
epidemiological contexts. A secondary goal is to let users adapt the structure
of the book as a guide towards an online platform solution for their
own data analysis problem, which may not even be related to Epidemiology.
We would like to finish by thanking the people who accompanied the de-
velopment of the CovidLP project. These include the users of our app, the
attendees at the talks we delivered, our academic colleagues who provided
useful inputs to the project and the friends and families who supported us
in achieving this task. A very special thank you goes to the CovidLP
team, a group of dedicated students and post-docs that embarked on the journey
that led to this book, reading countless papers and books, implementing
a number of computational codes and participating actively in all the steps
required for the completion of this book. Thanks are also due to Ricardo
Pedroso, for the book cover figure. A warm acknowledgement goes to Rob
Calver for his continued support and relentless effort to make this book possi-
ble in the best possible shape. We also thank the CRC team, especially Rob,
Michele and Vaishali, for the administrative support. They also arranged for
one text editor and 2 experts to review the entire book. Their reviews, for which
we are grateful, provided more context and breadth to the book content.
In Chapter 1, the book provides different paths to follow in order to help
its readers achieve their own, different goals. If the book manages to help
readers attain their platform-building goals, then it will have
achieved its own.
Introduction
1 Overview of the book
Dani Gamerman
Universidade Federal de Minas Gerais/Universidade Federal do Rio de
Janeiro, Brazil
Thaís Paiva
Universidade Federal de Minas Gerais, Brazil
Guido A. Moreira
Universidade Federal de Minas Gerais, Brazil
Juliana Freitas
Universidade Federal de Minas Gerais, Brazil
CONTENTS
1.1 Objective of the book
1.1.1 Data-driven vs model-driven
1.1.2 Real-time prediction
1.1.3 Building platforms
1.2 Outline of the book
1.2.1 How to read this book
1.2.2 Notation
Bibliography
DOI: 10.1201/9781003148883-1
explained, and their integration into a unified framework justified. This will
hopefully set the tone for the reader to understand what we did, why we did
it, and the order in which these elements are introduced in the sequel. After this description, a
summarised view of the following parts and chapters that constitute the book,
and a guide with different suggested routes on how to read it are provided.
The notation is also introduced and explained here.
the magnitude of things to come (Holmdahl and Buckee, 2020). Rather than
showing the way, the long-term predictions throw light on the way. They
might provide useful indicators when used with caution. As such, they may
constitute an important component towards a more encompassing view of the
progression of a pandemic.
Pandemics occur worldwide, i.e., over hundreds of countries. Similarly,
many epidemics occur over dozens of countries. A suitable prediction plat-
form should aim at all countries involved in the epidemic or, at the very least,
at a fair number of these countries. This brings in the unavoidable need for
considering a large number of units for prediction. Also, this kind of problem
deals with life-and-death situations. Instant updates of the prediction results
after a new data release are imperative. In the case of COVID-19 and in many
other epidemics, official data is updated daily. In the typical pandemic sce-
nario, many analyses are required at high frequency, causing a considerable
computational burden. The more elaborate and country specific the model is,
the longer it will take to fit it to the data and to generate prediction results.
The famous George Box motto of “all models are wrong but some are useful”
(Box et al., 2005) could not find a more appropriate application than this. So,
models should be carefully and parsimoniously chosen in order to include the
most important features of the pandemic, but only them. This will probably
not lead to the best prediction for every country, but hopefully will provide
useful ones for most of them. In the sequel, a specific presentation of each of
the main features of platforms for forecasting pandemics is given.
itself may prevent data from being carefully collected and computed (Funk
et al., 2019). Data may also be reviewed (Hsieh et al., 2010). In the COVID-19
pandemic, for instance, repositories like that of the Center for Systems Science and
Engineering at Johns Hopkins University (Dong et al., 2020) display basic,
and important, information on counts of cases worldwide, but most of the
data details mentioned above are not available.
Another important point to take into account is the evaluation of results.
As extensively discussed in Funk et al. (2019), the use of comparison metrics
on prediction outcomes, an appropriate quantification of uncertainty, and
the comparison of modelling alternatives assume central roles in reaching the
final goal of providing good information.
The conclusion of this discussion is that modelling this type of data with the
aim of providing (at least) real-time predictions requires routines that are: a)
complex enough to provide coherent results, but b) parsimonious enough to allow
for timely production of outcomes. This way, updated projections can be used
routinely for planning and evaluating strategies.
different backgrounds. We describe next some possible ways to read this book
depending on the type of information the reader is seeking.
First, the trivial path is to read the seven book parts in sequence, which
may serve users interested in learning about all the stages of our
platform-building process, focused on COVID-19 pandemic forecasting. This path
is also instructive to whoever wants to replicate the entire process of our
project, with similar modelling frameworks.
Another possible reading path is geared to users interested mainly in the
methodological aspects of modelling epidemic data. For those users, we recom-
mend reading Part I for introduction and data description, and Parts II and
III for basic and further modelling aspects, respectively. Included at the end of
Part II is Chapter 6, which presents a review of Bayesian inference. This chap-
ter is left as an option for the reader who might feel the need to learn/revisit
the main concepts used within the book. The user following this reading
path might also be somewhat interested in Part VI, which illustrates the
functions implemented in the R package to fit the proposed models.
For users searching for instructions about how to create and maintain
an online platform for up-to-date presentation of some statistical results, we
suggest following Part I with Parts IV and V. These parts
include a step-by-step guide on how to create an online application
with automatic data extraction and prediction, in addition to
a discussion of some important features to monitor in inference results.
The book concludes with Part VII, where most parts are revisited with
an introductory, concise view to summarise possible directions for the future.
This part might be of interest to all readers who want to know not just what
was done, but also what could come next.
1.2.2 Notation
Most pandemic data consist of counts, usually counts of infected cases or
deaths caused by the disease. The counts can be recorded separately for each
time unit or recorded by accumulation over the previous time units. The latter
is the result of integration (or sum) of the former. The usual mathematical
standard is to use capital letters for the integrated feature and lower case
for the integrand. Thus, cumulative counts will be denoted by $Y$, while the
counts over a given time unit will be denoted by $y$.
Counts are collected over periods of time, that could be days, weeks,
months, etc. Whatever the time unit, the counts at time $t$ are denoted by
$y_t$, while the cumulative counts up to time $t$ are given by $y_1 + \cdots + y_t$ and
denoted by $Y_t$, i.e., $Y_t = \sum_{j=1}^{t} y_j$, for all $t$.
Typically, these counts are random variables with finite expectations or
means. In line with the distinction between cumulative and time-specific
counts defined above, the means are denoted by $M(t) = E(Y_t)$ for the
cumulative means, and $\mu(t) = E(y_t)$ for the time-specific means, i.e.,
$M(t) = \sum_{j=1}^{t} \mu(j)$, for all $t$. The dependence on time is denoted for the means in the
most usual functional form because their dependence on time will be made
explicit, unlike the counts that will depend on time implicitly. These points
will be made clear in Part II.
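The relation between time-specific and cumulative counts can be illustrated with a minimal Python sketch. The values in `daily_counts` are hypothetical and serve only to show that summing gives the cumulative counts and that differencing recovers the time-specific ones; none of the names below come from the book.

```python
from itertools import accumulate

# Hypothetical time-specific counts y_1, ..., y_5 (illustrative values only)
daily_counts = [3, 5, 8, 13, 7]

# Cumulative counts: Y_t = y_1 + ... + y_t
cumulative_counts = list(accumulate(daily_counts))

# Differencing the cumulative series recovers the time-specific counts:
# y_1 = Y_1 and y_t = Y_t - Y_{t-1} for t > 1
recovered_counts = [cumulative_counts[0]] + [
    later - earlier
    for earlier, later in zip(cumulative_counts, cumulative_counts[1:])
]

print(cumulative_counts)
print(recovered_counts == daily_counts)
```

The same relation holds for the means: accumulating values of the time-specific mean over time gives the cumulative mean.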
Bibliography
Box, G., Hunter, J. and Hunter, W. (2005) Statistics for Experimenters: De-
sign, Innovation, and Discovery. Wiley Series in Probability and Statistics.
Wiley. URL https://books.google.com.br/books?id=oYUpAQAAMAAJ.
Chang, W., Cheng, J., Allaire, J., Xie, Y. and McPherson, J. (2020) shiny:
Web Application Framework for R. URL https://CRAN.R-project.org/package=shiny.
R package version 1.5.0.
Chowell, G., Luo, R., Sun, K., Roosa, K., Tariq, A. and Viboud, C. (2020)
Real-time forecasting of epidemic trajectories using computational dynamic
ensembles. Epidemics, 30, 100379.
Dong, E., Du, H. and Gardner, L. (2020) An interactive web-based dashboard
to track COVID-19 in real time. The Lancet Infectious Diseases, 20, 533–
534.
Fineberg, H. V. and Wilson, M. E. (2009) Epidemic science in real time.
Science, 324, 987–987.
Funk, S., Camacho, A., Kucharski, A. J., Lowe, R., Eggo, R. M. and Edmunds,
W. J. (2019) Assessing the performance of real-time epidemic forecasts: A
case study of Ebola in the Western Area region of Sierra Leone, 2014-15.
PLOS Computational Biology, 15.
Funk, S. and King, A. A. (2020) Choices and trade-offs in inference with
infectious disease models. Epidemics, 30, 100383.
Grimm, V., Mengel, F. and Schmidt, M. (2021) Extensions of the SEIR model
for the analysis of tailored social distancing and tracing approaches to cope
with COVID-19. Scientific Reports, 11.
Held, L., Hens, N., O’Neill, P. and Wallinga, J. (eds.) (2019) Handbook of
Infectious Disease Data Analysis. Boca Raton: Chapman and Hall/CRC,
1st edn.
Hethcote, H. W. (2000) The mathematics of infectious diseases. SIAM Review,
42, 599–653.
Holmdahl, I. and Buckee, C. (2020) Wrong but useful – What Covid-19
epidemiologic models can and cannot tell us. New England Journal of
Medicine, 383, 303–305. URL https://doi.org/10.1056/NEJMp2016822.
Hsieh, Y.-H. and Cheng, Y.-S. (2006) Real-time forecast of multiphase out-
break. Emerging Infectious Diseases, 12, 122–127.
Hsieh, Y.-H., Fisman, D. N. and Wu, J. (2010) On epidemic modeling in real
time: An application to the 2009 novel A (H1N1) influenza outbreak in
Canada. BMC Res Notes, 3.
Ioannidis, J. P., Cripps, S. and Tanner, M. A. (2020) Forecasting for COVID-19
has failed. International Journal of Forecasting. URL
http://www.sciencedirect.com/science/article/pii/S0169207020301199.
Kermack, W. O. and McKendrick, A. G. (1927) A contribution to the math-
ematical theory of epidemics. Proceedings of the Royal Society A, 115,
700–721.
King, A. A., Nguyen, D. and Ionides, E. L. (2016) Statistical inference for
partially observed Markov processes via the R package pomp. Journal of
Statistical Software, Articles, 69, 1–43.
Martianova, A., Kuznetsova, V. and Azhmukhamedov, I. (2020) Mathemati-
cal model of the COVID-19 epidemic. In Proceedings of the Research Tech-
nologies of Pandemic Coronavirus Impact (RTCOV 2020), 63–67. Atlantis
Press.
Murray, L. M. (2015) Bayesian state-space modelling on high-performance
hardware using LibBi. Journal of Statistical Software, Articles, 67, 1–36.
Parshani, R., Carmi, S. and Havlin, S. (2010) Epidemic threshold for the
susceptible-infectious-susceptible model on random networks. Phys. Rev.
Lett., 104, 258701.
Pell, B., Kuang, Y., Viboud, C. and Chowell, G. (2018) Using phenomenolog-
ical models for forecasting the 2015 Ebola challenge. Epidemics, 22, 62–70.
The RAPIDD Ebola Forecasting Challenge.
R Core Team (2020) R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. URL
https://www.R-project.org/.
Reich, N. G., Brooks, L. C., Fox, S. J., Kandula, S., McGowan, C. J., Moore,
E., Osthus, D., Ray, E. L., Tushar, A., Yamana, T. K., Biggerstaff, M.,
Johansson, M. A., Rosenfeld, R. and Shaman, J. (2019) A collaborative
multiyear, multimodel assessment of seasonal influenza forecasting in the
United States. Proceedings of the National Academy of Sciences, 116, 3146–
3154.
Roosa, K., Lee, Y., Luo, R., Kirpich, A., Rothenberg, R., Hyman, J., Yan,
P. and Chowell, G. (2020) Real-time forecasts of the COVID-19 epidemic
in China from February 5th to February 24th, 2020. Infectious Disease
Modelling, 5, 256–263.
Scott, J. A., Gandy, A., Mishra, S., Unwin, J., Flaxman, S. and Bhatt, S.
(2020) epidemia: Modeling of epidemics using hierarchical Bayesian models.
URL https://imperialcollegelondon.github.io/epidemia/. R package
version 0.7.0.
Siettos, C. I. and Russo, L. (2013) Mathematical modeling of infectious disease
dynamics. Virulence, 4, 295–306.
Dani Gamerman
Universidade Federal de Minas Gerais/Universidade Federal do Rio de
Janeiro, Brazil
Vinícius D. Mayrink
Universidade Federal de Minas Gerais, Brazil
Leonardo S. Bastos
Fundação Oswaldo Cruz, Brazil
CONTENTS
2.1 Basic definitions
2.2 Occurrence and notification times
2.3 Other relevant pandemic data
2.4 Data reconstruction
Bibliography
Data is the primary input for any statistical analysis. In this chapter we
present the main aspects of pandemic data, starting from their definitions.
Their virtues and deficiencies are described and compared. We focus on data
used for and in the predictions. Auxiliary variables that are related to the
primary data variables are also described and their relations are presented.
DOI: 10.1201/9781003148883-2
involve the daily release of data collected up to that day. These are usually
referred to as confirmed cases of a disease.
Other time windows are also obtained for some pandemics. By far the sec-
ond most common time frame for data release is weekly data, especially for
diseases with very low counts. This also allows for removal of possible weekday
effects that exist in some health notification systems. Another strategy com-
monly applied in diseases with weekday variations is the use of 7-day moving
averages to smooth data counts.
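As a rough illustration of this smoothing, the sketch below computes a 7-day moving average in plain Python. The trailing window (each value averages the current day and the six days before it) is an assumption made here; a centred window is another common choice. The function name `moving_average` and the counts in `daily_counts` are hypothetical.

```python
def moving_average(counts, window=7):
    """Trailing moving average: one value per day from day `window` onwards."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    averages = []
    for end in range(window, len(counts) + 1):
        # Average of the `window` most recent days ending at position `end`
        averages.append(sum(counts[end - window:end]) / window)
    return averages


# Hypothetical two weeks of daily counts with low weekend reporting
daily_counts = [10, 12, 11, 13, 12, 4, 3, 11, 13, 12, 14, 13, 5, 4]
smoothed_counts = moving_average(daily_counts)

print(len(smoothed_counts))
print([round(value, 2) for value in smoothed_counts])
```

Because every 7-day window contains exactly one of each weekday, the weekend dips in the raw counts no longer appear as dips in the smoothed series.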
The distinction between different categories of cases will be returned to
in the next section. Figure 2.1 illustrates the data mentioned above for the
COVID-19 pandemic for Switzerland.
[Figure: two panels for Switzerland, "New cases per day" and "New deaths per day", each plotted against date from February to June 2020.]
FIGURE 2.1: Daily data of confirmed cases and confirmed deaths of COVID-19
for Switzerland.
Figure 2.1 also includes the number of confirmed counts associated with
deaths caused by the disease. One striking feature of the figure is the qualita-
tive similarity of the curves of confirmed cases and confirmed deaths observed
for the same region. This similarity occurs despite the substantial difference
in nature of the two types of counts. This similarity is observed in many other
countries across the globe and will be explored by the data-driven approach
throughout the book.
In the context of an infectious disease epidemic, some individuals may be
infected without manifesting any symptom. These individuals, however, can
still transmit the agent that causes the disease. For example, a person infected
with the human immunodeficiency virus (HIV), called HIV-positive, may be
asymptomatic for several years. Without adequate treatment, a person with
HIV can develop the disease AIDS (acquired immune deficiency syndrome).
In any case, diseases manifest themselves through their symptoms. And
we just noted above that some infected individuals may be asymptomatic
in the sense they may not manifest any symptom and may carry on their
normal life totally unnoticed. Nevertheless, even though the infection is
harmless to these individuals, they may infect other people. Some of the newly
infected ones may well present serious symptoms and the disease may even evolve
to their deaths. Thus, asymptomatic cases are just as important as the symptomatic
cases but they are typically harder to identify. If the symptoms do not show
up in an individual, it becomes very hard to identify him/her as a case. The
only way it could happen is through a widespread testing campaign in the
region of the individual. These campaigns are costly and present many logistical
difficulties.
The COVID-19 disease is a timely and important illustrative example.
Identification of cases is typically achieved through molecular tests such as
RT-PCR (reverse transcription polymerase chain reaction). These tests are
not widely available in many countries and even when they are, they may
take days to have their results released. These difficulties cause large parts of
the infected population to go unnoticed. The proportion of asymptomatic
cases still varies widely among studies. Kronbichler et al. (2020) was
one of the earlier studies and indicated this proportion to be around 62%. So, a
substantial proportion of under-reporting is to be expected unless
population-wide testing is performed. Tests are also imperfect: even molecular
tests are subject to error, leading to false-positive and false-negative results.
Imperfect tests will be revisited in Section 7.3.2.
Data for diseases must be handled with care for the above reasons. When
there is a substantial proportion of asymptomatic cases, many infections are
never detected and the reported counts understate the true number of cases.
Therefore, it is inappropriate to refer to the obtained counts as total
counts. The nomenclature used for data released on pandemics makes this
explicit: instead of referring to these measurements simply as counts, they
are named confirmed counts. This standard is widely used and will be
adopted throughout this book.
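To make the gap between confirmed and total counts concrete, the sketch below scales a confirmed count by an assumed asymptomatic proportion. It rests on a strong simplifying assumption that is not made in the text: every symptomatic case is confirmed and no asymptomatic case is. The function name and the confirmed count of 380 are hypothetical; the 62% figure is the one quoted above from Kronbichler et al. (2020).

```python
def implied_total_cases(confirmed: int, asymptomatic_proportion: float) -> float:
    """Total infections implied by a confirmed count.

    Assumption (illustrative only): all symptomatic cases are confirmed
    and no asymptomatic case is, so confirmed = (1 - p) * total.
    """
    if not 0.0 <= asymptomatic_proportion < 1.0:
        raise ValueError("asymptomatic_proportion must be in [0, 1)")
    return confirmed / (1.0 - asymptomatic_proportion)


# Hypothetical confirmed count; 0.62 is the proportion quoted in the text.
total = implied_total_cases(confirmed=380, asymptomatic_proportion=0.62)
print(round(total))  # 1000
```

Under this assumption, fewer than four in ten infections would appear among the confirmed counts. In practice testing also misses symptomatic cases and finds some asymptomatic ones, so the figure is only a rough order of magnitude.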
Example 2.1: Let 100 cases be notified on day X. Assume that the occurrence
dates are distributed as follows:
Occurrence day      X  X-1  X-2  X-3  X-4  X-5  X-6  X-7  X-8  X-9
Number of cases    55   15   12    8    4    1    0    2    2    1
Thus, only 55 of these cases actually occurred on day X. The other 45
individuals became cases on an earlier day.
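The arithmetic of Example 2.1 can be checked with a short Python sketch. The counts are those of the example; the variable names and the derived shares are illustrative additions, not quantities defined in the text.

```python
# Counts from Example 2.1: index k holds cases that occurred on day X-k.
counts_by_delay = [55, 15, 12, 8, 4, 1, 0, 2, 2, 1]

notified = sum(counts_by_delay)   # all cases notified on day X
same_day = counts_by_delay[0]     # cases that occurred on day X itself
earlier = notified - same_day     # cases that occurred before day X

# Empirical delay distribution: share of notified cases with a delay of k days.
delay_share = [c / notified for c in counts_by_delay]

# Cumulative share of cases that occurred at most k days before notification.
cumulative_share = []
running = 0
for c in counts_by_delay:
    running += c
    cumulative_share.append(running / notified)

print(notified, same_day, earlier)  # 100 55 45
print(delay_share[:3])              # [0.55, 0.15, 0.12]
print(cumulative_share[3])          # 0.9
```

In this example, 90% of the cases notified on day X occurred within the three preceding days or on day X itself.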
La Brosse, 42
La Bruyère, Jean de, 245
Laffemas, Isaac, 222, 239, 269
La Porte, Amador de, 10, 11, 14, 16, 20, 117, 120, 183, 215,
232, 279
La Porte, Charles de (see Meilleraye)
La Porte, François de, 7, 11
La Porte, Suzanne de (Madame de Richelieu), 7, 11-14, 23, 65,
75, 86
La Porte (the Queen’s valet), 268-9
Laubardemont, Baron de, 288-9
Launay-Razilly, Claude, Seigneur de, 181
Le Fèvre, 293-4
Le Jay, Nicolas, 73, 76
Le Mercier, 15, 234
Lemoine, Cardinal, 17
Leopold, Archduke, 146
Le Roy, Guyon, 3
Le Roy, Jacques, 3
Lesdiguières, François de Bonne, Duc et Maréchal de, 91, 147,
150, 158, 161
Limoges, François de la Fayette, Bishop of, 266
Lisieux, Bishop of, 293
Longueville, Anne Geneviève de Bourbon, Duchesse de, 105,
164, 246, 280
Longueville, Henry d’Orléans, Duc de, 66, 76, 86, 91, 122, 124,
137
Longueville, Mademoiselle de, 246
Lorraine, Charles, Duke of, 176, 181, 202, 218, 223-4, 249-51,
255, 278
Lorraine, Princesse Marguerite de (Duchesse d’Orléans), 224,
250
Lorraine, Nicolas François, Cardinal de, 250-51
Louis XI., King, 6, 169, 289
Louis XII., King, 4
Louis XIII., King, 32, 42, 53, 64, 68, 82-5, 94-105, 114-15, 119-
31, 133, 136-44, 146-50, 152-8, 160-80, 183-92, 194-202,
205-16, 218-22, 224-32, 236, 239, 241, 246-51, 254, 257-
60, 262-71, 273, 276-8, 282-9, 291-3, 297
Louis XIV., King, 28, 69, 142, 177, 192, 271
Louis, Saint, 34
Louvigny, Comte de, 173
Lude, François de Daillon, Comte du, 83, 120
Lusignan, Guy de, 2
Luynes, Charles d’Albert, Duc de, 83, 84, 94-8, 100-102, 105,
107-8, 112, 114-15, 117-24, 126-7, 129-30
Luynes, Marie de Rohan, Duchesse de (see Chevreuse)
Rabelais, 22
Rambouillet, Catherine de Vivonne, Marquise de, 242
Rambouillet, Charles d’Angennes, Marquis de, 214, 242
Rancé, Armand Jean de, 11
Rancé, Denys Bouthillier, Baron de, 11, 203, 210
Rapine, Florimond, 68
Ravaillac, François, 32, 52, 54
Renaudot, Théophraste, 239
Reni, Guido, 136
Retz, Abbé de (afterwards Cardinal), 264
Retz, Duc de, 125
Richelieu, Alphonse de (Archbishop of Lyons and Cardinal), 12,
16, 21, 22, 63, 86, 194, 279
Richelieu, Antoine du Plessis de (le Moine), 3, 4, 33, 43
Richelieu, Armand Jean du Plessis de (see Cardinal-Duc de
Richelieu)
Richelieu, Cardinal-Duc de: his birth, family and childhood, 1-15;
education at the University, 16-20;
training as a soldier, 21-2;
second University course, 23;
consecration as Bishop of Luçon, 24-5;
Doctor of the Sorbonne, 26;
at the Court of Henry IV., 27-30;
life and work in the diocese of Luçon, 38-46;
friendship with Père Joseph, 46-7;
Instructions et Maximes, 48-52;
visit to Paris, 55-6;
affair of Fontevrault, 57-62;
political troubles, 64-6;
speech at States-General, 69-70;
Chaplain to Queen Anne, 72-5;
Private Secretary to Marie de Médicis, 84;
death of his mother, 86;
appointed Foreign Secretary, 87;
First Ministry, 88-92;
fall from power, 97-8;
exile with the Queen-mother, 100-2;
retirement in his diocese, 103-7;
banishment to Avignon, 108-10;
recalled to the Queen-mother’s service, 114-15;
death of his brother Henry, 117;
influence with Marie de Médicis, 123;
diplomatic success, 126;
marriage of his niece, 127;
stories and intrigues, 130;
receives the Cardinal’s Hat, 131;
personal descriptions, 132-3;
purchase and decoration of country-houses, 133-6;
employment of Fancan, 137-8;
admitted to the Royal Council, 140;
First Minister of France, 142;
political aims, 143;
the English marriage, 144-6;
affair of the Valtelline, 146-8;
Huguenot Rebellion, 148-53;
negotiations with Buckingham, 155;
peace with Spain, 159;
Army and Navy, etc., 160-61;
ill health and suffering, 162;
defeat of Chalais conspiracy, 163-75;
edict against feudal strongholds, 176;
edict against duels, 177-9;
war with England, 180;
Siege of La Rochelle, 181-92;
War of Mantuan Succession, 193-7;
final defeat of Huguenots, 198-200;
offers his resignation to Louis XIII., 201;
Italian campaign, 202-6;
The King’s illness, 207-8;
the Cardinal in imminent danger, 209-14;
his triumph, 215-16;
victory over his enemies, 217-20;
new honours, 221-2;
political vengeance, 222-3;
triumph over the Duc de Montmorency, 225-31;
illness and recovery, 232-3;
palaces and châteaux, 234-8;
his household and friends, 239-42;
the Academy founded by him, 244-5;
the performance of Mirame, 246-8;
dreams of conquest realised, 249-51;
family alliances, 252;
France joins in the Thirty Years’ War, 254;
defeat and panic, 255-6;
high courage of the Cardinal, 257;
danger of assassination, 259-60;
Court intrigues, 263-7;
Richelieu’s persecution of Queen Anne, 267-70;
death of Père Joseph, 271-2;
reforms in the Church, 274-5;
disappearance of enemies, 277;
family honours, 279-80;
internal worries, 281;
ill health, 282;
enmity with Cinq-Mars, 283-4;
terrible sufferings and last will, 285;
final triumphs, 286-9;
journey back to Paris, 290;
last illness, 292-3;
death at the Palais-Cardinal, 294;
funeral at the Sorbonne, 295;
general feeling in France, 296-7;
the tomb in the Church of the Sorbonne, 298
Richelieu, François du Plessis de (le Sage), 3, 4, 33
Richelieu, François du Plessis de (Grand Provost), 1, 6-9, 10,
12, 20, 132
Richelieu, François Louis de, 109
Richelieu, Françoise de (Madame du Pont-de-Courlay), 11, 63,
279
Richelieu, Henry, Marquis de, 12, 14, 16, 31, 43, 55, 56, 86, 91,
102, 107-9, 116-17
Richelieu, Louis du Plessis de (grandfather), 4, 7, 11, 12
Richelieu, Louis du Plessis de (uncle), 6
Richelieu, Marquise de (Marguerite Guiot des Charmeaux), 109
Richelieu, Nicole de (Madame de Maillé-Brézé), 12, 86, 106-7,
215
Rivière, Abbé de la, 288
Roannez, Duchesse de, 220
Rochechouart, Antoine de, 4
Rochechouart, Françoise de (Dame de Richelieu), 4, 5, 6, 7, 11,
12, 46
Rochefoucauld, François, Cardinal de la, 116
Rocheposay, M. de la (Bishop of Poitiers), 45, 65, 66, 104
Roches, Michel Le Masle, Prieur des, 134-6, 239
Rohan, Henry, Duc de, 65, 74, 75, 122, 124, 148-52, 181, 189,
193, 198, 200, 255
Rohan, Duchesse de (Marguerite de Béthune), 151
Rohan, Vicomtesse de (Catherine de Parthenay-Soubise), 150-
51, 172, 182, 191
Rossignol, Antoine, 239
Rotrou, Jean de, 245
Rubens, Pierre-Paul, 135
Rucellai, the Abbé, 111, 116-17