Building A Platform For Data-Driven Pandemic Prediction: From Data Modelling To Visualisation - The Covidlp Project 1St Edition Dani Gamerman

Building a Platform for
Data-Driven Pandemic
Prediction
From Data Modelling to
Visualisation - The CovidLP
Project

Edited by
Dani Gamerman
Marcos O. Prates
Thaís Paiva
Vinícius D. Mayrink
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press


2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2022 selection and editorial matter, Dani Gamerman, Marcos O. Prates, Thaís Paiva, Vinícius D.
Mayrink; individual chapters, the contributors

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermis-
[email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

ISBN: 978-0-367-70999-0 (hbk)


ISBN: 978-0-367-70997-6 (pbk)
ISBN: 978-1-003-14888-3 (ebk)

DOI: 10.1201/9781003148883

Typeset in [font]
by KnowledgeWorks Global Ltd.
To Science
Contents

Preface

Contributors

I Introduction
1 Overview of the book
Dani Gamerman, Thaís Paiva, Guido A. Moreira, and Juliana Freitas
1.1 Objective of the book
1.1.1 Data-driven vs model-driven
1.1.2 Real-time prediction
1.1.3 Building platforms
1.2 Outline of the book
1.2.1 How to read this book
1.2.2 Notation

2 Pandemic data
Dani Gamerman, Vinícius D. Mayrink, and Leonardo S. Bastos
2.1 Basic definitions
2.2 Occurrence and notification times
2.3 Other relevant pandemic data
2.4 Data reconstruction

II Modelling
3 Basic epidemiological features
Dani Gamerman, Juliana Freitas, and Leonardo Nascimento
3.1 Introduction and main ideas
3.2 Model extensions
3.3 Properties of epidemiological models
3.4 Are these models appropriate?

4 Data distributions
Guido A. Moreira, Juliana Freitas, and Dani Gamerman
4.1 Introduction
4.2 The Poisson distribution
4.3 Overdispersion
4.3.1 Negative Binomial: Mean-dependent overdispersion
4.3.2 Negative Binomial: Mean-independent overdispersion
4.3.3 Other models for the overdispersion
4.4 Discussion
4.4.1 Daily new cases vs cumulative cases
4.4.2 Parameter truncation

5 Modelling specific data features
Guido A. Moreira, Juliana Freitas, Leonardo Nascimento, and Ricardo C. Pedroso
5.1 Introduction
5.2 Heterogeneity across sub-regions
5.3 Seasonality
5.4 Multiple waves

6 Review of Bayesian inference
Guido A. Moreira, Ricardo C. Pedroso, and Dani Gamerman
6.1 Inference
6.1.1 Bayesian inference
6.1.2 Prior distribution
6.1.3 Estimation
6.1.4 Prediction
6.2 Operationalisation
6.2.1 The quadrature technique
6.2.2 Markov Chain Monte Carlo simulation methods

III Further Modelling
7 Modelling misreported data
Leonardo S. Bastos, Luiz M. Carvalho, and Marcelo F.C. Gomes
7.1 Issues with the reporting of epidemiological data
7.2 Modelling reporting delays
7.2.1 Weekly cases nowcast
7.2.2 Illustration: SARI notifications in Brazil provided by InfoGripe
7.3 Prevalence estimation from imperfect tests
7.3.1 Preliminaries: Imperfect classifiers
7.3.2 Prevalence from a single imperfect test
7.3.3 Re-testing positives
7.3.4 Estimating underreporting from prevalence surveys
7.3.5 Illustration: COVID-19 prevalence in Rio de Janeiro
7.3.6 Model extensions
7.3.7 Open problems

8 Hierarchical modelling
Dani Gamerman and Marcos O. Prates
8.1 Introduction
8.2 Regression
8.3 Dynamic models
8.4 Hierarchical models
8.4.1 Unstructured component
8.5 Spatial models
8.5.1 Spatial component

IV Implementation
9 Data extraction/ETL
Marcos O. Prates, Ricardo C. Pedroso, and Thaís Paiva
9.1 Data sources
9.2 Data preparation
9.3 Additional reading

10 Automating modelling and inference
Marcos O. Prates, Thais P. Menezes, Ricardo C. Pedroso, and Thaís Paiva
10.1 Introduction
10.2 Implementing country models
10.3 Implementing Brazilian models

11 Building an interactive app with Shiny
Thaís Paiva, Douglas R. M. Azevedo, and Marcos O. Prates
11.1 Getting started
11.2 Shiny basics
11.3 Beyond Shiny basics
11.4 Code organisation
11.5 Design of the user interface
11.5.1 The CovidLP app structure
11.6 Creating interactive plots
11.6.1 plotly basics
11.6.2 The CovidLP plots
11.7 Deploy and publish
11.8 Monitoring usage

V Monitoring
12 Daily evaluation of the updated data
Vinícius D. Mayrink, Juliana Freitas, Ana Julia A. Câmara, Gabriel O. Assunção, and Jonathan S. Matias
12.1 The importance of monitoring the data
12.2 Atypical observations
12.3 Detecting multiple waves
12.4 Seasonality

13 Investigating inference results
Vinícius D. Mayrink, Juliana Freitas, Ana Julia A. Câmara, Gabriel O. Assunção, and Jonathan S. Matias
13.1 Monitoring inference issues
13.2 Monitoring and learning
13.3 Evaluation metrics
13.4 Practical situations
13.4.1 Overall comparison
13.4.2 Seasonality
13.4.3 Multiple waves

14 Comparing predictions
Vinícius D. Mayrink, Ana Julia A. Câmara, Jonathan S. Matias, Gabriel O. Assunção, and Juliana Freitas
14.1 The structure of the proposed comparison
14.2 Analysis for Brazilian states
14.3 Analysis for countries
14.4 Improvements from a two-waves modelling

VI Software
15 PandemicLP package: Basic functionalities
Marcos O. Prates, Guido A. Moreira, Marta Cristina C. Bianchi, Débora F. Magalhães, and Thais P. Menezes
15.1 Introduction
15.2 Installation
15.2.1 Installing from the GitHub repository
15.3 Functionalities
15.3.1 COVID-19 data extraction and loading: load_covid
15.3.2 Visualising the data: plot.pandemicData
15.3.3 Model fitting: pandemic_model
15.3.4 Predictive distribution: posterior_predict.pandemicEstimated
15.3.5 Calculating relevant statistics: pandemic_stats
15.3.6 Plotting the results: plot.pandemicPredicted
15.4 Modelling with the PandemicLP
15.4.1 Generalised logistic model
15.4.2 Generalised logistic model with seasonal effect
15.4.3 Two-wave model
15.5 Sum of regions
15.6 Working with user data

16 Advanced settings: The pandemic_model function
Marcos O. Prates, Guido A. Moreira, Marta Cristina C. Bianchi, Débora F. Magalhães, and Thais P. Menezes
16.1 Introduction
16.2 Solving sampling issues
16.3 Sampling diagnostics
16.4 Truncation of the total number of cases

VII Conclusion
17 Future directions
The CovidLP Team
17.1 Introduction
17.2 Modelling
17.2.1 Overdispersion
17.2.2 Relation between cases and deaths
17.2.3 Automated identification of wave changes
17.3 Implementation
17.4 Monitoring
17.5 Software

Index

Preface

The project that led to this book started in March 2020. One of us was just
starting his graduate course classes on Dynamic models when the COVID-19
pandemic started in Brazil, forcing the suspension of in-person classes. During
the first months of the pandemic, universities in Brazil were unsure of how to
proceed. Our university recommended that faculty should not continue classes
even in on-line mode and should resort only to challenges to the students and
basic exercises.
About that time, an intense virtual debate started among groups of faculty.
This pattern was observed in our Institute of Exact Sciences, which consists of
the departments of Statistics, Mathematics, Computing, Physics and Chem-
istry. New messages containing solutions to various problems relating to the
pandemic appeared every single day from members of all the departments.
One of the messages contained a data-driven proposal based on the lo-
gistic curve, with parameter estimation and prediction of the counts of new
cases until the pandemic ends. The message was written by a physicist and
contained an abridged version of a paper. This manuscript was the spark that
was missing to decide on how to entertain the students. After all, statisticians
should be able to handle at least the task presented by the Physics colleague.
The project was presented to the graduate students, who were very keen
on embracing the exercise and started working straight away on the project.
Their results started to appear and problems started to emerge. They were
discussed at regular meetings held at class time. After all, this was a challenge
for the students and this was allowed by the university rules to replace the
missing classes!
Our results were regularly passed on informally to our departmental col-
leagues. They always pointed to the need to inform the general public about
what the project was providing. This issue led to the need to build an appro-
priate platform for the release of the project information. One of us was drawn
to the project at this point. The preparation of the platform for releasing the
results led to the CovidLP app.
Round about the same time, the Ministry of Education opened an ur-
gent call for proposals on different research aspects of the pandemic. Our
project was submitted and subsequently approved: 2 post-doc grants and 2
workstations were dedicated to the project, giving the project the respective
methodological and the computational amplitudes it so badly needed. Two
other faculty and a former graduate student were also included. Our project
gained the scalability it required. The CovidLP project was created. The name
was chosen to emphasise the interest of the project in long-term prediction.
A number of issues of all sorts appeared and were dealt with in the best
possible manner. These issues ranged from installing the workstations man-
ually and remotely in a deserted campus to addressing the methodological
difficulties and testing the proposed solutions. Many hours were spent study-
ing the literature and testing different approaches.
The CovidLP project gained national visibility after a series of work-
shop and seminar presentations, media interviews and news releases, and our
methodology was adopted by a few health officials at different administrative
levels in Brazil. By then, the project also contained a site and a blog to inform
the general public about the changes that were being introduced and discuss
them. It became obvious that another stream worth pursuing was software
for reproducing the analyses in a more general setting, suited for more expe-
rienced data analysts. It also became clear that the project was not restricted
to the COVID-19 pandemic. Thus the software was named PandemicLP, to
signal this change in scope.
After a few months, the project was mature. The participants were organ-
ised in focal groups and the production of information became more effective.
The project was getting ready for scientific publication. It became clear to us
that what we had developed thus far was worth reporting.
But it was clear to us that our story was not of Statistics. There are
better books already available to describe statistical aspects of epidemics. It
was also clear that our story was not of Computing. Again, there are better
books about the capabilities now widely available in software. The story of
the CovidLP project is a tale about the inseparable roles played by Statistics
and Computing for building such platforms for daily release of the results of
statistical analyses. This is the story that we want to tell.
By that time, CRC released a call to the scientific community for proposals
in general, including books, about the pandemic. It seemed to us the perfect
match, and after revising the proposal to include very thoughtful comments
from reviewers, whom we thank, the proposal was accepted. A major con-
tribution from this revision process was the recommendation for inclusion of
more epidemiological background on data and models. This was accepted and
this addition was provided by colleagues outside (but aware of) our project,
providing an important complement to our work. Our final proposal contains
the key elements of our task: data description, statistical modelling and mon-
itoring, computational implementation and software.
We worked very hard during the 9 months that elapsed between then and
now and are very pleased with our end result. Of course, this is way too short.
The main intention of the book is to provide data analysts with the tools
required for building platforms for statistical analyses and predictions in epi-
demiological contexts. A secondary goal is to allow users to make adaptations
in the book structure to guide them into an online platform solution to their
own data analysis problem, which may not even be related to Epidemiology.
We would like to finish by thanking the people who accompanied the de-
velopment of the CovidLP project. These include the users of our app, the
attendees at the talks we delivered, our academic colleagues who provided
useful inputs to the project and friends and families that provided us support
for achievement of this task. A very special thank you goes to the CovidLP
team, a group of dedicated students and post-docs that embarked on the jour-
ney that led to this book, reading countless papers and books, implementing
a number of computational codes and participating actively in all the steps
required for the completion of this book. Thanks are also due to Ricardo
Pedroso, for the book cover figure. A warm acknowledgement goes to Rob
Calver for his continued support and relentless effort to make this book possi-
ble in the best possible shape. We also thank the CRC team, especially Rob,
Michele and Vaishali, for the administrative support. They also arranged for
one text editor and 2 experts to review the entire book. Their reviews, for which
we are grateful, provided more context and breadth to the book content.
In Chapter 1, the book provides different paths to follow in order to help
its readers achieve their own, different goals. If the book manages to help
readers to attain their platform building goals, then the book will have
achieved its goals.

DG, MOP, TP and VDM

Belo Horizonte, 31 May 2021.


Contributors

The CovidLP Team
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil
The team has the following members: Gabriel O. Assunção, Douglas R. M.
Azevedo, Ana Julia A. Câmara, Marta Cristina C. Bianchi, Juliana Freitas,
Dani Gamerman, Débora F. Magalhães, Jonathan S. Matias, Vinícius D.
Mayrink, Thais P. Menezes, Guido A. Moreira, Leonardo Nascimento, Thaís
Paiva, Ricardo C. Pedroso and Marcos O. Prates.

Gabriel O. Assunção
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Douglas R. M. Azevedo
Localiza S.A.
Belo Horizonte, Brazil

Leonardo S. Bastos
Fundação Oswaldo Cruz
Rio de Janeiro, Brazil

Ana Julia A. Câmara
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Luiz M. Carvalho
Fundação Getulio Vargas
Rio de Janeiro, Brazil

Marta Cristina C. Bianchi
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Juliana Freitas
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Dani Gamerman
Universidade Federal de Minas Gerais/Universidade Federal do Rio de Janeiro
Belo Horizonte/Rio de Janeiro, Brazil

Marcelo F. C. Gomes
Fundação Oswaldo Cruz
Rio de Janeiro, Brazil

Débora F. Magalhães
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Jonathan S. Matias
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Vinícius D. Mayrink
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Thais P. Menezes
University College
Dublin, Ireland

Guido A. Moreira
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Leonardo Nascimento
Universidade Federal do Amazonas
Manaus, Brazil

Thaís Paiva
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Ricardo C. Pedroso
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Marcos O. Prates
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

Part I

Introduction
1
Overview of the book

Dani Gamerman
Universidade Federal de Minas Gerais/Universidade Federal do Rio de
Janeiro, Brazil

Thaís Paiva
Universidade Federal de Minas Gerais, Brazil

Guido A. Moreira
Universidade Federal de Minas Gerais, Brazil

Juliana Freitas
Universidade Federal de Minas Gerais, Brazil

CONTENTS
1.1 Objective of the book
1.1.1 Data-driven vs model-driven
1.1.2 Real-time prediction
1.1.3 Building platforms
1.2 Outline of the book
1.2.1 How to read this book
1.2.2 Notation
Bibliography

This is a book about building platforms for pandemic prediction. In order to
predict pandemics and epidemics, it is necessary to develop an inferential system
typically based on Statistics. In order to build platforms, it is necessary
to develop tools typically based on Computing. Both parts are important in
the development of a platform and will be treated with equal importance. The
book is structured in parts that will handle each component of this construc-
tion process, and describes their integration. This structure aims to benefit
readers interested in building platforms and/or pandemic prediction. The fi-
nal part of the book title refers to the project that served as the basis for the
realisation of this book.
In this chapter, the main ideas that guided the preparation of the book are
presented. All concepts that are included in the admittedly long book title are
explained, and their integration into a unified framework justified. This will
hopefully set the tone for the reader to understand what we did, why we did
it, and the order in which these topics are introduced in the sequel. After this
description, a summarised view of the following parts and chapters that constitute
the book, and a guide with different suggested routes on how to read it are provided.
The notation is also introduced and explained here.

DOI: 10.1201/9781003148883-1

1.1 Objective of the book


This book is about platforms for pandemic prediction. Therefore we must
describe first what is meant here by a platform. Loosely speaking, it is any
device that enables users to obtain information about a given matter. In our
case, the matter of interest is pandemic prediction. This device usually comes
in the form of an online application widely available on computers, notebooks,
tablets, mobile phones and any other equipment with internet access. Online
applications now abound in an immense variety of formats and
purposes, and are usually referred to as apps. A more formal definition and a
detailed description of platforms are provided in the sequel.
The book will go through the various steps involved in the preparation of
such apps, with the purpose of prediction of relevant features of pandemics
and epidemics. By prediction, we mean to provide a full description of the
uncertainty associated with such a task. This would be the predictive distri-
bution of the features of interest in the future, under the Bayesian paradigm
for Statistics. The entire description of this object is the subject of later chap-
ters. For now, it suffices to say that inference under this paradigm is entirely
model-based and forecasts are always probabilistic. This means that infer-
ence relies entirely upon the construction of the probabilistic specification for
observations and other quantities present in the model.
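As a brief sketch of what the predictive distribution looks like, the expression below uses generic notation that is assumed here for illustration only and is not taken from the book, whose own notation is introduced in Section 1.2.2. Here y denotes the observed data, θ the unknown model parameters and ỹ the future quantity of interest. The predictive distribution averages the model for ỹ over the posterior distribution of θ:

```latex
p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta,
\qquad
p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).
```

The second expression is Bayes' theorem, which combines the probabilistic specification of the observations, p(y | θ), with the prior p(θ). This is the sense in which inference is entirely model-based and forecasts are probabilistic.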
Prediction may be split into the categories short term and long term in
many applications. There are no mathematical definitions for these terms,
but they roughly refer to the immediate future and the distant future. This
distinction is particularly pertinent in the context of pandemics. Short term
for a pandemic means, in general, the immediate future of up to two weeks.
Long term refers to the end of the pandemic or of its first wave, understood
here as the first large mass of occurrences of the disease. This book addresses
predictions for both terms. Both categories have their merits. The nature
of the course of most epidemics through a human population is inevitably
subject to change; conditions that held in the past may no longer be
valid after a while. Thus, short-term predictions are easier to perform well.
The failure to provide good results for the long run has led to much criticism
from the general public and also from the scientific community (Ioannidis
et al., 2020). Nevertheless, long-term prediction provides a useful indication of
the magnitude of things to come (Holmdahl and Buckee, 2020). Rather than
showing the way, the long-term predictions shed light on the way. They
might provide useful indicators when used with caution. As such, they may
constitute an important component towards a more encompassing view of the
progression of a pandemic.
Pandemics occur worldwide, i.e., over hundreds of countries. Similarly,
many epidemics occur over dozens of countries. A suitable prediction plat-
form should aim at all countries involved in the epidemic or, at the very least,
at a fair number of these countries. This brings in the unavoidable need for
considering a large number of units for prediction. Also, this kind of problem
deals with life-and-death situations. Instant updates of the prediction results
after a new data release are imperative. In the case of COVID-19 and in many
other epidemics, official data is updated daily. In the typical pandemic sce-
nario, many analyses are required at high frequency, causing a considerable
computational burden. The more elaborate and country specific the model is,
the longer it will take to fit it to the data and to generate prediction results.
The famous George Box motto of “all models are wrong but some are useful”
(Box et al., 2005) could not find a more appropriate application than this. So,
models should be carefully and parsimoniously chosen in order to include the
most important features of the pandemic, but only them. This will probably
not lead to the best prediction for every country, but hopefully will provide
useful ones for most of them. In the sequel, a specific presentation of each of
the main features of platforms for forecasting pandemics is given.

1.1.1 Data-driven vs model-driven


A relevant aspect of this book is the construction of the model for the obser-
vations based on the apparent features of the data. We refer to this approach
as data-driven. This can be contrasted with model-driven approaches, which
use biological/physical considerations to drive the model specification. This
distinction is more didactic than practical, as there is frequently an interplay
between the two sources of information. For example, there are many possible
data-driven alternatives for model specification. They are mostly similar
in qualitative terms. The ones highlighted in this book were based on some
basic theoretical considerations. So there is a mild presence of theory in this
choice. Therefore, purists might say that our approach is neither purely
data-driven nor purely model-driven.
The data-driven approach fits in nicely with the pragmatic approach ad-
vocated in this book for real-time predictions. Having a parsimonious model
as the baseline enables the incorporation of data features that may not be
present at the onset of a pandemic, but may emerge at a later stage, as is
the case with many pandemics, including COVID-19. The model needs to be
enlarged in complexity to accommodate these features. But the computing
time spared through the use of a simple starting model makes it feasible to
allow for the inclusion of these additional features.
It must be acknowledged, however, that there are many studies of pandemics
that are based on the model-driven approach. This literature, mostly
based on the so-called compartmental models, known also as SIR and its ex-
tensions, has been considerably enlarged with the COVID-19 pandemic. The
idea behind these models is to allocate the population into different com-
partments and to describe the dynamics of the population transitioning be-
tween them. The most basic version has three compartments, (S)usceptible,
(I)nfectious and (R)ecovered.
The data-driven approach of this book is distinct from the various versions
of compartmental models, even when the latter use data to estimate
their parameters. To make this clear, the basic features of the compartmental
models are briefly described so that similarities and differences are pointed
out, without getting into technical details.
The many versions of compartmental models have a few similar assump-
tions. They assume that the population under study is closed, that is, it does
not suffer external influence. They also assume that the population is per-
fectly mixed, in the sense that every individual has the same (non-)contact
pattern with every other individual. An additional assumption is that ev-
ery susceptible person has the same probability of being infected. But there
have been proposals trying to weaken these assumptions, see Grimm et al.
(2021) for example. Then, the individuals are assigned to different compart-
ments and the transition dynamics are described through ordinary differential
equations. While the original SIR model was proposed in Kermack and McK-
endrick (1927), there are many extensions which add compartments such as
(D)eceased (Parshani et al., 2010), (M)aternally derived immunity, (E)xposed
(Hethcote, 2000), and others, as well as combinations thereof (Martianova
et al., 2020).
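To make the description above concrete, the basic three-compartment SIR dynamics can be sketched numerically. This is the standard textbook formulation, not part of the methodology of this book; the parameter names `beta` and `gamma`, the forward Euler integration step and all numerical values are illustrative assumptions.

```python
def simulate_sir(population, infectious0, beta, gamma, days, dt=0.1):
    """Integrate the textbook SIR equations with a forward Euler scheme.

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    beta (transmission rate) and gamma (recovery rate) are hypothetical
    illustrative parameters, not values taken from the book.
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - infectious0)
    i = float(infectious0)
    r = 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Closed, perfectly mixed population with made-up parameter values.
trajectory = simulate_sir(population=1_000_000, infectious0=10,
                          beta=0.3, gamma=0.1, days=200)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"Infectious compartment peaks around day {peak_day}")
```

Because the population is assumed closed, the three compartments always add up to the initial population size, and for fixed parameter values the whole trajectory is fully determined.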
One of the features of compartmental models is that they are determin-
istic, which means that, for a given parameter value, the dynamics between
the compartments are completely defined and the progress of the disease is
set. This is unrealistic as this ignores many uncertainties not covered in the
differential equations, such as delayed case reporting, imperfect infection date
recording and many others. To account for that, a probabilistic model can
be added to incorporate uncertainty. Modern tools exist to estimate model parameters
in this case, such as the pomp R package (King et al., 2016) and the
LibBi library (Murray, 2015). Their use has been compared in Funk and King
(2020).
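The idea of layering a probabilistic model on top of a deterministic curve can be illustrated with a minimal sketch. It assumes, purely for illustration, that observed counts are Poisson draws around a given mean curve; it does not reproduce how pomp or LibBi work, and the values in `expected` are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small means used here; exp(-mean) underflows for
    very large means, where a different sampler would be needed.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def observe(expected_counts, seed=0):
    """Turn a deterministic mean curve into one random set of observed counts."""
    rng = random.Random(seed)
    return [sample_poisson(mu, rng) for mu in expected_counts]


# Hypothetical deterministic curve of expected daily counts.
expected = [2.0, 5.0, 12.0, 20.0, 12.0, 5.0, 2.0]
print(observe(expected, seed=1))
print(observe(expected, seed=2))
```

The same mean curve now yields different data sets for different seeds, which is the kind of variability a purely deterministic description leaves out.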
Similar to the methodology presented in this book, compartmental models
may use observed data to guide the choice of values for the parameters. The
fundamental difference between any compartmental model and the method-
ology adopted here is that the former is built based on a description of the
physical process behind the data, to which uncertainty could be added through
a probabilistic model. The choice of how many and which compartments to
add to the model is a major factor in the description of the physical process, the fit
to the data and the interpretation of the parameters. In contrast, the proposal discussed
throughout this book, although initially inspired by a growth model (details
in Chapter 3), has its features and extensions entirely motivated by their
adequacy to the features of the data. Under no circumstances do the models presented in this
book attempt to describe the physical process behind an epidemic or pan-
demic. This is an important distinction that warrants the data-driven ‘stamp’
as it is entirely and exclusively motivated by the features found in the data.
There are many other methods of mathematical modelling of epidemics,
such as statistical regression techniques, complex networked models, web-
based data-mining, and surveillance networks, among others. It is not the
intention of this book to describe them, but a short summary can be found in
Siettos and Russo (2013). A more in-depth and complete review can be seen
in Held et al. (2019). An interesting model-based approach recently developed
during the COVID-19 pandemic is implemented in the epidemia R package
and is described in detail in Scott et al. (2020).

1.1.2 Real-time prediction


At the emergence of an outbreak of a novel disease, or even when cases of a
known malady start presenting an increasing pattern, projections of the most
probable scenarios become remarkably important as countless fundamental
decision-making processes depend on what to expect from the future (Fineberg
and Wilson, 2009; Pell et al., 2018; Funk et al., 2019; Reich et al., 2019). These
projections may concern what can happen in both short-term (days
or weeks ahead) and long-term (months or the entire duration) periods. As
mentioned in Funk et al. (2019), Viboud and Vespignani (2019), Roosa et al.
(2020), and Chowell et al. (2020), short-term projections may serve as guidance
for authorities concerning prompt decisions such as organising hospital
beds and personal protective equipment. In turn, long-term ones are related
to a solid preparation of health care systems and vaccination programs, for
example. However, the mathematics behind these projections of future
scenarios is not an easy task (Hsieh et al., 2010; Tariq et al.,
2020). Viboud and Vespignani (2019) compare predicting cases of epidemics
with the challenges involving weather forecasting, and competitions worldwide
aim to encourage the improvement of models (see Pell et al., 2018). For
instance, it can be the case that several factors (e.g., lockdown, quarantine,
vaccination, among others) impose a rapid change on the
epidemic curve. Thus, as reinforced in Hsieh and Cheng (2006) and Chowell
et al. (2020), projections may become obsolete in a short period of time, invalidating
their practical usage. Consequently, the constant update of results, i.e.,
real-time predictions, may be even more crucial in such situations.
Real-time predictions depend on a constant update of data. This rigorous
routine may impose a limitation on modelling approaches due to the lack of
case-specific information such as gender, age, and disease-related dates, to
cite a few (Pell et al., 2018). In addition to that, in extreme situations like a
pandemic, data may have poor consistency since the gravity of the situation
itself may prevent data from being carefully collected and computed (Funk
et al., 2019). Data may also be revised (Hsieh et al., 2010). In the COVID-19
pandemic, for instance, repositories like that of the Center for Systems Science and
Engineering at Johns Hopkins University (Dong et al., 2020) display basic
(and important) information on counts of cases worldwide, but most of the
data details mentioned above are not available.
Another important point to take into account is the evaluation of results.
As extensively discussed in Funk et al. (2019), the use of comparison metrics
on prediction outcomes, an appropriate quantification of uncertainty, and
the comparison of modelling alternatives assume central roles in reaching the final goal
of providing good information.
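As a rough illustration of what such an evaluation can look like, the sketch below computes two simple summaries: the mean absolute error of point predictions and the empirical coverage of prediction intervals. The choice of these two metrics and all numbers are assumptions made for the example; they are not necessarily the metrics used in Funk et al. (2019) or later in this book.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def interval_coverage(observed, lower, upper):
    """Fraction of observations falling inside their prediction interval."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)


# Hypothetical observed counts, point predictions and interval limits.
observed = [120, 135, 150, 170]
predicted = [110, 140, 155, 160]
lower = [90, 115, 130, 140]
upper = [130, 160, 165, 165]

print(mean_absolute_error(observed, predicted))  # 7.5
print(interval_coverage(observed, lower, upper))  # 0.75
```

Comparing the empirical coverage with the nominal level of the intervals gives a quick check on whether the stated uncertainty is too narrow or too wide.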
The conclusion of this discussion is that modelling this type of data with the aim
of (at least) providing real-time predictions requires routines that are: a)
complex enough to provide coherent results, but b) parsimonious enough to allow for
timely production of outcomes. This way, updated projections can be used
routinely for planning and evaluating strategies.

1.1.3 Building platforms


After obtaining the predictions, the next step is to present the results in the
best possible format. There are cases where these results are only required
for utilisation by a single person, a small group of people or an individual
institution. In these cases, there might not be any need for a structured plat-
form. In most cases, however, an application considering all the information
that needs to be conveyed, as well as the different users’ profiles, must be
constructed. The target audience can range from other statisticians and academic
researchers to government officials, journalists and the general public,
accessing the platform from all over the world. The application should allow
them to visualise the most recent data, in addition to the updated prediction
results for a set of different input options. When choosing how to display the
data, it is important to consider its suitability for the different devices that can be
used to access the application, as well as the variety of people’s backgrounds
and their graph literacy.
The plots’ aesthetics are crucial to clearly display the observed values, the
predictions and their uncertainty, and any other relevant model components.
Interactive plots can help to select regions to zoom into, identify specific values
and choose which data series to show. Another feature that can be beneficial
for public use is the option to download the plots and data files. These are some
of the advantages of publishing the results in an online interactive application.
Some examples of useful platforms that provide a great amount
of data displayed in various formats can be seen on the website of the Our
World in Data organisation (ourworldindata.org), which, among other topics,
has a dedicated section for visualisation of data about the COVID-19
pandemic. The Institute for Health Metrics and Evaluation (IHME) also
has a variety of applications designed to disclose global health research
results in different formats. It is worth highlighting their platform with
worldwide projections related to the COVID-19 pandemic available at
covid19.healthdata.org/projections. These are just some examples of the
diversity of these data visualisation platforms. There are actually no limits on
how complex the platform functionalities can be. This will depend mostly on
the availability of human and computational resources, but it is also related
to the objective of each project.
Our choice for the language and tools to build such an application was
R (R Core Team, 2020) and the Shiny package (Chang et al., 2020), since
our team consisted mainly of statisticians. This package allowed us to easily
create and publish a user interface to select the country/region of choice and
readily see the latest data and predictions. The user interaction with the plots
was also available thanks to the plotly graphing library (Sievert, 2020). We
believe that these might be the choices of other researchers as well, such as
statisticians and epidemiologists. They will probably follow similar paths in
the process of publishing some model fit results. That is why the main steps
and issues of our process in building such a platform are compiled in this book,
in the hope that we can help others in their own journeys.
It should be noted that the application can be part of a broader integrated
platform according to the project’s needs. In the case of the CovidLP Project,
a site to include more methodological details, announcements and updates
related to the project, and a channel for user interaction was also developed.
Besides this, all the source code and latest results are openly available in an
online repository, enabling collaboration among more advanced users. Lastly,
the code used to fit the model and obtain the predictions was turned into an
R package to facilitate application of the methodology to other data sets.
Either way, the book succinctly covers all the material (models, platform
and software) presented above. But the development of the platform, our
assumed goal, goes way beyond model specification and its ensuing inference.
It also goes beyond the construction of computing software.
Some of the following chapters elucidate the procedures needed to con-
struct such an integrated platform, starting from data extraction, model se-
lection and fit, to publishing the prediction results. A concern common to
all these steps is the automation of processes, especially when dealing with
dynamic data sets such as the ones for real-time pandemic prediction. It is
important to think about how to automate the whole data-driven process
when wanting to provide the most up-to-date predictions for a large number
of countries and regions. This scenario drives many of the decisions
taken in this book. These issues will hopefully be clarified in the respective
parts of the book, to be described in the next section.

1.2 Outline of the book


This book is divided into seven parts, each one of them with a number of
chapters. Part I contains 2 chapters. Chapter 1 provides an introduction to
the book by setting the scene that governs our approach, as described in the
previous section, and this description of the content of each chapter. Chapter
2 deals with a discussion of the basic input for pandemic prediction: the data
of a pandemic. The main data output of pandemics and epidemics (confirmed
cases and death counts) is described and then discussed. Other data sources
that might be relevant for the understanding of the process are also presented,
discussed and compared, and their relation established with primary data
sources.
Part II introduces statistical modelling. It is divided into four chapters for
ease of presentation and understanding. Chapter 3 presents and discusses the
main epidemiological inputs of the temporal evolution of the pandemic that led
to the (generalised) logistic form. Alternative formulations are also provided.
Properties that are useful for understanding and for communicating the main
features of pandemics/epidemics are presented and evaluated for these spec-
ifications. Chapter 4 presents possible data distributions for pandemic data,
starting from the canonical Poisson specification. Overdispersion is a relevant
concept and different forms to introduce it are presented and compared. Once
the probabilistic specification and data distributions are set, the model for the
observations is completely specified and inference can be performed. Chapter
5 presents many other data features that appear in some observational units
due to their departure from the basic hypothesised model. Finally, Chapter 6
presents a review of the Bayesian inference approach used in this book. Prior
distributions will play an important stabilising role, especially when scarce
data is available, e.g., at the early stages of the pandemic. Approximations
are required in cases when analytical results are not available, as is the case
with most results of this book. The techniques used for approximations are
also briefly presented.
Part II presents the basic ingredients for modelling but does not exhaust
the many advances that were proposed in the literature. It only describes
the main pandemic features, directly observed from the data. They can be
modelled in a single layer specification, without the need for further structure.
Part III deals with situations where extra layers are required for appropri-
ate modelling. These additional layers or levels are required to accommodate
variations from the basic structure. Chapter 7 handles statistical solutions to
many of the data problems that are identified in Chapter 2. One of the most
important problems is under-reporting, a prominent issue in the data anal-
ysis of many pandemics/epidemics. Another relevant issue is that pandemic
counts are based on the false assumption of perfect identification of cases
(and deaths). Approaches to address the above issues are presented. Chapter
8 presents a number of alternative models that address relevant extensions of
the modelling tools of Part II. They are based on the hierarchical specifica-
tion of the model with the presence of additional layers. These layers introduce
temporal, spatial and/or unstructured dependence among units or times.
Part IV deals with the implementation of these ideas in a platform to dis-
play the prediction results. Chapter 9 outlines the procedure of automating
data extraction and preparation. This ETL (Extract, Transform and Load)
stage is essential when dealing with different data sources for several coun-
tries that are constantly being updated. Chapter 10 explains the steps for au-
tomating modelling and inference, considering scalability and reproducibility.
A review of the Bayesian software options available and their characteristics
is also presented. Lastly, Chapter 11 describes the development of the online
application. A brief tutorial on how to build similar platforms with Shiny is
presented, as well as details about all of its elements including the interactive
plots to display the forecasts.
Every statistical analysis should be subject to scrutiny. Part V handles
this task in its various configurations. Chapter 12 describes data monitoring
schemes to anticipate their possible departure from an underlying trend. Chap-
ter 13 describes how to set up a constant monitoring scheme on a routine basis
for early identification of deterioration of prediction performance. Chapter 14
provides comparison results against similar platforms for pandemic prediction.
Part VI describes the R package PandemicLP, the software developed by
our team to perform the analyses and produce the predictions. Chapter 15
provides an overview of the package’s basic functionalities, and presents examples
to help users apply our proposed methodology to their own
data sets. Chapter 16 addresses more advanced features already available in
the software. These include settings to control MCMC efficiency and tools for
sampling diagnostics.
The book is concluded with Part VII. There, a single chapter briefly ad-
dresses some of the relevant points that the future editions of this (or a sim-
ilar) book could cover. It discusses some of the possibilities for extensions to
the framework of the book in all its dimensions: modelling, implementation,
monitoring and software.
The book also has its own supplementary material repository. It is located
at github.com/CovidLP/book. It contains files related to the book such as
code and databases for the figures and tables of the book. It is worth men-
tioning that the book repository above is one of a number of repositories
from the CovidLP project, freely available at the encompassing repository
github.com/CovidLP.

1.2.1 How to read this book


This book is unavoidably interdisciplinary, especially in terms of Statistics
and Computing. Therefore, it may serve different purposes for readers with
different backgrounds. We describe next some possible ways to read this book
depending on the type of information the reader is seeking.
First, the obvious path is to read the seven book parts in sequence, which
may serve users interested in learning about all the stages of our platform-building
process, focused on COVID-19 pandemic forecasting. This path
is also instructive to whoever wants to replicate the entire process of our
project, with similar modelling frameworks.
Another possible reading path is geared to users interested mainly in the
methodological aspects of modelling epidemic data. For those users, we recom-
mend reading Part I for introduction and data description, and Parts II and
III for basic and further modelling aspects, respectively. Included at the end of
Part II is Chapter 6, which presents a review of Bayesian inference. This chap-
ter is left as an option for the reader who might feel the need to learn/revisit
the main concepts used within the book. The user following along this reading
path might also be interested somewhat in Part VI, where the implemented
functions from the R package to fit the proposed models are exemplified.
For users searching for instructions about how to create and maintain
an online platform for up-to-date presentation of some statistical results, we
suggest following Part I with the reading of Parts IV and V. These parts
include the step-by-step handbook of how to create an online application
with automatic data extraction and generation of predictions, in addition to
the discussion of some important features to monitor in inference results.
The book is finalised in Part VII, where most parts are revisited with
an introductory, concise view to summarise possible directions for the future.
This part might appeal to all readers interested not only in what was
done, but also in what could come next.

1.2.2 Notation
Most pandemic data consist of counts, usually counts of infected cases or
deaths caused by the disease. The counts can be recorded separately for each
time unit or recorded by accumulation over the previous time units. The latter
is the result of integration (or sum) of the former. The usual mathematical
standard is to use capital letters for the integrated feature and lower case
for the integrand. Thus, cumulative counts will be denoted by $Y$, while their
counts over a given time unit will be denoted by $y$.
Counts are collected over periods of time, that could be days, weeks,
months, etc. Whatever the time unit, the counts at time $t$ are denoted by
$y_t$, while the cumulative counts up to time $t$ are given by $y_1 + \cdots + y_t$ and
denoted by $Y_t$, i.e., $Y_t = \sum_{j=1}^{t} y_j$, for all $t$.
Typically, these counts are random variables with finite expectations or
means. In line with the distinction between cumulative and time-specific
counts defined above, the means are denoted by $M(t) = E(Y_t)$ for the cumulative
means, and $\mu(t) = E(y_t)$ for the time-specific means, i.e.,
$M(t) = \sum_{j=1}^{t} \mu(j)$, for all $t$. The dependence on time is denoted for the means in the
most usual functional form because their dependence on time will be made
explicit, unlike the counts that will depend on time implicitly. These points
will be made clear in Part II.
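The relation between time-specific and cumulative counts can be checked with a few lines of code. This is only an illustrative sketch: the function names and the counts are hypothetical, and the list index 0 corresponds to time t = 1.

```python
from itertools import accumulate


def cumulative_counts(time_specific):
    """Y_t = y_1 + ... + y_t for every t."""
    return list(accumulate(time_specific))


def time_specific_counts(cumulative):
    """Invert the sum: y_1 = Y_1 and y_t = Y_t - Y_{t-1} for t > 1."""
    if not cumulative:
        return []
    differences = [later - earlier
                   for earlier, later in zip(cumulative, cumulative[1:])]
    return [cumulative[0]] + differences


y = [3, 5, 8, 13, 9]              # hypothetical daily counts y_1, ..., y_5
Y = cumulative_counts(y)
print(Y)                          # [3, 8, 16, 29, 38]
print(time_specific_counts(Y))    # [3, 5, 8, 13, 9]
```

Data sources may publish either series, so being able to move between the two is a routine first step in data preparation.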

Bibliography
Box, G., Hunter, J. and Hunter, W. (2005) Statistics for Experimenters: Design,
Innovation, and Discovery. Wiley Series in Probability and Statistics.
Wiley. URL https://books.google.com.br/books?id=oYUpAQAAMAAJ.
Chang, W., Cheng, J., Allaire, J., Xie, Y. and McPherson, J. (2020) shiny:
Web Application Framework for R. URL https://CRAN.R-project.org/package=shiny.
R package version 1.5.0.
Chowell, G., Luo, R., Sun, K., Roosa, K., Tariq, A. and Viboud, C. (2020)
Real-time forecasting of epidemic trajectories using computational dynamic
ensembles. Epidemics, 30, 100379.
Dong, E., Du, H. and Gardner, L. (2020) An interactive web-based dashboard
to track COVID-19 in real time. The Lancet Infectious Diseases, 20, 533–
534.
Fineberg, H. V. and Wilson, M. E. (2009) Epidemic science in real time.
Science, 324, 987–987.
Funk, S., Camacho, A., Kucharski, A. J., Lowe, R., Eggo, R. M. and Edmunds,
W. J. (2019) Assessing the performance of real-time epidemic forecasts: A
case study of Ebola in the Western Area region of Sierra Leone, 2014-15.
PLOS Computational Biology, 15.
Funk, S. and King, A. A. (2020) Choices and trade-offs in inference with
infectious disease models. Epidemics, 30, 100383.
Grimm, V., Mengel, F. and Schmidt, M. (2021) Extensions of the SEIR model
for the analysis of tailored social distancing and tracing approaches to cope
with COVID-19. Scientific Reports, 11.
Held, L., Hens, N., O’Neill, P. and Wallinga, J. (eds.) (2019) Handbook of
Infectious Disease Data Analysis. Boca Raton: Chapman and Hall/CRC,
1st edn.
Hethcote, H. W. (2000) The mathematics of infectious diseases. SIAM Review,
42, 599–653.
Holmdahl, I. and Buckee, C. (2020) Wrong but useful – What Covid-19
epidemiologic models can and cannot tell us. New England Journal of
Medicine, 383, 303–305. URL https://doi.org/10.1056/NEJMp2016822.
Hsieh, Y.-H. and Cheng, Y.-S. (2006) Real-time forecast of multiphase out-
break. Emerging Infectious Diseases, 12, 122–127.
Hsieh, Y.-H., Fisman, D. N. and Wu, J. (2010) On epidemic modeling in real
time: An application to the 2009 novel A (H1N1) influenza outbreak in
Canada. BMC Res Notes, 3.
Ioannidis, J. P., Cripps, S. and Tanner, M. A. (2020) Forecasting for COVID-19
has failed. International Journal of Forecasting. URL
http://www.sciencedirect.com/science/article/pii/S0169207020301199.
Kermack, W. O. and McKendrick, A. G. (1927) A contribution to the math-
ematical theory of epidemics. Proceedings of the Royal Society A, 115,
700–721.
King, A. A., Nguyen, D. and Ionides, E. L. (2016) Statistical inference for
partially observed Markov processes via the R package pomp. Journal of
Statistical Software, Articles, 69, 1–43.
Martianova, A., Kuznetsova, V. and Azhmukhamedov, I. (2020) Mathemati-
cal model of the COVID-19 epidemic. In Proceedings of the Research Tech-
nologies of Pandemic Coronavirus Impact (RTCOV 2020), 63–67. Atlantis
Press.
Murray, L. M. (2015) Bayesian state-space modelling on high-performance
hardware using LibBi. Journal of Statistical Software, Articles, 67, 1–36.
Parshani, R., Carmi, S. and Havlin, S. (2010) Epidemic threshold for the
susceptible-infectious-susceptible model on random networks. Phys. Rev.
Lett., 104, 258701.
Pell, B., Kuang, Y., Viboud, C. and Chowell, G. (2018) Using phenomenolog-
ical models for forecasting the 2015 Ebola challenge. Epidemics, 22, 62–70.
The RAPIDD Ebola Forecasting Challenge.
R Core Team (2020) R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. URL
https://www.R-project.org/.
Reich, N. G., Brooks, L. C., Fox, S. J., Kandula, S., McGowan, C. J., Moore,
E., Osthus, D., Ray, E. L., Tushar, A., Yamana, T. K., Biggerstaff, M.,
Johansson, M. A., Rosenfeld, R. and Shaman, J. (2019) A collaborative
multiyear, multimodel assessment of seasonal influenza forecasting in the
United States. Proceedings of the National Academy of Sciences, 116, 3146–
3154.
Roosa, K., Lee, Y., Luo, R., Kirpich, A., Rothenberg, R., Hyman, J., Yan,
P. and Chowell, G. (2020) Real-time forecasts of the COVID-19 epidemic
in China from February 5th to February 24th, 2020. Infectious Disease
Modelling, 5, 256–263.
Scott, J. A., Gandy, A., Mishra, S., Unwin, J., Flaxman, S. and Bhatt, S.
(2020) epidemia: Modeling of epidemics using hierarchical Bayesian models.
URL https://imperialcollegelondon.github.io/epidemia/. R package
version 0.7.0.
Siettos, C. I. and Russo, L. (2013) Mathematical modeling of infectious disease
dynamics. Virulence, 4, 295–306.

Sievert, C. (2020) Interactive Web-Based Data Visualization with R, plotly,
and shiny. Chapman and Hall/CRC. URL https://plotly-r.com.
Tariq, A., Lee, Y., Roosa, K., Blumberg, S., Yan, P., Ma, S. and Chowell,
G. (2020) Real-time monitoring the transmission potential of COVID-19 in
Singapore, March 2020. BMC Medicine, 18.
Viboud, C. and Vespignani, A. (2019) The future of influenza forecasts. Pro-
ceedings of the National Academy of Sciences, 116, 2802–2804.
2
Pandemic data

Dani Gamerman
Universidade Federal de Minas Gerais/Universidade Federal do Rio de
Janeiro, Brazil

Vinícius D. Mayrink
Universidade Federal de Minas Gerais, Brazil

Leonardo S. Bastos
Fundação Oswaldo Cruz, Brazil

CONTENTS
2.1 Basic definitions
2.2 Occurrence and notification times
2.3 Other relevant pandemic data
2.4 Data reconstruction
Bibliography

Data is the primary input for any statistical analysis. In this chapter we
present the main aspects of pandemic data, starting from their definitions.
Their virtues and deficiencies are described and compared. We focus on data
used for and in the predictions. Auxiliary variables that are related to the
primary data variables are also described and their relations are presented.

2.1 Basic definitions


To make pandemic predictions, it is first important to establish exactly what
is being predicted. Clear definitions are required from the beginning.
An obvious pre-requisite is to know what a pandemic is. According to Last
(2001) a pandemic is “an epidemic occurring worldwide, or over a very wide
area, crossing international boundaries, and usually affecting a large number
of people”. This definition calls for the definition of epidemic. The same dictionary
informs us that an epidemic is “the occurrence in a community or
region of cases of an illness, specific health-related behaviour, or other health-related
events clearly in excess of normal expectancy”. These definitions are
quite general, including epidemics of communicable and non-communicable
diseases, risk factors like obesity, or even the spread of information like fake
news. It is worth mentioning that our primary concern is to deal with epi/pandemics
of infectious or communicable diseases, like the coronavirus disease 2019
(COVID-19) caused by the coronavirus SARS-CoV-2.

DOI: 10.1201/9781003148883-2

A few comments are in order here. The first comment is that the definitions
above are relative in the sense that their very definition is comparative against
a “normal” situation. The second comment is that the definitions above do not
account for severity or other disease characteristics. This is possibly to avoid
more subjectivity and hence to avoid criticism of the definitions. Nevertheless,
this cautionary approach did not prevent authors from questioning
them (Doshi, 2011). We will avoid this discussion hereafter and assume that
the diseases we will be predicting meet the epi/pandemic criteria.
The third comment has a deeper impact on this book and deserves a
full paragraph. The definitions above make it clear that the major difference
between a pandemic and an epidemic is its reach. When an epidemic affects
a large number of countries or the entire world, as happened, for example,
with COVID-19 in 2020, then it becomes a pandemic. Therefore, if one
is considering predicting only a single geographical region (be it a continent,
a country, a state or a city) without taking into account other regions, the
same statistical models and methods can be used equally for either
a pandemic or an epidemic.
There are a number of basic statistics associated with any disease. By far,
the most used ones are the counts of cases and deaths of the disease. They
appear as the major statistics from bulletins that are routinely issued by health
authorities worldwide, especially during the course of an epi/pandemic. Once
again, the concepts of cases and deaths of a disease must be clearly defined.
These are usually defined based on clinical and/or laboratory criteria. Note
that these criteria may vary across regions or, even worse, change over time
during the course of a pandemic. Variation across regions makes comparisons
difficult, as the counts may be reporting different characteristics. Changes
over time make it difficult to identify a single temporal pattern and to
relate counts at successive times, complicating the statistical analysis of
their evolution.
It is crucial to have counts of a disease in a timely fashion. They help health
officials make decisions on allocation of resources (personnel and equipment)
and let the general public be aware of the current situation and its evolution.
They also allow the immediate update of prediction systems, providing further
information to society. The relevance of fast data disclosure obviously
depends on the severity of a disease. Most recent epidemics and pandemics
are undoubtedly severe enough to make the release of information urgent.
This urgency has an important impact on the protocols for data disclosure.
Throughout the world, they typically
involve the daily release of data collected up to that day. These are usually
referred to as confirmed cases of a disease.
Other time windows are also used for some pandemics. The second most
common time frame for data release is weekly, especially for diseases with
very low counts. Weekly data also remove possible weekday effects that exist
in some health notification systems. Another strategy commonly applied to
diseases with weekday variations is the use of 7-day moving averages to
smooth the counts.
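
The two strategies just described, weekly aggregation and a 7-day moving average, can be sketched as follows. This is a minimal illustration in plain Python, not code from the CovidLP platform: the function names `moving_average_7` and `weekly_totals` and the toy daily counts are assumptions made for the example. A trailing window is used here; a centred window would be an equally reasonable choice.

```python
def moving_average_7(daily_counts):
    """Trailing 7-day moving average; days without a full 7-day window are skipped."""
    window = 7
    return [
        sum(daily_counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(daily_counts))
    ]


def weekly_totals(daily_counts):
    """Sum counts over consecutive blocks of 7 days; an incomplete final block is dropped."""
    n_weeks = len(daily_counts) // 7
    return [sum(daily_counts[7 * w : 7 * w + 7]) for w in range(n_weeks)]


# Hypothetical daily confirmed cases with a weekday effect:
# far fewer notifications on the last two days of each week.
daily = [100, 110, 105, 120, 115, 40, 30] * 2

smoothed = moving_average_7(daily)   # one value per day from day 7 onwards
weekly = weekly_totals(daily)        # one value per complete week
```

Because every 7-day window contains each weekday exactly once, the weekday dip in this toy series disappears from both `smoothed` and `weekly`.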
The distinction between different categories of cases will be returned to
in the next section. Figure 2.1 illustrates the data mentioned above for the
COVID-19 pandemic for Switzerland.
[Figure 2.1: two panels for Switzerland, one showing new cases per day and the other new deaths per day, each plotted against date, with axis ticks from 03/Feb/20 to 08/Jun/20.]

FIGURE 2.1: Daily data of confirmed cases and confirmed deaths of COVID-
19 for Switzerland.

Figure 2.1 also includes the confirmed counts of deaths caused by the
disease. One striking feature of the figure is the qualitative similarity of
the curves of confirmed cases and confirmed deaths observed for the same
region. This similarity occurs despite the substantial difference in nature
of the two types of counts. It is observed in many other countries across
the globe and will be explored by the data-driven approach throughout the
book.
In the context of an infectious disease epidemic, some individuals may be
infected without manifesting any symptom. These individuals, however, can
still transmit the agent that causes the disease. For example, a person infected
with the human immunodeficiency virus (HIV), called HIV-positive, may be
asymptomatic for several years. Without adequate treatment, a person with
HIV can develop the disease AIDS (acquired immune deficiency syndrome).
In any case, diseases manifest themselves through their symptoms. As noted
above, some infected individuals may be asymptomatic, in the sense that they
may not manifest any symptom and may carry on their normal life totally
unnoticed. Even though the infection is harmless to these individuals, they
may infect other people. Some of the newly infected ones
may well present serious symptoms, and the disease may even lead to their
deaths. Thus, asymptomatic cases are just as important as symptomatic
cases, but they are typically harder to identify. If symptoms do not show
up in an individual, it becomes very hard to identify him/her as a case. The
only way this could happen is through a widespread testing campaign in the
region of the individual. These campaigns are costly and present many
logistical difficulties.
The COVID-19 disease is a timely and important illustrative example.
Identification of cases is typically achieved through molecular tests such as
RT-PCR (reverse transcription polymerase chain reaction). These tests are
not widely available in many countries and even when they are, they may
take days to have their results released. These difficulties cause large parts of
the infected population to go unnoticed. Estimates of the proportion of
asymptomatic cases still vary widely among studies. Kronbichler et al. (2020)
was one of the earlier studies and indicated this proportion to be around
62%. So, substantial under-reporting is to be expected unless population-wide
testing is performed. Tests are also imperfect: even molecular tests are
subject to error, leading to false-positive and false-negative results.
Imperfect tests will be revisited in Section 7.3.2.
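
The scale of the under-reporting implied by a given asymptomatic proportion can be illustrated with a short calculation. The sketch below rests on a strong simplifying assumption that is not made in the text: that only symptomatic infections are ever confirmed and that all of them are. It also ignores test error. The function name `implied_total_infections` and the figure of 380 confirmed cases are hypothetical.

```python
def implied_total_infections(confirmed, asymptomatic_proportion):
    """Scale confirmed counts up, assuming only symptomatic infections are confirmed.

    This is an illustrative assumption: it ignores asymptomatic cases found by
    testing, symptomatic cases that are never tested, and false-positive or
    false-negative test results.
    """
    if not 0 <= asymptomatic_proportion < 1:
        raise ValueError("asymptomatic_proportion must be in [0, 1)")
    return confirmed / (1 - asymptomatic_proportion)


# With the 62% asymptomatic proportion quoted above, confirmed counts would
# represent only 38% of all infections under this assumption.
total = implied_total_infections(confirmed=380, asymptomatic_proportion=0.62)
```

Under these assumptions, 380 confirmed cases would correspond to roughly 1000 infections, that is, a multiplier of about 2.6.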
Data for diseases must be handled with care for the above reasons. When
there is a substantial proportion of asymptomatic cases, counts will be affected
by this feature. Therefore, it is inappropriate to refer to the obtained counts
as total counts. This issue is clarified with the nomenclature used for data
released on pandemics. Instead of referring to these measurements simply as
counts, they are named confirmed counts. This standard is widely used and
will be considered throughout this book.

2.2 Occurrence and notification times


As mentioned in the previous section, pandemic data released daily by health
authorities present counts compiled according to the individuals that become
known to be cases (or deaths) on a given day. These counts will be referred
to as counts by notification date; they are the confirmed counts referred to
in the previous section. Associating cases with the date they are notified is
the simplest way to compile pandemic data, but it is far from perfect. Cases
are notified on the date they are made publicly available by the health
authorities. It is well known that there is an inevitable delay between the
date a case occurs and the date the health system becomes aware of this
event, after being notified. This delay can range from hours to weeks,
depending on the efficiency of the health data collection system. Clearly,
this delay has an important impact on the counts.
Counting by notification date is clearly simpler, but it has important
methodological drawbacks. Assume a case is notified a week after it actually
occurred. If this case is only added to the counts at its notification date,
it will provide delayed information about the evolution of the disease, biasing
the inference. The magnitude of the bias may be noticeable if this delay im-
pacts a substantial proportion of the cases. A qualitatively similar argument is
valid for counting deaths. Example 2.1 below provides a numerical illustration
of the problem.

Example 2.1: Let 100 cases be notified on day X. Assume that the occurrence
dates are distributed as follows:

Occurrence day    X    X-1  X-2  X-3  X-4  X-5  X-6  X-7  X-8  X-9
Number of cases   55   15   12   8    4    1    0    2    2    1

Thus, only 55 of these cases actually occurred on day X. The other 45
individuals became cases on an earlier day.
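
The figures in Example 2.1 can be checked, and the notification delay summarised, with a few lines of Python. This is only a sketch of the arithmetic: the variable names are assumptions made for the illustration, and the mean delay is a summary added here that the example itself does not report.

```python
# Cases notified on day X, indexed by delay in days
# (index 0: occurred on day X itself; index 9: occurred on day X-9).
# Values are those of Example 2.1.
notified_on_x_by_delay = [55, 15, 12, 8, 4, 1, 0, 2, 2, 1]

total_notified = sum(notified_on_x_by_delay)            # cases notified on day X
occurred_same_day = notified_on_x_by_delay[0]           # cases that also occurred on day X
occurred_earlier = total_notified - occurred_same_day   # cases carried over from earlier days

# Empirical proportion of the notified cases at each delay, and the mean delay in days.
delay_proportions = [count / total_notified for count in notified_on_x_by_delay]
mean_delay = sum(d * c for d, c in enumerate(notified_on_x_by_delay)) / total_notified

print(total_notified, occurred_same_day, occurred_earlier, mean_delay)
```

The script confirms the totals of 100, 55 and 45 quoted in the example, and gives a mean delay of 1.23 days among the cases notified on day X.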

Therefore, it seems more appropriate to monitor any pandemic through
cases counted by occurrence date. In fact, there is little doubt about the
adequacy of this choice. Why is this protocol not used worldwide? One possible
explanation is the information delay occurring even for regions with entirely
organised health databases. If counts arrive at the health information system
with a delay, then total counts that occurred at a given date would not be
available on the same day. Therefore, if a real-time analysis is needed, the most
up-to-date counts are poorly represented. So in order to use the occurrence
date, it is necessary to correct the delay. This issue will be explored in more
detail in Section 7.2.
Another problem related to occurrence dates is the difficulty in obtaining
such data. Ascertainment of the exact occurrence date requires going back
to each individual file. This task is far from trivial, as the data are
typically obtained at a disaggregated level (district or city) and reported
at an aggregated level. It would require an integrated system in which these
operations are registered digitally for fast dissemination of information.

Example 2.1 (continued): Assume that further cases, which occurred on
day X, were notified on the following days:

Notification day  X    X+1  X+2  X+3  X+4  X+5  X+6  X+7
Number of cases   55   20   8    4    2    0    0    1

So, a further 35 cases actually occurred on day X but were reported at a
later date. Therefore, the number of cases that occurred on day X was actually
90, while the number of cases notified on day X was 100. The number of cases
known on day X that occurred on that day is just 55. So, when using counts by
occurrence date, it is necessary to estimate, with a statistical method, the
cases that have occurred but have not yet been reported.
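
The bookkeeping of the continued example, together with a deliberately naive correction for cases not yet reported, can be sketched as follows. This is an illustration only and should not be read as the delay-correction method of Section 7.2. The helper `correct_for_delay` is hypothetical, and the same-day reporting proportion is taken from the example's own complete data, which would not be available in real time and would in practice have to be estimated from past notifications.

```python
def correct_for_delay(known_so_far, proportion_reported_so_far):
    """Scale up the cases known so far by the proportion assumed to be already reported."""
    if not 0 < proportion_reported_so_far <= 1:
        raise ValueError("proportion_reported_so_far must be in (0, 1]")
    return known_so_far / proportion_reported_so_far


# Cases that occurred on day X, by notification day X, X+1, ..., X+7
# (second table of Example 2.1).
occurred_on_x_by_notification_day = [55, 20, 8, 4, 2, 0, 0, 1]

count_by_occurrence = sum(occurred_on_x_by_notification_day)  # all cases that occurred on day X
known_on_day_x = occurred_on_x_by_notification_day[0]         # those already notified on day X
not_yet_reported = count_by_occurrence - known_on_day_x       # those notified on later days

# Assumption for illustration: the same-day reporting proportion is known in advance.
# It is computed here from the example itself, so the correction recovers
# the true count by construction.
same_day_proportion = known_on_day_x / count_by_occurrence
estimate = correct_for_delay(known_on_day_x, same_day_proportion)

print(count_by_occurrence, known_on_day_x, not_yet_reported, round(estimate))
```

The point of the sketch is the direction of the adjustment: the 55 cases known on day X must be scaled up, here to 90, and the quality of that estimate depends entirely on how well the reporting proportion is known.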