
Applied Soft Computing 106 (2021) 107330

Contents lists available at ScienceDirect

Applied Soft Computing


journal homepage: www.elsevier.com/locate/asoc

Federated learning for COVID-19 screening from Chest X-ray images



Ines Feki a, Sourour Ammar a,b, Yousri Kessentini a,b,∗, Khan Muhammad c

a Digital Research Center of Sfax, B.P. 275, Sakiet Ezzit, 3021 Sfax, Tunisia
b SM@RTS: Laboratory of Signals, systeMs, aRtificial Intelligence and neTworkS, Sfax, Tunisia
c Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea

Article info

Article history:
Received 17 September 2020
Received in revised form 17 February 2021
Accepted 16 March 2021
Available online 20 March 2021

Keywords: Federated learning; Decentralized training; COVID-19 screening; X-ray images; Deep learning; CNN

Abstract

Today, the whole world is facing a great medical disaster that affects people's health and lives: the COVID-19 disease, colloquially known as the coronavirus. Deep learning is an effective means of assisting radiologists in analyzing the vast number of chest X-ray images, and it can potentially play a substantial role in streamlining and accelerating the diagnosis of COVID-19. Such techniques require large training datasets, and all such data must be centralized in order to be processed. Due to medical data privacy regulations, it is often not possible to collect and share patient data on a centralized data server. In this work, we present a collaborative federated learning framework that allows multiple medical institutions to screen for COVID-19 from chest X-ray images using deep learning without sharing patient data. We investigate several key properties and specificities of the federated learning setting, including the non-independent and identically distributed (non-IID) and unbalanced data distributions that naturally arise. We experimentally demonstrate, for two different model architectures, that the proposed federated learning framework provides results competitive with those of models trained by sharing data. These findings should encourage medical institutions to adopt a collaborative process and reap the benefits of rich private data in order to rapidly build a powerful model for COVID-19 screening.

© 2021 Elsevier B.V. All rights reserved.

1. Introduction

When we talk about machine learning and privacy, there is a sense of conflict. Machine learning in general, and deep learning in particular, needs access to very large datasets to achieve good performance. Unfortunately, such data is often siloed across several organizations because of privacy concerns and liability risks. In the healthcare domain especially, most data is hard to obtain due to legal, privacy, technical, and data-ownership challenges. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the USA and the General Data Protection Regulation (GDPR)¹ in the European Union have completely redefined data management policy. Massively collecting clients' data without a specific service objective is no longer an option. The GDPR sets the legal framework for the protection of personal data within the European Union. To make companies more accountable, it imposes new obligations on service operators with regard to data management, in particular making data centralization much more regulated. Respect for privacy is more than ever an important issue at the heart of data processing. These challenges create a problem for data scientists building and deploying machine-learning-based healthcare systems as a service: in short, to benefit from these powerful diagnostics, you have to share your data.

Federated learning (FL), introduced by Google in 2017 [1], is a distributed machine learning approach that enables multi-institutional collaboration on deep learning projects without sharing client data. A motivating example for FL arises when the training data is kept on users' local devices (nodes) rather than logged to a data center. These nodes perform computations on their own data in order to update a global model. Because each node generates its data with different patterns, the data distribution differs from node to node. For example, one client may have much more data than the others. Consequently, no single node's data can serve as a representative sample of the overall distribution. This illustrates two key properties that differentiate federated optimization from a typical distributed optimization problem: (1) Non-independent and identically distributed (non-IID) data: since each user has their own local training data, no single client's data represents the population distribution. (2) Unbalanced data: similarly, each user holds a quantity of data that differs from that of the others.

∗ Corresponding author at: Digital Research Center of Sfax, B.P. 275, Sakiet Ezzit, 3021 Sfax, Tunisia.
E-mail addresses: [email protected] (S. Ammar), [email protected] (Y. Kessentini), [email protected] (K. Muhammad).
¹ https://gdpr-info.eu/issues/data-protection-officer/.

https://doi.org/10.1016/j.asoc.2021.107330
1568-4946/© 2021 Elsevier B.V. All rights reserved.

In view of the advantages of federated learning, we have exploited this technique to deal with a very sensitive topic in the healthcare field. In December 2019, a new coronavirus infectious disease (named COVID-19) was first reported in Wuhan, China. Subsequently, the outbreak spread widely in China and to most countries in the world [2]. The rapid escalation of this pandemic (with hundreds of deaths and thousands of infections) presents great challenges for stopping the virus.

Currently, several diagnostic methods are available for detecting the coronavirus, and chest X-ray images and CT scans are among the most widely accepted standard diagnostics [3–5]. Since COVID-19 attacks the epithelial cells that line the respiratory tract, chest X-ray images can be used to analyze the health of a patient's lungs. Given that nearly all hospitals have X-ray imaging machines, X-ray images could be used to test for COVID-19 without dedicated test kits. Compared with such test kits, chest X-ray images analyzed with artificial intelligence offer a fast and cost-effective way to screen for COVID-19. Therefore, many research works have been devoted to COVID-19 outbreak prediction [6,7] and diagnosis [8,9] based on machine learning techniques.

In the present work, we have concentrated our efforts on developing and validating a system based on federated learning for detecting COVID-19 from chest X-ray images, which is the source of all the novelties of this article. To the best of our knowledge, this is the first study that addresses federated learning on X-ray images for COVID-19 detection. The main contributions of this paper are:

• We propose a decentralized and collaborative framework that allows clinicians to reap the benefits of rich private data while preserving privacy.
• We demonstrate that, despite decentralized data and the non-IID and unbalanced properties of the data distribution, the proposed federated learning framework remains robust and shows results competitive with a centralized learning process.
• We conduct extensive experiments and comparisons with different variations to show the interest and significance of the proposed strategy, which can be particularly useful in situations like COVID-19.

The remainder of the paper is organized as follows: Section 2 reviews related work. Section 3 gives an overview of our proposed federated optimization framework, adapted to the problem of detecting COVID-19 in X-ray images. Section 4 is dedicated to the experiments and results, where both the centralized and federated approaches used to train on our COVID-19 dataset are introduced and their results discussed. Finally, Section 5 concludes this study.

2. Related work

Recently, much work has been done to develop deep learning algorithms for detecting such diseases from chest X-ray images [10,11]. For instance, the work in [12] developed a Dense Convolutional Network that detects pneumonia from chest X-ray images at a level exceeding practicing radiologists. Xu et al. [13] used a hierarchical Convolutional Neural Network (CNN) to classify X-ray images into normal and abnormal categories. A descriptive study [14] of radiology images obtained from COVID-19 cases demonstrated that these images contain useful information for diagnosis and early recognition of the disease. As a consequence, many works on radiology images have been proposed for COVID-19 detection. Hemdan et al. [15] and Wang and Wong [16] used deep learning models to diagnose COVID-19 from chest X-ray images. Nour and Cömert [17] used a CNN model to extract deep discriminative features from X-ray images and fed them to three machine learning algorithms: k-nearest neighbor, support vector machine, and decision tree. Gupta et al. [18] proposed an integrated stacked deep convolutional network to detect COVID-19 and pneumonia by identifying abnormalities in chest X-ray images. Zhang et al. [19] developed a deep learning-based model that detects COVID-19 from chest X-ray images with sufficiently high sensitivity, enabling fast and reliable screening. In [20], the authors introduced a deep model for early detection of COVID-19 cases from X-ray images that achieves good accuracy for both binary and multi-class classification. Narin et al. [21] and Chowdhury et al. [22] trained and compared multiple pre-trained CNN-based models for detecting COVID-19 infected patients from chest X-ray images. Recently, Demir [23] proposed a deep Long Short-Term Memory (LSTM) architecture trained from scratch to automatically identify COVID-19 cases from X-ray images. Other works [24–26] focused on detecting COVID-19 positive cases from chest CT scans using CNN-based models.

A drawback of these centralized models is that, in practice, medical organizations are unwilling to compromise doctor–patient confidentiality by releasing medical images, such as X-ray images, for training purposes. In contrast, several studies in healthcare [27–30] have shown that federated learning is a good way to connect medical institutions and let them share their experience with privacy guarantees. In this case, the performance of the machine learning model can be significantly improved by the large medical dataset thus formed. For example, Lee et al. [30] presented a privacy-preserving platform in a federated setting for patient similarity learning across institutions. Their model can find similar patients from one hospital to another without sharing patient-level information. Similarly, Huang et al. [29] tackled the challenge of non-IID ICU patient data, which complicates decentralized learning, by clustering patients into clinically meaningful communities and optimizing the prediction of mortality and ICU stay time. More recently, Baheti et al. [27] used federated learning to detect pulmonary nodules in CT scans.

As COVID-19 is a recently emerged infectious disease, no large publicly available datasets exist. Most of the existing data is stored privately because of privacy concerns. We therefore propose in this paper a collaborative framework that avoids compromising patient privacy while promoting scientific research on large datasets to improve patient care. The goal of our work is to promote COVID-19 screening from chest X-ray images using federated learning. We demonstrate that decentralized learning may address the demands of data protection without degrading performance compared with data-centralized learning.

3. Proposed framework

In this section, we describe the details of our proposed method for classifying chest X-ray images to distinguish COVID-19 from non-COVID-19 cases. This section first presents the preliminaries of the federated learning context, then an overview of our proposed framework, followed by the architecture of our training model, and finally a description of the client-side model training procedure and the server-side model aggregation procedure.
3.1. Preliminaries

We consider the standard machine learning problem with objective function $f_i(w) = \ell(x_i, y_i, w)$, that is, the loss of the prediction on example $(x_i, y_i)$ made with a model described by a parameter vector $w$. In a federated setting, we assume that the data points $i$ are partitioned across $K$ clients, where $P_k$ is the set of data points on client $k$, $n_k = |P_k|$ denotes the number of data points on that client, and $n = \sum_{k=1}^{K} n_k$ is the total number of data points. The optimization objective is thus:

$$\min_{w \in \mathbb{R}^d} f(w) \quad \text{where} \quad f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) \quad \text{with} \quad F_k(w) = \frac{1}{n_k} \sum_{i=1}^{n_k} f_i(w) \qquad (1)$$

McMahan et al. [31] introduced an algorithm for federated learning, FederatedAveraging (FedAvg), which aims to minimize the objective function in Eq. (1) assuming a synchronous update scheme and a generic non-convex neural network loss function. In terms of convergence, FedAvg is practically equivalent to a centrally trained model when the data is IID. McMahan et al. [1] demonstrated that FedAvg remains robust on some examples of non-IID data. However, Zhao et al. [32] showed that the accuracy of FedAvg drops significantly when it is trained on highly skewed non-IID data, even in a convex optimization setting.

3.2. Our framework overview

In this work, we study a federated learning framework based on a client–server architecture (illustrated in Fig. 1) that implements the FedAvg algorithm to classify X-ray images into COVID-19 infected and non-COVID-19 cases. In this configuration, a central parameter server maintains a global model, shares it with the clients, and then coordinates their updates. The clients collaborate to build a powerful model from their own private datasets.

We propose to build a deep convolutional neural network (CNN) that handles the feature extraction and classification of X-ray images to detect COVID-19. This model takes an X-ray image as input and outputs the probability of COVID-19 infection. The details of this CNN architecture are described in Section 3.2.1.

The learning phase of this CNN model consists of several communication rounds in which the central server interacts synchronously with the clients. Before the training rounds start, the CNN model is initialized with random weights $w^0$. We suppose that there are $K$ available clients, each holding $n_k$ private X-ray images stored locally. Each communication round $t$ consists of four steps:

Step 1. The central server maintains a global model $g$ with weights $w^{t-1}$, which it shares with a subset $S_t$ of clients (hospitals in our case) selected at random according to a fraction $C \in [0, 1]$.

Step 2. Each client $k \in S_t$, having received the parameters $w^{t-1}$, performs training steps on mini-batches $b$ of its own private local data, minimizing the local objective $F_k$ with mini-batch stochastic gradient descent (SGD), a local learning rate $\eta_{local}$, and $E$ epochs. Clients optimize the model by minimizing the categorical cross-entropy loss for classification.

Step 3. Once local training is finished (SGD has run for $E$ epochs over the local data points), the clients in $S_t$ send their model updates $w_k^t$, $k \in S_t$, back to the server.

Step 4. Finally, the server receives the updates from all participating clients and computes an average model $w^t$ according to Eq. (2) to update the parameters of the global model $g$:

$$w^t \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_k^t \qquad (2)$$

Here $w^t$ are the parameters updated at round $t$, $w_k^t$ are the parameters sent by client $k$ at round $t$, $n_k$ is the number of data points stored on client $k$, and $n$ is the total number of data points participating in collaborative training.

These four steps constitute one FL round for our CNN model, and this operation is repeated for many rounds. Note that at each new round $t$, the server re-sends the new parameters $w^{t-1}$ of the global model $g$ built in the previous round $t-1$. Note also that the subset of clients can change from one round to another if many clients are available. The client selection protocol is given in Algorithm 2 (Section 3.2.3).

3.2.1. Model architecture

We propose in this study a decentralized and collaborative framework for screening COVID-19 from chest X-ray images. Our aim is to demonstrate that federated learning of a deep CNN model lets institutions reap the benefits of rich private data while preserving privacy. For this reason, the choice of CNN architecture is not our main concern, and several architectural choices could slightly increase or decrease overall performance. For simplicity, we adopt two well-known image classification CNNs, VGG16 [33] and ResNet50 [34], as backbone networks. For both architectures, we use the pre-trained CNN without its fully connected layer head. We then add a classification head composed of global average pooling, a fully connected layer with dropout (64 units for VGG16 and 256 units for ResNet50), and a final fully connected layer of two units with softmax activation for classification. To optimize the classification head, we use the categorical cross-entropy loss. Our CNN takes as input an X-ray image of size 224 × 224 and outputs two probability values, one for each of our two classes.

As shown in Fig. 1, two parties exchange information: the federated clients and a central server. The following sections detail these two parties.

3.2.2. Client-side model update

Training is performed on the client side: each federated client has a fixed dataset and the computational capability to run mini-batch SGD. We have 4 clients, all sharing the same CNN architecture (described in Section 3.2.1) and loss function. The proposed training procedure is listed in Algorithm 1. At round $t$, each local model is initialized with the global model $w^t$ coming from the server. After running SGD for the given number of local epochs, the client obtains an updated model, which is shared with the aggregation server. Under this training protocol, local data remains private to each client and is never shared.

3.2.3. Server-side model aggregation

The server, which owns the global model, manages the overall progress of model training and distributes the original model to all participating clients. At each federated round $t$, it receives synchronized updates from all participating clients (see Algorithm 2) and aggregates them into a new model with updated parameters according to Eq. (2). Algorithm 2 presents the details of the server-side learning process.
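A minimal sketch of the server-side aggregation in Eq. (2), together with the weighted objective of Eq. (1), in plain Python. For illustration only, model parameters are assumed to be flat lists of floats and local losses to be already computed; the function names `weighted_average` and `global_objective` are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def weighted_average(client_params: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Eq. (2): w^t = sum_k (n_k / n) * w_k^t over the reporting clients."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one parameter vector per client")
    n = sum(client_sizes)  # total number of participating data points
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        weight = n_k / n  # clients with more data pull harder on the average
        for j in range(dim):
            averaged[j] += weight * params[j]
    return averaged


def global_objective(client_losses: Sequence[float],
                     client_sizes: Sequence[int]) -> float:
    """Eq. (1): f(w) = sum_k (n_k / n) * F_k(w), with F_k the mean local loss."""
    n = sum(client_sizes)
    return sum(n_k / n * loss for loss, n_k in zip(client_losses, client_sizes))


# Unbalanced example: the client holding 3 of the 4 data points dominates.
print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

With IID, equally sized clients (38 images each in the experiments of Section 4), every weight is 1/4 and Eq. (2) reduces to a plain mean of the client models.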

Fig. 1. Federated Learning architecture for COVID-19 detection from Chest X-ray images.
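As a worked example of the classification head described in Section 3.2.1, its trainable parameter count can be computed directly. This sketch assumes, as in the standard ImageNet-pretrained VGG16 and ResNet50 backbones, that the last convolutional stage outputs 512 and 2048 feature maps respectively, so global average pooling yields feature vectors of that length; dropout and softmax add no weights. The helper names `dense_params` and `head_params` are hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(feature_dim: int, hidden_units: int, n_classes: int = 2) -> int:
    """GAP -> Dense(hidden_units) with dropout -> Dense(n_classes, softmax)."""
    # Global average pooling and dropout have no trainable weights.
    return dense_params(feature_dim, hidden_units) + dense_params(hidden_units, n_classes)


# Assumed backbone output channels: 512 for VGG16, 2048 for ResNet50.
print(head_params(512, 64))    # VGG16 head: 32962
print(head_params(2048, 256))  # ResNet50 head: 525058
```

Because the backbone is frozen (Section 4.2), these head weights are the only ones that local SGD changes in each round.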

Algorithm 1 Federated learning: client-side training at federated round t.
Require: local learning rate η and loss function ℓ
Require: num_local_epochs and local training data
1: procedure ClientUpdate(w)
2:     w ← w^t                                  ▷ Initialize local model
3:     B ← split P_k into batches of size B
4:     for each local epoch i from 1 to E do    ▷ With SGD optimizer
5:         for each batch b in B do
6:             compute gradient g_i^b ← ∇ℓ(w; b)
7:             update local model w ← w − η g_i^b
8:         end for
9:     end for
10:    return w                                 ▷ Upload to server
11: end procedure

Algorithm 2 Federated learning: server-side aggregation procedure.
Require: T: num_federated_rounds
1: procedure Aggregating(C, K)
2:     initialize global model w^0
3:     for each round t = 1, 2, ..., T do
4:         m ← max(C × K, 1)
5:         S_t ← (random set of m clients)       ▷ Selected clients for round t
6:         for each client k ∈ S_t do           ▷ Run in parallel
7:             send w^{t−1} to client k
8:             w_k^t ← ClientUpdate(k, w^{t−1})
9:         end for
10:        w^t ← Σ_{k=1}^{K} (n_k / n) w_k^t    ▷ Aggregating clients' updates
11:    end for
12:    return w^T
13: end procedure
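Algorithms 1 and 2 can be exercised end to end on a toy problem. The following is a minimal, hypothetical sketch, not the authors' implementation: a two-parameter linear model y = a·x + b with a squared-error loss stands in for the CNN, and flat Python lists stand in for model weights. It keeps the structure of the two algorithms: m = max(C × K, 1) clients sampled per round, E local epochs of mini-batch SGD starting from the global weights, and the weighted average of Eq. (2), normalized by the data held by the clients selected in that round (the "participating" data points of Eq. (2)).

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair of the toy regression task
Params = List[float]           # [a, b] for the stand-in model y = a*x + b


def gradient(w: Params, batch: Sequence[Example]) -> Params:
    """Gradient of the mean loss 0.5 * (a*x + b - y)**2 over one mini-batch."""
    ga = gb = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        ga += err * x
        gb += err
    return [ga / len(batch), gb / len(batch)]


def client_update(w_global: Params, data: Sequence[Example],
                  lr: float, epochs: int, batch_size: int) -> Params:
    """Algorithm 1: E epochs of mini-batch SGD, starting from the global weights."""
    w = list(w_global)  # initialize the local model
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            g = gradient(w, data[start:start + batch_size])
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w  # "uploaded" to the server


def fedavg(clients: Sequence[Sequence[Example]], rounds: int, C: float,
           lr: float = 0.1, epochs: int = 1, batch_size: int = 2,
           seed: int = 0) -> Params:
    """Algorithm 2: sample m = max(C*K, 1) clients per round, then apply Eq. (2)."""
    rng = random.Random(seed)
    K = len(clients)
    w = [0.0, 0.0]  # w^0
    for _ in range(rounds):
        m = max(int(C * K), 1)
        selected = rng.sample(range(K), m)  # S_t
        updates = [client_update(w, clients[k], lr, epochs, batch_size) for k in selected]
        sizes = [len(clients[k]) for k in selected]
        n = sum(sizes)  # data points taking part in this round
        w = [sum(n_k / n * u[j] for u, n_k in zip(updates, sizes)) for j in range(len(w))]
    return w


# Hypothetical data y = 2x + 1, spread unevenly over 4 clients (10, 5, 3, 2 points).
data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]
clients = [data[0::2], data[1::4], data[3::8], data[7::8]]
print(fedavg(clients, rounds=500, C=1.0))   # all clients every round
print(fedavg(clients, rounds=500, C=0.25))  # one client per round (sequential)
```

Since every toy client shares the same optimum, both runs should approach a = 2, b = 1. With 4 clients, int(0.25 × 4) = 1, so C = 0.25 selects a single client per round exactly as C = 0 does, matching the remark in Section 4.3.2. The batch_size default of 2 follows Section 4.2; passing a client's full dataset size instead gives the single-mini-batch setting mentioned at the start of Section 4.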

4. Experiments

In this work, we simulate experiments with 4 clients (hospitals), and each client treats its full local dataset as a single mini-batch at each round.

4.1. Data preparation

Since there are no large public datasets of COVID-19 cases, the dataset used for this work includes only 108 chest X-ray images belonging to 76 patients, all of whom were confirmed with COVID-19, and 108 chest X-ray images diagnosed as normal (not COVID-19) belonging to healthy patients. The COVID-19 X-ray images used for this research are available in a GitHub repository², while the 108 X-ray images of normal cases are randomly selected from the public chest X-ray dataset [35], which contains normal and abnormal chest X-ray images. Fig. 2 (left) shows sample images belonging to the two classes.

² https://github.com/ieee8023/covid-chestxray-dataset.

We randomly split the dataset into a training set containing 80% of the images (76 COVID-19 images belonging to 55 patients and 76 healthy patient images) and a test set containing 20% of the images (32 COVID-19 images belonging to 21 patients and 32 healthy patient images). The training dataset is then split into K subsets according to the data distribution scenario being evaluated. All our simulations use K = 4 clients.

For IID data, we assign 38 images (19 COVID-19 cases and 19 normal cases) to each client, so all clients have the same amount of data (25%) drawn from the same distribution.

To simulate non-IID training on our dataset, we use a skewed class distribution and divide the training data so that each client gets a different number of images from each class (44% of the images of one class and 6% of the images of the other).

Finally, we generate a third version of our training dataset to test an unbalanced data distribution over clients. To do this,

Fig. 2. Sample chest X-ray images from the dataset used. Left: sample images selected from the original dataset. Right: corresponding augmented images generated with random zoom and rotation augmentations.

we spread all training samples over the 4 clients so that each holds a different number of observations. The four clients hold 44%, 37%, 13%, and 6% of the training dataset, respectively.

All images from the same patient appear only in either the training or the test set. In addition, there is no patient overlap between the client subsets, which keeps our federated setup realistic.

Since the dataset is small, we applied data augmentation to artificially expand the training and test subsets by creating modified versions of the images. We used two geometric transformations: rotation and zoom. Rotation augmentation rotates the image to the right or left by a small random angle (rotation_range = 10). Zoom augmentation zooms in or out of the image within a small range (zoom_range = 0.1). Fig. 2 (right) presents some samples of augmented images. With these operations, the number of training samples per client increases from 38 to 152 (76 COVID-19 cases and 76 normal cases). This augmented dataset is used in only one experiment, to demonstrate the impact of dataset size on model quality and test accuracy. The results are discussed in Section 4.3.1.

4.2. Model settings and evaluation metric

For both federated and centralized learning, we adopt the same CNN networks with weights pre-trained on ImageNet [36], dropping the fully connected layer head and replacing it with a new classification head for training and prediction. Two CNN architectures are tested; their details are provided in Section 3.2.1.

The weights of the CNN backbone (VGG16 or ResNet50) are frozen so that only the new fully connected head is trained. The standard SGD optimizer is used to minimize the loss function and update the network parameters. We refer to this method as FL-VGG16 and FL-ResNet50 when VGG16 and ResNet50, respectively, are used as the model backbone. We set the local learning rate $\eta_{local} = 0.001$, the batch size to 2, and the number of training epochs to 10. In addition, we resize each image to a fixed size of 224 × 224 pixels.

A high accuracy rate is required of COVID-19 diagnosis and detection systems to limit the spread of infection and to guide patient treatment. Therefore, to evaluate the performance of our proposed method, we report accuracy rates on the test data after each round of federated learning. For each method, we repeat the experiments 3 times, and all curves represent average results over these 3 simulations. Since these are binary classification tests, we also report statistical performance measures widely used in medical and epidemiological research [25], namely sensitivity and specificity. Sensitivity reflects the probability that the screening test is positive among those who have the disease (true positives), and specificity reflects the probability that the screening test is negative among those who do not have the disease (true negatives) [37].

To show the effectiveness of our federated learning based method, we first compare its performance with traditional learning, in which the same network architecture is trained on shared, centralized data (we refer to this method as Centralized-VGG16 and Centralized-ResNet50 when VGG16 and ResNet50, respectively, are used as the model backbone).

4.3. Results

We conduct this study of federated learning for COVID-19 detection to highlight the effectiveness of this type of decentralized and collaborative learning in contexts where data is private. First, we compare our decentralized method with the centralized one. Then, we study the effect of the parameter C on model performance after each round with an IID data distribution. Finally, we compare the IID and non-IID, balanced and unbalanced distribution settings.

4.3.1. Federated vs. data-centralized training

We have 152 training samples and 64 test samples in our dataset. Since there is no natural user partitioning of this data, we consider the balanced and IID setting and partition the training dataset across 4 clients, each containing 38 training samples (25%). In this section, federated results are compared with a centralized learning method in order to evaluate the accuracy of our proposed FL based method. Fig. 3 shows comparative results across data-sharing and FL for our two implementations (VGG16 and ResNet50) on the original training dataset and the augmented one. Model quality is measured by accuracy on a held-out test dataset, plotted against the number of communication rounds for FL based methods and against data-sharing epochs for centralized methods.
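The per-client image counts behind the IID split used here, and behind the non-IID and unbalanced splits described in Section 4.1, can be reproduced with a short sketch. It only computes counts from the 76 + 76 training images; assigning actual images must additionally keep each patient's images on a single client. Three details are assumptions, not stated in the paper: which clients receive the COVID-19-heavy shares in the non-IID case, that the unbalanced 44/37/13/6% split keeps both classes in equal proportion within each client, and the largest-remainder rounding of fractional counts. The function names are hypothetical.

```python
from typing import Dict, List

N_PER_CLASS = 76  # training images per class after the train/test split (Section 4.1)


def split_counts(total: int, fractions: List[float]) -> List[int]:
    """Split `total` items by `fractions`, rounding with the largest-remainder rule."""
    raw = [total * f for f in fractions]
    counts = [int(r) for r in raw]
    # Hand the items lost to truncation to the clients with the largest remainders.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1
    return counts


def partition(scenario: str) -> List[Dict[str, int]]:
    """Per-client COVID-19 / normal image counts for K = 4 clients."""
    if scenario == "iid":
        covid = normal = split_counts(N_PER_CLASS, [0.25] * 4)
    elif scenario == "non_iid":
        # Assumption: clients 1-2 are COVID-heavy, clients 3-4 normal-heavy.
        covid = split_counts(N_PER_CLASS, [0.44, 0.44, 0.06, 0.06])
        normal = split_counts(N_PER_CLASS, [0.06, 0.06, 0.44, 0.44])
    elif scenario == "unbalanced":
        # Assumption: each client keeps both classes in equal proportion.
        covid = normal = split_counts(N_PER_CLASS, [0.44, 0.37, 0.13, 0.06])
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return [{"covid": c, "normal": n} for c, n in zip(covid, normal)]


for name in ("iid", "non_iid", "unbalanced"):
    print(name, partition(name))
```

The IID case reproduces the 38 images per client (19 + 19) stated in the text. Under this reading of the percentages, every non-IID client also ends up with 38 images, only with a 33/5 class skew, while the unbalanced clients hold 66, 56, 20 and 10 images.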

Fig. 3. Comparison of Federated Learning to data-sharing learning using original and augmented dataset for learning. Left: results using the VGG16 as the model
backbone. Right: results using the ResNet50 as the model backbone. An epoch for centralized methods is defined as a single training pass over all of the centralized
data. A round for FL methods is defined as a parallel training pass of every client over their local training data.
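Sensitivity and specificity as defined in Section 4.2 correspond to TP / (TP + FN) and TN / (TN + FP). When a test set holds as many positive as negative images, as with the 32 + 32 split of Section 4.1, accuracy (TP + TN) / (P + N) reduces to the mean of the two. The sketch below shows this on hypothetical confusion counts and checks the same identity against the Table 1 averages, which agree to within rounding (assuming the cross-validation test folds are balanced as well). The helper name `screening_metrics` is hypothetical.

```python
from typing import Dict, Tuple


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy of a binary screening test."""
    return {
        "sensitivity": tp / (tp + fn),  # P(test positive | has the disease)
        "specificity": tn / (tn + fp),  # P(test negative | does not have it)
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical confusion counts on a balanced test set of 32 + 32 images.
metrics = screening_metrics(tp=31, fn=1, tn=30, fp=2)
print(metrics)  # {'sensitivity': 0.96875, 'specificity': 0.9375, 'accuracy': 0.953125}

# Table 1 averages in %, as (accuracy, sensitivity, specificity).
TABLE_1: Dict[str, Tuple[float, float, float]] = {
    "FL-VGG16": (93.57, 95.03, 92.12),
    "FL-VGG16 + data aug": (94.40, 96.15, 92.66),
    "Centralized-VGG16": (93.75, 95.20, 92.3),
    "Centralized-VGG16 + data aug": (94.0, 95.01, 93.0),
    "FL-ResNet50": (95.4, 96.03, 94.78),
    "FL-ResNet50 + data aug": (97.0, 98.11, 95.89),
    "Centralized-ResNet50": (95.3, 96.0, 94.6),
    "Centralized-ResNet50 + data aug": (96.5, 96.8, 96.2),
}
for method, (acc, sens, spec) in TABLE_1.items():
    # With balanced test folds, accuracy should equal (sensitivity + specificity) / 2.
    print(f"{method:32s} reported {acc:6.2f}   mean of sens/spec {(sens + spec) / 2:7.3f}")
```

All eight rows differ from the mean of sensitivity and specificity by at most 0.005 percentage points, which is consistent with balanced test folds and two-decimal rounding.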

For FL settings, we fix C = 1 (all clients are considered at Table 1


each round). Fig. 3 shows that the proposed FL procedure can Accuracy, Sensitivity, and Specificity rates after the last FL round/Data sharing
epoch. Reported results are given with respect to our experiments made with
achieve a comparable classification performance without sharing the 5-fold cross-validation method. Accuracy, Sensitivity, and Specificity rates
clients’ data. Fig. 3-left shows that from the round 35, our FL- provided in this table are average results over the 5 simulations.
VGG16 method trained on the original dataset (FL-VGG16: orange Method Accuracy Sensitivity Specificity
curve) approaches the Centralized-VGG16 method trained on the FL-VGG16 93.57 95.03 92.12
same dataset (Centralized-VGG16: red curve) but after collect- FL-VGG16 + data aug 94.40 96.15 92.66
ing and sharing all data from the 4 clients. Fig. 3-right shows Centralized-VGG16 93.75 95.20 92.3
that our FL-ResNet50 (orange curve) method provides similar Centralized-VGG16 + data aug 94.0 95.01 93.0
behavior but requires more rounds (50 rounds) to approach the FL-ResNet50 95.4 96.03 94.78
Centralized-ResNet50 method (red curve). FL-ResNet50 + data aug 97.0 98.11 95.89
Centralized-ResNet50 95.3 96.0 94.6
We notice that the blue and the magenta curves correspond
Centralized-ResNet50 + data aug 96.5 96.8 96.2
to our FL based methods (FL-VGG16 and FL-ResNet50) and the
Centralized-VGG16/ResNet50 methods respectively learned on an
augmented dataset with data augmentation techniques described
above. Our FL results show a remarkable consistency on the results obtained over the 5 simulations for each method. Fig. 4
simulated distributions. Our method (FL-VGG16+data aug : blue confirms all the results presented in Fig. 3 for both VGG16 and
curve) has comparable results with the two centralized methods ResNet50 implementations. This finding confirms the generaliza-
(Centralized-VGG16 and Centralized-VGG16+data aug) after only 12 rounds. The same result is observed with the FL-ResNet50 method, which provides results comparable to those of the two centralized methods using the same CNN architecture after 50 rounds. The amount of data held by each client therefore has a significant impact on the final result.

Another important result that can be seen in Fig. 3-left is that after about 90 rounds, all methods using the VGG16 CNN are equivalent and provide similar results. Fig. 3-right shows the same behavior, but from round 120 onward. This highlights the effectiveness of our proposed FL-based framework: by iterating over several rounds, it achieves results similar to those of the centralized methods without ever sharing data, thereby preserving privacy.

Results with 5-fold cross-validation. To further evaluate our proposed FL-based method, we used 5-fold cross-validation, which consists of dividing all the available data into a predefined number of folds (5 in our case), using one fold for testing and the others for training. The training process is repeated 5 times, until every fold has been used as the test set. We use cross-validation because we have little data, so a single train/test split would evaluate our model on only a few samples. With cross-validation, all of our data are used for both training and testing, while the model is always evaluated on examples it has never seen before. At each iteration, the training and test sets are randomly divided into K = 4 subsets, one per client, while respecting the protocol described above: all images from the same patient appear in either the training set or the test set, never both, and there is no patient overlap between the client subsets.

We provide in Fig. 4 comparative results across data-sharing and FL for our VGG16 and ResNet50 implementations using 5-fold cross-validation. All the curves represent average

tion ability of the proposed model and the independence of our reported results of the train/test dataset splits.

We report in Table 1 the performance of the tested methods in terms of accuracy, sensitivity, and specificity, measured after the last round for the FL-based methods and after the last epoch for the centralized ones. The first result that can be seen in Table 1 is that, after 150 rounds, our FL-ResNet50 model trained with data augmentation provides the highest accuracy, reaching 97%. This method also achieves a high sensitivity of 98.11% and a specificity of 95.89%. Another result from Table 1 is that all tested methods provide comparable sensitivity and specificity when they use the same model backbone. Indeed, all VGG16-based models reach sensitivity and specificity rates of 95–96% and 92–93%, respectively, whereas the ResNet50-based models reach higher rates of 96–98% and 94–96%, respectively. This result highlights the effectiveness of our proposed FL-based methods: they perform comparably to the centralized methods while preserving data privacy, which shows their suitability for privacy-restricted applications.

4.3.2. Results on IID data

We consider here the IID and balanced data partition (same as Section 4.3.1) and provide experiments with the client fraction C, which controls the amount of multi-client parallelism. Here, C = 1 means that all available clients are selected for collaborative training at each round, and C = 0 means that only one client is selected at each round. When C = 0, there is no parallelism between clients, and the learning process is considered to be sequential. In our case, C = 0.25 is equivalent to C = 0, since we have only 4 clients. We report in this section the results obtained for different values of C.
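The client-fraction mechanism described above can be illustrated with a minimal Python sketch of one server round. It assumes the FedAvg rule of McMahan et al. [1], in which m = max(C·K, 1) clients are sampled per round and their weights are averaged in proportion to their local sample counts n_k. Rounding C·K down, representing a model as a flat list of floats, and the helper names (num_selected_clients, select_clients, federated_average) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: a model is a flat list of floats; real VGG16/ResNet50 weights
# are tensors, but the averaging rule is applied element-wise in the same way.
Weights = List[float]


def num_selected_clients(c: float, k: int) -> int:
    """Clients sampled per round: m = max(floor(C * K), 1)."""
    return max(math.floor(c * k), 1)


def select_clients(c: float, client_ids: Sequence[int], rng: random.Random) -> List[int]:
    """Sample m distinct clients uniformly at random for one round."""
    m = num_selected_clients(c, len(client_ids))
    return sorted(rng.sample(list(client_ids), m))


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average the selected clients' weights, weighting client k by its sample count n_k."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    for c in (1.0, 0.5, 0.25, 0.0):
        print(c, num_selected_clients(c, 4), select_clients(c, range(4), rng))
    # With a single selected client, the "average" is just that client's model.
    print(federated_average([([0.3, -1.2], 120)]))
```

With K = 4, both C = 0.25 and C = 0 give m = 1, which is why the two settings coincide here. When only one client is selected, the weighted average reduces to that client's weights, so the server effectively replaces the old global model, which is the behavior discussed for the C = 0.25 curves of Fig. 5. Weighting by n_k also matters when clients hold very different amounts of data, as in the unbalanced setting.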

Fig. 4. Comparison of Federated Learning with data-sharing learning, using the original and augmented datasets for training. Curves represent average results obtained over
the 5 simulations for each method. Left: results using VGG16 as the model backbone. Right: results using ResNet50 as the model backbone.
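The patient-level 5-fold protocol behind Fig. 4 can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes a hypothetical mapping image_to_patient from image identifiers to patient identifiers, assigns whole patients to folds round-robin after shuffling (which balances patients, not images, per fold), and then divides each iteration's training and test images among K = 4 clients, again by patient. As a result, no patient appears in both the training and test sets, and no patient is shared between two clients.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def patient_grouped_folds(image_to_patient: Dict[str, str], n_folds: int = 5,
                          seed: int = 0) -> List[List[str]]:
    """Assign whole patients to folds so that no patient spans two folds."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for image, patient in image_to_patient.items():
        by_patient[patient].append(image)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    folds: List[List[str]] = [[] for _ in range(n_folds)]
    for i, patient in enumerate(patients):
        folds[i % n_folds].extend(by_patient[patient])  # round-robin over patients
    return folds


def split_among_clients(images: List[str], image_to_patient: Dict[str, str],
                        n_clients: int = 4, seed: int = 0) -> List[List[str]]:
    """Randomly give each patient (with all of its images) to exactly one client."""
    patients = sorted({image_to_patient[img] for img in images})
    random.Random(seed).shuffle(patients)
    owner = {p: i % n_clients for i, p in enumerate(patients)}
    clients: List[List[str]] = [[] for _ in range(n_clients)]
    for img in images:
        clients[owner[image_to_patient[img]]].append(img)
    return clients


def cross_validation_rounds(
    image_to_patient: Dict[str, str], n_folds: int = 5, n_clients: int = 4
) -> Iterator[Tuple[List[List[str]], List[List[str]]]]:
    """Yield (per-client training sets, per-client test sets) for each fold."""
    folds = patient_grouped_folds(image_to_patient, n_folds)
    for k in range(n_folds):
        train = [img for j, fold in enumerate(folds) if j != k for img in fold]
        yield (split_among_clients(train, image_to_patient, n_clients, seed=k),
               split_among_clients(folds[k], image_to_patient, n_clients, seed=k))
```

Each of the five yielded pairs corresponds to one simulation; the curves in Fig. 4 are averages over these five runs.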

Fig. 5. Effect of the client fraction C on the test accuracy of our proposed FL methods. C = 1 corresponds to selecting all clients at each round (4 clients in our case), C = 0.5 to selecting half of the clients (2 clients in our case), and C = 0.25 to selecting only one client per round. Left: results using VGG16
as the model backbone. Right: results using ResNet50 as the model backbone.
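The test accuracy tracked in Figs. 4 and 5 and the sensitivity and specificity reported in Table 1 follow the standard binary definitions [37]. A minimal sketch, assuming label 1 denotes a COVID-19-positive image and 0 a negative one, and assuming that per-fold metrics are simply averaged over the five cross-validation folds; the helper names are hypothetical.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = COVID-19 positive).

    Assumes both classes are present in y_true, otherwise a ratio is undefined.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate: positives correctly flagged
        "specificity": tn / (tn + fp),  # true negative rate: negatives correctly cleared
    }


def average_over_folds(per_fold: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each metric across cross-validation folds."""
    return {k: sum(m[k] for m in per_fold) / len(per_fold) for k in per_fold[0]}
```

In a screening setting, sensitivity measures how many positive cases are caught, while specificity measures how many negative cases are correctly cleared; accuracy alone can hide a poor value of either when the classes are imbalanced.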

We consider three values of the parameter C: 1, 0.5, and 0.25.

Fig. 5 shows the test accuracy curves plotted against the communication rounds, up to 100 rounds for FL-VGG16 and 150 rounds for FL-ResNet50. Fig. 5-left shows that FL-VGG16 converges fastest when all clients are selected (C = 1: orange curve) and collaborate at each round. When only half of the clients are selected (C = 0.5: green curve), the results are slightly worse, but they approach those obtained with C = 1 after about 45 rounds. When only one client is selected at each round (C = 0.25: cyan curve), the results fluctuate and begin to converge only after several rounds. This behavior is explained by the fact that, with only one client per round, the server update simply replaces the old model with the one sent by the selected client, so there is no collaborative learning. The quality of the new model at each round then depends on the data of the selected client.

For FL-ResNet50, Fig. 5-right shows the same convergence behavior for all curves, but with slightly worse accuracy when C = 0.5 and C = 0.25. Accuracy decreases when using half of the clients instead of all of them, and decreases further when using 25% of the clients (only one client) per round. In the federated learning context, this ratio is generally set to 10%, which is more realistic in practical setups with many available clients [38].

We can conclude that, for the IID data partition, using more clients in each round increases the accuracy at convergence, and the learning process requires fewer rounds to converge. This result is observed in our context, where the number of clients is limited and the available data size is small, and it may be specific to this context.

4.3.3. Results on non-IID data

In this section, we fix C = 1, compare the two FL methods on IID and non-IID data, and provide the results in Fig. 6.

We observe that the speedups with partitioned non-IID data (green curve) are smaller but still substantial, which indicates that the model's performance is more variable under this partition. Despite the non-IID data distribution, our FL-based methods remain robust: on non-IID data, they approach the test-set accuracy that the FL methods reach on IID data (94% for FL-VGG16 and 95.3% for FL-ResNet50), which in turn surpassed that of the centralized learning method. The small degradation in the quality of model training is due to the fact that each client holds a lot of data from one class and little data from the other. We also notice that, as the number of rounds increases, the test accuracy for the non-IID partition almost stabilizes (about 90% for FL-VGG16 and around 92% for FL-ResNet50), whereas for the IID partition it continues to converge.

4.3.4. Results on unbalanced data

Generally, an unbalanced and non-IID distribution is much more representative of the data distributions encountered in medical applications. Since our method is intended for medical applications, we adapted our implementation to converge in the case of an unbalanced data distribution. As shown in Fig. 7-left, despite the significant imbalance in the number of subjects per client (partitioned as described in Section 4.1), FL-VGG16 on unbalanced data (pink curve) achieves a test-set accuracy of 92%, even approaching that of the centralized learning model. The same behavior is observed with FL-ResNet50 trained on unbalanced data (pink curve in Fig. 7-right), which achieves a test-set accuracy of 92.7%.

Comparing the curves of FL (VGG16 and ResNet50) on balanced data with those on unbalanced data, the test accuracy on balanced data is higher at first (because, in the unbalanced case, clients hold very different amounts of data), but the accuracy on unbalanced data approaches it after several rounds. The method implemented for unbalanced data thus achieves 92% and 92.7% test accuracy for FL-VGG16 and FL-ResNet50, respectively.
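Section 4.1 defines the exact partitions used in the experiments. As a purely hypothetical sketch, a binary-label split that is both non-IID (each client holds mostly one class, as described in Section 4.3.3) and unbalanced (clients of very different sizes, as in Section 4.3.4) could be generated as below. The function name skewed_partition, the 90% dominant-class fraction, and the client sizes used in the usage note are illustrative assumptions, not values from the paper.

```python
import random
from typing import Dict, List, Sequence


def skewed_partition(labels: Sequence[int], client_sizes: Sequence[int],
                     dominant_fraction: float = 0.9, seed: int = 0) -> List[List[int]]:
    """Split sample indices into non-IID, unbalanced client shards.

    Assumes binary labels (0 or 1). Client k receives client_sizes[k] samples,
    about dominant_fraction of them from class k % 2 and the rest from the other class.
    """
    rng = random.Random(seed)
    pools: Dict[int, List[int]] = {0: [], 1: []}
    for idx, y in enumerate(labels):
        pools[y].append(idx)
    for pool in pools.values():
        rng.shuffle(pool)
    clients: List[List[int]] = []
    for k, size in enumerate(client_sizes):
        main, other = k % 2, 1 - k % 2
        n_main = round(dominant_fraction * size)
        n_other = size - n_main
        if len(pools[main]) < n_main or len(pools[other]) < n_other:
            raise ValueError("not enough samples for the requested partition")
        # pop() removes indices from the pools, so no sample is given to two clients
        shard = [pools[main].pop() for _ in range(n_main)]
        shard += [pools[other].pop() for _ in range(n_other)]
        clients.append(shard)
    return clients
```

For example, with 100 samples per class and client sizes [80, 40, 20, 10], each client receives 90% of its samples from its dominant class while the largest client holds eight times as much data as the smallest; combined with n_k-weighted averaging on the server, this reproduces, in miniature, the two forms of heterogeneity studied in Sections 4.3.3 and 4.3.4.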

Fig. 6. Comparison of Federated Learning results on IID data and non-IID data partitions with C = 1 (all clients are considered at each round). Left: results using
the VGG16 as the model backbone. Right: results using the ResNet50 as the model backbone.

Fig. 7. Comparison of Federated Learning results on balanced data and unbalanced data partitions with C = 1 (all clients are considered at each round). Left: results
using the VGG16 as the model backbone. Right: results using the ResNet50 as the model backbone.

We can then conclude that the heterogeneity in the quantity of data held by each client does not substantially affect the model's final performance.

5. Conclusion and future work

In this paper, we presented a Federated Learning framework for COVID-19 detection from Chest X-ray images using deep convolutional neural networks (VGG16 and ResNet50). This framework operates in a decentralized and collaborative manner and allows clinicians everywhere in the world to benefit from rich private medical data while preserving privacy. We first presented a comparative study between two machine learning scenarios for medical images, classical centralized learning and federated learning, using two CNN architectures as the model backbone: VGG16 and ResNet50. We then demonstrated that federated learning can achieve performance comparable to centralized learning, without the obligation to share or centralize private and sensitive data. We also demonstrated that, despite the decentralized data and the non-IID and unbalanced properties of the data distribution, the proposed Federated Learning framework remains robust and shows performance comparable to a centralized learning process. We note that the federated learning framework is validated on COVID-19 screening from Chest X-ray images, but it could be generalized to other medical imaging applications with large, distributed, and privacy-sensitive data.

Federated learning has the potential to connect isolated medical institutions, hospitals, or devices, allowing them to share their experience and collaborate with privacy guarantees. Such collaboration could improve the speed and accuracy of the detection of COVID-19 positive cases. In the future, we aim to provide such a federated platform, where hospitals can safely collaborate and train models, by exploring differential privacy techniques. Another interesting direction for future work is to consider more sophisticated CNNs trained on very large-scale datasets.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] H.B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: AISTATS, PMLR 54, 2017.
[2] D.S. Hui, E.I. Azhar, T.A. Madani, F. Ntoumi, R. Kock, O. Dar, G. Ippolito, T.D. Mchugh, Z.A. Memish, C. Drosten, A. Zumla, E. Petersen, The continuing COVID-19 epidemic threat of novel coronaviruses to global health: the latest 2019 novel coronavirus outbreak in Wuhan, China, Int. J. Infect. Dis. 91 (2020) 264–266.
[3] J.P. Kanne, B.P. Little, J.H. Chung, B.M. Elicker, L.H. Ketai, Essentials for radiologists on COVID-19: an update, Radiology scientific expert panel, 2020.
[4] X. Xie, Z. Zhong, W. Zhao, C. Zheng, F. Wang, J. Liu, Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing, Radiology (2020) 200343.
[5] Z.Y. Zu, M.D. Jiang, P.P. Xu, W. Chen, Q.Q. Ni, G.M. Lu, L.J. Zhang, Coronavirus disease 2019 (COVID-19): a perspective from China, Radiology (2020) 200490.
[6] P. Kairon, S. Bhattacharyya, COVID-19 outbreak prediction using quantum neural networks, Intel. Enabled Res. 11 (2021) 3–123.
[7] S.F. Ardabili, A. Mosavi, P. Ghamisi, F. Ferdinand, A.R. Varkonyi-Koczy, U. Reuter, T. Rabczuk, P.M. Atkinson, COVID-19 outbreak prediction with machine learning, Algorithms 13 (2020).
[8] P. Schwab, A. DuMont Schütte, B. Dietz, S. Bauer, Clinical predictive models for COVID-19: Systematic study, J. Med. Internet Res. 22 (2020) e21439.
[9] W.T. Li, J. Ma, N. Shende, G. Castaneda, J. Chakladar, J.C. Tsai, L. Apostol, C.O. Honda, J. Xu, L.M. Wong, T. Zhang, A. Lee, A. Gnanasekar, T.K. Honda, S.Z. Kuo, M.A. Yu, E.Y. Chang, M. Rajasekaran, W.M. Ongkeko, Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis, BMC Med. Inform. Decis. Mak. 1 (2020b).
[10] H. Ma, I. Smal, J. Daemen, T. van Walsum, Dynamic coronary roadmapping via catheter tip tracking in X-ray fluoroscopy with deep learning based Bayesian filtering, Med. Image Anal. 61 (2020) 101634.
[11] Y. Zhang, S. Miao, T. Mansi, R. Liao, Unsupervised X-ray image segmentation with task driven generative adversarial networks, Med. Image Anal. 62 (2020b) 101664.


[12] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, R. Ball, C. Langlotz, K. Shpanskaya, M. Lungren, A. Ng, CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning, 2017, arXiv preprint arXiv:1711.05225v3.
[13] S. Xu, H. Wu, R. Bie, CXNet-m1: Anomaly detection on chest X-rays with image-based deep learning, IEEE Access 7 (2019) 4466–4477.
[14] H. Shi, X. Han, N. Jiang, Y. Cao, O. Alwalid, J. Gu, Y. Fan, C. Zheng, Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study, Lancet Infect. Dis. 20 (2020) 425–434.
[15] E.E.D. Hemdan, M.A. Shouman, M.E. Karar, COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images, 2020, arXiv preprint arXiv:2003.11055.
[16] L. Wang, A. Wong, COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images, 2020, arXiv preprint arXiv:2003.09871.
[17] M. Nour, K. Cömert, A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization, Appl. Soft Comput. 97 (2020) 106580.
[18] A. Gupta, Anjum, S. Gupta, R. Katarya, InstaCovNet-19: A deep learning classification model for the detection of COVID-19 patients using chest X-ray, Appl. Soft Comput. 97 (2020).
[19] J. Zhang, Y. Xie, Y. Li, C. Shen, Y. Xi, COVID-19 screening on chest X-ray images using deep learning based anomaly detection, 2020a, arXiv preprint arXiv:2003.12338v1.
[20] T. Ozturk, M. Talo, E.A. Yildirim, U.B. Baloglu, O. Yildirim, U.R. Acharya, Automated detection of COVID-19 cases using deep neural networks with X-ray images, Comput. Biol. Med. (2020) 103792.
[21] A. Narin, C. Kaya, Z. Pamuk, Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks, 2020, arXiv preprint arXiv:2003.1084.
[22] M.E.H. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z.B. Mahbub, K.R. Islam, M.S. Khan, A. Iqbal, N.A. Emadi, et al., Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8 (2020) 132665–132676.
[23] F. Demir, DeepCoroNet: A deep LSTM approach for automated detection of COVID-19 cases from chest X-ray images, Appl. Soft Comput. 103 (2021).
[24] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han, Z. Xue, D. Shen, Y. Shi, Abnormal lung quantification in chest CT images of COVID-19 patients with deep learning and its application to severity prediction, Med. Phys. (2020).
[25] O. Gozes, M. Frid-Adar, H. Greenspan, P.D. Browning, H. Zhang, W. Ji, A. Bernheim, E. Siegel, Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis, 2020, arXiv preprint arXiv:2003.05037.
[26] L. Li, L. Qin, Z. Xu, Y. Yin, X. Wang, B. Kong, J. Bai, Y. Lu, Z. Fang, Q. Song, K. Cao, D. Liu, G. Wang, Q. Xu, X. Fang, S. Zhang, J. Xia, J. Xia, Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT, Radiology (2020a).
[27] P. Baheti, M. Sikka, K.V. Arya, R. Rajesh, Federated learning on distributed medical records for detection of lung nodules, in: Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2020, pp. 445–451.
[28] T.S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I.C. Paschalidis, W. Shi, Federated learning of predictive models from federated electronic health records, Int. J. Med. Inform. 112 (2018) 59–67.
[29] L. Huang, A.L. Shea, H. Qian, A. Masurkar, H. Deng, D. Liu, Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records, J. Biomed. Inform. 99 (2019) 103291.
[30] J. Lee, J. Sun, F. Wang, S. Wang, C.H. Jun, X. Jiang, Privacy-preserving patient similarity learning in a federated environment: Development and analysis, JMIR Med. Inform. 6 (2018) e20.
[31] H.B. McMahan, E. Moore, D. Ramage, B.A. y Arcas, Federated learning of deep networks using model averaging, 2016, arXiv preprint arXiv:1602.05629.
[32] Y. Zhao, L. Lai, N. Suda, D. Civin, M. Li, V. Chandra, Federated learning with non-IID data, 2018, arXiv preprint arXiv:1806.00582v1.
[33] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, arXiv preprint arXiv:1409.1556.
[34] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[35] S. Jaeger, S. Candemir, S. Antani, Y.X.J. Wáng, P.X. Lu, G. Thoma, Two public chest X-ray datasets for computer-aided screening of pulmonary diseases, Quant. Imaging Med. Surg. 4 (2014) 475–477.
[36] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. 115 (2015) 211–252.
[37] Devashish, U. Sharma, P. Yadav, Sharma, The concept of sensitivity and specificity in relation to two types of errors and its application in medical research, J. Reliab. Stat. Stud. 2 (2009) 53–58.
[38] D. Leroy, A. Coucke, T. Lavril, T. Gisselbrecht, J. Dureau, Federated learning for keyword spotting, in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 6341–6345.
