Abstract
Automatic segmentation of infection areas in computed tomography (CT) images has proven to be an effective diagnostic approach for COVID-19. However, due to the limited number of pixel-level annotated medical images, accurate segmentation remains a major challenge. In this paper, we propose an unsupervised domain adaptation based segmentation network to improve the segmentation of infection areas in COVID-19 CT images. In particular, we propose to utilize synthetic data and limited unlabeled real COVID-19 CT images to jointly train the segmentation network. Furthermore, we develop a novel domain adaptation module, which aligns the two domains and effectively improves the segmentation network's generalization capability to the real domain. In addition, we propose an unsupervised adversarial training scheme that encourages the segmentation network to learn domain-invariant features, so that robust features can be used for segmentation. Experimental results demonstrate that our method achieves state-of-the-art segmentation performance on COVID-19 CT images.
Keywords: COVID-19, Automatic segmentation, Computed tomography, Domain adaptation, Adversarial training
Introduction
The novel coronavirus disease (COVID-19) has become one of the most serious global pandemics. COVID-19 is caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which can be transmitted via breathing, coughing, sneezing, or other means. A recent report [1] showed that, by March 2021, more than 120 million people around the world had been infected with COVID-19, with a fatality rate of over 2%.
To diagnose COVID-19, the real-time reverse transcription polymerase chain reaction (RT-PCR) test is routinely used. However, RT-PCR is time-consuming, and a series of tests may be required to exclude the possibility of false negatives, which means that there is an urgent need for alternative methods for the fast and accurate diagnosis of COVID-19. Chest computed tomography (CT) has been strongly recommended for the early recognition and evaluation of suspected SARS-CoV-2 infection [2]. Chest CT scans are very useful for the auxiliary diagnosis of the typical radiographic features of COVID-19, including ground-glass opacity and consolidation [3]. Therefore, the qualitative assessment of infection in CT scans could provide important information in the fight against the spread of COVID-19. Image segmentation has proven to be effective in COVID-19 CT image analysis [4–6], but it remains challenging because (1) the diversity in the size and distribution of infection leads to a large number of false negative segmentation results, and (2) ground-glass opacity and consolidation are similar in appearance, and this small inter-class difference makes segmentation more difficult [7].
Deep learning based automatic segmentation is a powerful technique for medical imaging analysis [8]. Its excellent performance can be attributed to the availability of large volumes of labeled training data. However, it is time-consuming and laborious to collect a sufficient number of COVID-19 CT images with annotations due to concerns over patient privacy [9, 10] and the lack of experts [11]. To tackle this issue, some methods have employed parameterized transformations to augment the limited annotated COVID-19 CT images for supervised learning [12–19]. Although parameterized transformations can alleviate the data shortage to some extent, networks trained on limited data still generalize poorly to unseen datasets due to insufficient data diversity. Besides, several works have explored constructing new networks suitable for small-scale labeled COVID-19 data [20–22], whose high performance relies on carefully designed network structures, at the cost of scalability and flexibility.
More recently, some efforts have been devoted to generating synthetic COVID-19 CT data to promote computer-aided diagnosis of COVID-19 [23–25], which makes it possible to train deep models on synthetic images and computer-generated annotations. Nevertheless, as the work [26] shows, a model trained directly on synthetic data may fail to produce precise results for real COVID-19 CT images due to the domain shift. Given that (1) existing supervised and semi-supervised methods are limited by small-scale COVID-19 CT data and (2) synthetic COVID-19 CT data cannot be used directly for training due to the domain shift, a natural and practical question arises: how can the potential of synthetic data be properly exploited to improve segmentation performance on COVID-19 CT images?
To address the above issues, we propose a novel unsupervised domain adaptation based segmentation network for the COVID-19 CT infection segmentation task. The contributions of this paper can be summarized as follows: (1) we propose to make full use of synthetic data and limited unlabeled real COVID-19 CT images to jointly train the segmentation network, so as to introduce richer diversity; (2) we design a domain adaptation module to align the two domains and overcome the domain shift, which effectively improves the generalization capability of the segmentation network; (3) we propose an unsupervised adversarial training scheme, in which the cross-domain adversarial loss guides the segmentation network to learn domain-invariant features, thus improving the segmentation performance. Moreover, our training scheme is flexible, as it can be combined with any segmentation network that has an encoder-decoder structure.
The remainder of this paper is organized as follows. In Section 2, we review previous related works. Section 3 discusses the main components and training scheme of our proposed method, while Section 4 describes our experiments on real COVID-19 CT images. Finally, Section 5 concludes the paper.
Related works
COVID-19 infection segmentation
Medical imaging such as CT has played an important role in the fight against COVID-19. As an essential step in the processing and assessment of CT images, segmentation can identify the regions of interest, such as ground-glass opacity, consolidation, and the lung [8]. Recently, deep learning based segmentation methods have been utilized in COVID-19 CT diagnosis [12–22]. For instance, Ouyang et al. [18] developed a 3D CNN for COVID-19 infection segmentation and proposed a dual-sampling attention mechanism to alleviate the data imbalance problem. Oulefki et al. [19] presented a multilevel thresholding procedure based on Kapur entropy to improve COVID-19 segmentation performance. Fan et al. [20] presented a semi-supervised segmentation method based on a random selection propagation strategy, which requires only a few labeled images and primarily utilizes unlabeled data. Qiu et al. [21] proposed a lightweight network to solve the overfitting problem caused by the limited training data for COVID-19 segmentation. Laradji et al. [22] proposed a COVID-19 segmentation model that uses point-level rather than full image-level annotations, which alleviates the labeling issue to some extent. Most previous works are trained in a supervised or semi-supervised manner, so their performance is limited by the scale of the labeled data. Furthermore, since the infection areas of COVID-19 can be small, with large variations in shape and texture, segmenting them remains a challenging task.
Unsupervised medical segmentation
To deal with the lack of annotated data, unsupervised segmentation techniques have attracted growing interest. Most existing proposals employ clustering, which divides a medical image into different groups according to the similarity of the image intensity. For example, Jose et al. [27] proposed a method based on K-means clustering to segment abnormal brain regions. Cheng et al. [28] utilized a clustering algorithm to generate superpixel-based pseudo-labels to provide supervision for the segmentation network. However, these clustering based methods depend heavily on pixel intensity, which means different areas with similar intensities are likely to be mistakenly segmented into the same class. This is undesirable in COVID-19 infection segmentation, because ground-glass opacity and consolidation may not be distinguished, as they have a similar appearance. Some studies regard the unsupervised medical segmentation task as an unsupervised deformable registration process [29–31]. Despite the success of these methods, they are insufficient for COVID-19 segmentation, since the infections on CT images vary greatly, with irregular shapes and ambiguous boundaries [32].
Domain adaptation
Domain adaptation aims to reduce the shift between two distributions [33, 34], and it has been widely employed in conjunction with synthetic data for real-world tasks [35–39]. Several strategies have been proposed to achieve better domain adaptation. Some studies utilize the maximum mean discrepancy (MMD) [40] to minimize the differences between feature distributions [41–43], but its effect is limited by whether the distributions follow a Gaussian distribution. Another strategy is self-training, which utilizes predictions from an ensemble model as pseudo-labels for unlabeled data to train the current model [44–46]. There is also increasing interest in the use of adversarial training to achieve domain adaptation [47–50]. This approach reduces the domain shift by forcing the features from different domains to fool the discriminator, thus leading features from different domains to exhibit a similar distribution. For medical image segmentation, domain adaptation has also demonstrated positive effects [51–55]. For instance, Degel et al. [52] minimized the segmentation loss together with a domain discriminator to encourage feature domain-invariance across ultrasound datasets for left atrium segmentation. Perone et al. [51] addressed the domain shift by extending the self-ensembling method to MRI image segmentation. Kamnitsas et al. [55] employed adversarial learning and utilized synthetic data and sufficient labeled data for brain lesion segmentation.
Domain adaptation technology has achieved impressive success, especially in the medical imaging field. Therefore, in this paper we exploit this technology to solve the COVID-19 CT infection segmentation task.
Methodology
Overview of the proposed method
As shown in Fig. 1, our method consists of two parts: the segmentation network, composed of a feature extractor f(⋅) and a pixel-wise classifier c(⋅), and the domain adaptation module, including a generator g(⋅) and a discriminator d(⋅). The source dataset (the synthetic data with annotations) and the target dataset (the COVID-19 CT images without annotations) are denoted as DS = {(XS, YS)} and DT = {XT}, respectively. We first forward the two inputs XS and XT to f(⋅), generating feature maps FS and FT. Then, c(⋅) takes FS as input and produces an image-sized segmentation map ŶS, which is used to optimize the segmentation network together with YS. To overcome the domain shift, we align the distributions of the source and target data in the image space using the domain adaptation module. We utilize g(⋅) to reconstruct the inputs conditioned on the feature maps FS and FT. We then feed the outputs of g(⋅) together with XS and XT to the discriminator d(⋅), which classifies them as real or fake within- or cross-domain. The gradients of the cross-domain adversarial loss are propagated from d(⋅) to f(⋅), which leads f(⋅) to learn transferable feature representations applicable to both the source and target domains.
Network structure
Feature extractor
We build a feature extractor that follows the typical architecture of a convolutional neural network. It is composed of four 3 × 3 convolutional layers, each followed by a 2 × 2 max pooling operation. Given a source image XS and a target image XT, the feature extractor shares its weights between the two inputs and produces the feature maps FS and FT, as shown in (1),
Fδ = f(Xδ),    (1)
where δ ∈ (S, T) denotes whether the term stems from the source domain or target domain. The learned features are then sent to the classifier and generator. The former is used to generate pixel-level segmentation results, while the latter is projected into image space for further domain adaptation.
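To make this architecture concrete, the following is a minimal PyTorch sketch of such an extractor. The channel widths are illustrative assumptions, since the paper only specifies the kernel and pooling sizes.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Four 3x3 conv blocks, each followed by 2x2 max pooling (Eq. 1)."""
    def __init__(self, in_ch=1, widths=(64, 128, 256, 512)):  # widths are assumed
        super().__init__()
        blocks, prev = [], in_ch
        for w in widths:
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2)))
            prev = w
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        # Return all intermediate maps so the classifier can use skip connections.
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        return feats  # feats[-1] is F_delta in Eq. (1)
```

The same module is applied to XS and XT, which realizes the weight sharing described above.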
Pixel-wise classifier
With the learned feature maps, the pixel-wise classifier converts low-resolution, semantically strong features into pixel-wise classification results, i.e., a class label is assigned to each pixel. We build a classifier that contains three upsampling layers, each followed by a concatenation with the correspondingly cropped feature map from the feature extractor. It takes FS as input and produces a segmentation map ŶS with the same size as XS, i.e.,
ŶS = c(FS).    (2)
As discussed later, in order to give our network pixel-level discriminative ability, we use the predicted segmentation map to calculate the segmentation loss in a supervised manner. Because we can only access the annotations of the source data, we feed only the feature maps of the source domain to the classifier to obtain the segmentation map.
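A minimal sketch of such a classifier follows, assuming the feature extractor above returns all intermediate maps. The transposed-convolution upsampling and channel widths are assumptions, and skip maps are concatenated directly (no cropping is needed because the padded convolutions keep sizes aligned).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelwiseClassifier(nn.Module):
    """Three upsampling stages with skip concatenations, then per-pixel class scores (Eq. 2)."""
    def __init__(self, widths=(512, 256, 128, 64), n_classes=4):  # widths mirror the assumed encoder
        super().__init__()
        self.ups = nn.ModuleList([
            nn.ConvTranspose2d(widths[i], widths[i + 1], kernel_size=2, stride=2)
            for i in range(3)])
        self.fuse = nn.ModuleList([
            nn.Conv2d(widths[i + 1] * 2, widths[i + 1], kernel_size=3, padding=1)
            for i in range(3)])
        self.head = nn.Conv2d(widths[3], n_classes, kernel_size=1)

    def forward(self, feats):
        # feats: list of encoder maps, deepest last (see FeatureExtractor above)
        x = feats[-1]
        for i, (up, fuse) in enumerate(zip(self.ups, self.fuse)):
            x = up(x)
            skip = feats[-(i + 2)]                 # matching encoder map for the skip connection
            x = torch.relu(fuse(torch.cat([x, skip], dim=1)))
        # upsample to the input resolution and emit per-pixel logits
        return F.interpolate(self.head(x), scale_factor=2,
                             mode='bilinear', align_corners=False)
```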
Domain adaptation module
Unlike recent adversarial domain adaptation approaches for segmentation that directly calculate the adversarial loss in the feature space, we utilize the generator to project the intermediate feature maps into the image space for robust adversarial training. Given the feature maps FS and FT, the generator shares its weights between the two inputs and produces reconstructions of the source image XS and the target image XT. The reconstructions GS and GT, together with XS and XT, are then sent to the discriminator and classified as real or fake. The reconstruction process is formulated as (3),
Gδ = g(Fδ), δ ∈ (S, T).    (3)
Image-space reconstructions are more robust than feature maps when used to calculate the adversarial loss. This is particularly true for our infection segmentation task, where the differences in intensity and texture between the source and target images are not that significant. Section 4.4 provides detailed verification of the effectiveness of image-space training.
The design of our domain adaptation module is inspired by PatchGAN [56]. Our generator consists of four upsampling layers, each composed of a 3 × 3 transposed convolutional layer and two residual blocks [57]. Our discriminator includes two 4 × 4 convolutional layers, with nine residual blocks after the first layer. For each input, the output of the domain adaptation module is a probability map, in which each value indicates the probability that the corresponding patch in the input is real or fake. Compared with a normal discriminator, whose output is a single scalar for the whole image, our patch discriminator is more helpful for retaining detailed information.
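A sketch of the two components under these constraints is given below; the channel widths, activation choices, and the exact residual-block design are assumptions not specified in the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return F.relu(x + self.body(x))

class Generator(nn.Module):
    """Four upsampling stages: 3x3 transposed conv + two residual blocks each (Eq. 3)."""
    def __init__(self, widths=(512, 256, 128, 64, 32), out_ch=1):  # widths are assumed
        super().__init__()
        stages = []
        for i in range(4):
            stages += [nn.ConvTranspose2d(widths[i], widths[i + 1], 3, stride=2,
                                          padding=1, output_padding=1),
                       ResBlock(widths[i + 1]), ResBlock(widths[i + 1])]
        stages += [nn.Conv2d(widths[4], out_ch, 3, padding=1)]
        self.stages = nn.Sequential(*stages)
    def forward(self, feat):
        return self.stages(feat)        # reconstruction G_delta in image space

class PatchDiscriminator(nn.Module):
    """PatchGAN-style critic: per-patch scores over the four domain/realness categories."""
    def __init__(self, in_ch=1, ch=64, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            *[ResBlock(ch) for _ in range(9)],
            nn.Conv2d(ch, n_classes, 4, stride=4))
    def forward(self, x):
        return self.net(x)              # e.g. 4 x 64 x 64 patch logits for a 512 x 512 input
```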
Training and testing process
Our goal is to train a segmentation network that produces a competitive performance on real COVID-19 CT images even if no annotations are provided. We use the annotated synthetic images as the source and unlabeled real COVID-19 CT images as the target to jointly train the network, and update the parameters using segmentation loss, adversarial loss, and reconstruction loss. The segmentation loss is defined over adequately annotated source domain images, allowing the network to develop pixel-level discriminative ability. The adversarial loss can be divided into within-domain loss and cross-domain loss. The latter is used to guide the update of the feature extractor, thus allowing the feature extractor to identify the necessary features that should be extracted from the target domain. The reconstruction loss is utilized to ensure the fidelity of the reconstructions.
Generator update
The generator takes the learned features FS and FT as input and reconstructs XS and XT as GS and GT conditioned on these feature maps. Intuitively, if the reconstruction is sufficiently accurate, the L1 loss between the reconstruction Gδ and the input Xδ should be low. We also optimize the generator using an adversarial loss, which forces the discriminator to classify GS and GT as real, thus fooling the discriminator. The objective function of the generator can be represented by (4),
Lg = Σδ ∈ (S, T) [ Σi CE(d(Gδ)i, Y(δ, real)i) + Σj |Gδ,j − Xδ,j| ],    (4)

where CE(⋅, ⋅) denotes the cross-entropy loss, index i indicates the pixel location in the output probability map of the discriminator and in the label map, and index j indicates the pixel location in the input and the reconstruction.
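The following sketch shows how such a generator objective could be computed in PyTorch; the relative weight lam between the adversarial and L1 terms is an assumption, since the paper does not specify one.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_out, real_label_map, recon, image, lam=1.0):
    """Within-domain adversarial term + L1 reconstruction term (Eq. 4).

    d_out:          discriminator logits for the reconstruction, shape (N, 4, 64, 64)
    real_label_map: per-patch targets marking the reconstruction's own domain as 'real'
    recon, image:   generator output G_delta and the original input X_delta
    lam:            weight of the L1 term (assumed value; not given in the paper)
    """
    adv = F.cross_entropy(d_out, real_label_map)   # fool d: patches classified as real
    rec = F.l1_loss(recon, image)                  # pixel-wise fidelity of G_delta
    return adv + lam * rec
```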
Discriminator update
Given XS, XT, GS, or GT, the patch discriminator produces a four-class probability map (source-real, source-fake, target-real, target-fake) for each input. We calculate the adversarial loss using the cross-entropy loss between the output probability map and the label map Y(δ, γ), δ ∈ (S, T), γ ∈ (real, fake). Therefore, the optimization process for the discriminator is as follows,
Ld = Σδ ∈ (S, T) [ Σi CE(d(Xδ)i, Y(δ, real)i) + Σi CE(d(Gδ)i, Y(δ, fake)i) ],    (5)
where Y(δ, γ) is the 64 × 64 label map, in which each value corresponds to the label of each patch, indicating whether each patch of the input image belongs to the category of source-real, source-fake, target-real, or target-fake.
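A sketch of this update is given below. The integer encoding of the four patch categories and the helper for building the 64 × 64 label maps are assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

# Assumed class-index encoding of the four patch categories
SRC_REAL, SRC_FAKE, TGT_REAL, TGT_FAKE = 0, 1, 2, 3

def patch_labels(value, size=64, batch=1, device='cpu'):
    """Build the 64x64 label map Y(delta, gamma) filled with one category index."""
    return torch.full((batch, size, size), value, dtype=torch.long, device=device)

def discriminator_loss(d, x_s, x_t, g_s, g_t):
    """Within-domain cross-entropy between patch predictions and Y(delta, gamma) (Eq. 5)."""
    loss = 0.0
    for inp, cls in ((x_s, SRC_REAL), (g_s.detach(), SRC_FAKE),
                     (x_t, TGT_REAL), (g_t.detach(), TGT_FAKE)):
        out = d(inp)                                  # (N, 4, 64, 64) patch logits
        loss = loss + F.cross_entropy(out, patch_labels(cls, batch=inp.size(0),
                                                        device=inp.device))
    return loss
```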
Feature extractor and classifier update
The updating of the feature extractor and classifier is a crucial process in our network for domain adaptation. We optimize these two components with the following combination of loss terms,
Lf,c = −Σi Σc YS,i,c log ŶS,i,c + α [ Σi CE(d(GS)i, Y(T, real)i) + Σi CE(d(GT)i, Y(S, real)i) ],    (6)

where the first term is the segmentation loss, i.e., the pixel-wise cross-entropy loss calculated between the segmentation map and the annotation of the source domain; index c runs over the categories in the segmentation results (four categories in our work: background, consolidation, ground-glass opacity, and the lung); and α balances the segmentation loss and the cross-domain adversarial loss.
Directly minimizing the segmentation loss in (6) leads to good segmentation performance on the source images, but when tested on the target images, the performance drops significantly due to the domain shift. To overcome this problem, we introduce a cross-domain adversarial loss to our network. Please note that, unlike the updating processes for the generator and discriminator shown in (4) and (5), where the adversarial loss is calculated within the source or target domain, here the adversarial loss is cross-domain, and the gradients of the cross-domain adversarial loss lead to a reversed domain classification. We utilize these gradients to update the feature extractor. To be more specific, the cross-domain adversarial losses are used to ensure that, when target features are passed to the generator, source-like images can be reconstructed, and when source features are passed to the generator, target-like images can be reconstructed. Through this constrained domain alignment, the learned features from the two domains will exhibit a similar distribution, enabling the feature extractor to learn the common representations of the two domains.
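A sketch of this combined loss, reusing the patch_labels helper and category indices from the discriminator sketch above, is shown below; α = 0.1 follows the implementation details in Section 4.1.2.

```python
import torch.nn.functional as F

def extractor_classifier_loss(seg_logits, y_s, d_out_gs, d_out_gt, alpha=0.1):
    """Eq. (6): source segmentation loss + alpha * cross-domain adversarial loss.

    seg_logits: classifier output for the source image, (N, 4, H, W)
    y_s:        source annotation map with class indices, (N, H, W)
    d_out_gs:   discriminator logits for the reconstruction of the SOURCE image
    d_out_gt:   discriminator logits for the reconstruction of the TARGET image
    """
    seg = F.cross_entropy(seg_logits, y_s)
    # Cross-domain targets: source reconstructions should look target-real and
    # target reconstructions source-real, i.e. a reversed domain classification.
    adv = (F.cross_entropy(d_out_gs, patch_labels(TGT_REAL, batch=d_out_gs.size(0),
                                                  device=d_out_gs.device)) +
           F.cross_entropy(d_out_gt, patch_labels(SRC_REAL, batch=d_out_gt.size(0),
                                                  device=d_out_gt.device)))
    return seg + alpha * adv
```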
In Fig. 2, we illustrate the training process for each module in the network with the direction of the data flow and gradient flow. For each iteration, randomly sampled Xδ, δ ∈ (S, T) are sent to the network, and the generator, discriminator, feature extractor, and classifier are then updated in turn. Note that, unlike the updating of the generator and discriminator, the adversarial loss used to update the feature extractor and classifier is cross-domain. Except for the segmentation loss, all the other losses are calculated in both the source and target domains. During the testing process, we only use the trained feature extractor and classifier. The network takes the real CT images of COVID-19 cases as input and generates the predicted segmentation map.
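Putting the pieces together, one training iteration could look like the following sketch, which reuses the modules and loss helpers defined above. Freezing the extractor during the generator step and the optimizer grouping are implementation assumptions rather than details given in the paper.

```python
import itertools
import torch

# Modules from the sketches above
f, c = FeatureExtractor(), PixelwiseClassifier()
g, d = Generator(), PatchDiscriminator()

opt_g  = torch.optim.Adam(g.parameters(), lr=1e-5)
opt_d  = torch.optim.Adam(d.parameters(), lr=1e-5)
opt_fc = torch.optim.Adam(itertools.chain(f.parameters(), c.parameters()), lr=1e-5)

def train_step(x_s, y_s, x_t):
    # 1) Generator: within-domain adversarial loss + L1 reconstruction (Eq. 4)
    with torch.no_grad():
        fs, ft = f(x_s)[-1], f(x_t)[-1]
    opt_g.zero_grad()
    g_s, g_t = g(fs), g(ft)
    (generator_loss(d(g_s), patch_labels(SRC_REAL, batch=x_s.size(0), device=x_s.device), g_s, x_s) +
     generator_loss(d(g_t), patch_labels(TGT_REAL, batch=x_t.size(0), device=x_t.device), g_t, x_t)).backward()
    opt_g.step()

    # 2) Discriminator: within-domain classification into the four patch categories (Eq. 5)
    opt_d.zero_grad()
    discriminator_loss(d, x_s, x_t, g_s, g_t).backward()
    opt_d.step()

    # 3) Feature extractor + classifier: segmentation + cross-domain adversarial loss (Eq. 6)
    opt_fc.zero_grad()
    feats_s, feats_t = f(x_s), f(x_t)
    extractor_classifier_loss(c(feats_s), y_s,
                              d(g(feats_s[-1])), d(g(feats_t[-1]))).backward()
    opt_fc.step()
```

At test time only f and c are kept, and the target CT slice is simply passed through c(f(x)) to obtain the predicted segmentation map.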
Experiments
Experimental settings
Dataset
The source data comes from our previous work [24], which was designed to generate high-quality and realistic COVID-19 lung CT images for use in deep learning based medical imaging tasks. The dataset contains 10,200 synthetic 2D CT images with corresponding pixel-wise annotations. There are four categories in the annotation map: ground-glass opacity, consolidation, the lung, and background. The first two are the most common characteristics used for COVID-19 diagnosis in lung CT imaging. The target data is taken from the COVID-19 CT segmentation dataset (http://medicalsegmentation.com/covid19/) collected by the Italian Society of Medical and Interventional Radiology. It contains 9 CT volumes from confirmed COVID-19 patients, and each volume contains 200 slices. We reformatted all 3D volumes into 2D slices with a size of 512 × 512. Small rotations, shearings, gamma transforms, and intensity normalizations are used for data augmentation, resulting in a total of 12,000 slices after pre-processing. We employ 70% of the slices as the unlabeled target data for training, while the remaining 30% are used for testing segmentation performance. We follow a patient-level split when separating the target data into the training set and test set.
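As a concrete illustration of the patient-level split, the following sketch groups slices by patient identifier before dividing them 70/30; the slice_paths and patient_ids inputs are hypothetical placeholders for however the slices are actually stored.

```python
import numpy as np

def patient_level_split(slice_paths, patient_ids, train_frac=0.7, seed=0):
    """Split 2D slices into train/test so that no patient appears in both sets."""
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    n_train = int(round(train_frac * len(patients)))
    train_patients = set(patients[:n_train])
    train = [p for p, pid in zip(slice_paths, patient_ids) if pid in train_patients]
    test  = [p for p, pid in zip(slice_paths, patient_ids) if pid not in train_patients]
    return train, test
```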
Implementation details
We use PyTorch [58] for implementation. Our network is trained for 100K iterations using the Adam optimizer [59]. The hyperparameters are set to α = 0.1 and lr = 1.0 × 10^-5. The batch size is 1, and every 10K iterations the lr is reduced by 20%. Network training is accelerated with an NVIDIA RTX 2080Ti GPU and an Intel(R) Core i7-9700K CPU.
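The stated schedule (a 20% reduction every 10K iterations) corresponds to a step decay with factor 0.8; a minimal sketch, with a stand-in module since the optimizer grouping across sub-networks is not spelled out, is as follows.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3)                      # stand-in for any of the sub-networks
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-5)
# "lr reduced by 20% every 10K iterations" maps to StepLR with gamma = 0.8
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.8)

for iteration in range(100_000):
    # ... forward/backward for one batch of size 1 ...
    optimizer.step()
    scheduler.step()
```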
Evaluation metrics
For quantitative evaluation, we adopted the three most commonly used evaluation metrics in medical imaging analysis: the dice similarity coefficient (Dice), sensitivity (Sen), and specificity (Spec) [60, 61]. The dice similarity coefficient is an overlap index that indicates the similarity between the prediction and the ground truth. Sensitivity and specificity are two statistical metrics for the performance of binary medical image segmentation tasks. The former measures the percentage of actual positive pixels correctly predicted to be positive, while the latter measures the proportion of actual negative pixels correctly predicted to be negative. These metrics are defined as follows:
Dice = 2TP / (2TP + FP + FN),    (7)

Sen = TP / (TP + FN),    (8)

Spec = TN / (TN + FP),    (9)
where TP, FP, TN, and FN denote the numbers of true positive, false positive, true negative, and false negative pixels in the prediction, respectively.
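These definitions translate directly into a few lines of NumPy; a sketch for binary masks:

```python
import numpy as np

def dice_sen_spec(pred, gt):
    """Dice, sensitivity and specificity for binary masks (Eqs. 7-9)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    sen = tp / (tp + fn)
    spec = tn / (tn + fp)
    return dice, sen, spec
```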
Quantitative results
Evaluation on two-class segmentation task
In this section, we compare the segmentation performance of our proposal with two state-of-the-art unsupervised medical image segmentation methods: Self-ensembling [51] and SSL [28]. Because these methods are designed for two-class segmentation, we train our proposed approach as a two-class segmentation network, e.g., by taking the ground-glass opacity as the object to segment and other classes as the background.
Table 1 presents the experimental results when taking each category as the segmentation object. The results are reported as the mean ± error interval (calculated based on a 95% confidence interval). Our proposed method outperforms the compared methods across most metrics. Compared with the second-best method, Self-ensembling [51], the proposed method produces a 6.11% improvement in the dice similarity score for infection. Unlike the compared methods, which utilize a consistency loss to minimize the discrepancy between predictions in the source and target domains or employ superpixel-based pseudo-labels for supervision, our approach learns more discriminative feature representations for this challenging medical segmentation task.
Table 1.
Methods | Dice (%) ↑ | Sen (%) ↑ | Spe (%) ↑ | Dice (%) ↑ | Sen (%) ↑ | Spe (%) ↑ |
---|---|---|---|---|---|---|
 | Ground-glass opacity | | | Consolidation | | |
Source-only | 80.60 ± 0.48 | 78.86 ± 0.52 | 99.60 ± 0.01 | 61.75 ± 0.50 | 66.73 ± 0.51 | 99.83 ± 0.01 |
Self-ensembling [51] | 82.43 ± 0.36 | 80.18 ± 0.47 | 99.53 ± 0.01 | 65.16 ± 0.78 | 66.58 ± 1.26 | 99.26 ± 0.01 |
SSL [28] | 78.34 ± 0.87 | 71.37 ± 0.53 | 99.47 ± 0.01 | 73.83 ± 0.91 | 81.30 ± 0.82 | 99.44 ± 0.01 |
Ours | 85.34 ± 0.36 | 82.13 ± 0.41 | 99.87 ± 0.01 | 74.67 ± 0.57 | 68.69 ± 0.39 | 99.97 ± 0.01 |
Target-only | 88.73 ± 0.98 | 87.55 ± 1.34 | 99.84 ± 0.02 | 84.58 ± 0.80 | 84.71 ± 0.94 | 99.94 ± 0.01 |
 | Infection | | | Lung | | |
Source-only | 78.82 ± 0.61 | 70.99 ± 0.86 | 99.80 ± 0.01 | 89.60 ± 0.62 | 92.38 ± 0.23 | 97.89 ± 0.15 |
Self-ensembling [51] | 80.43 ± 0.47 | 80.74 ± 0.51 | 99.63 ± 0.01 | 93.53 ± 0.29 | 90.47 ± 0.10 | 99.61 ± 0.01 |
SSL [28] | 79.15 ± 0.51 | 78.77 ± 0.50 | 99.81 ± 0.01 | 94.59 ± 0.19 | 93.47 ± 0.13 | 97.60 ± 0.01 |
Ours | 86.54 ± 0.39 | 85.54 ± 0.43 | 99.80 ± 0.01 | 95.75 ± 0.25 | 93.11 ± 0.26 | 99.74 ± 0.01 |
Target-only | 91.50 ± 0.43 | 92.56 ± 0.52 | 99.81 ± 0.01 | 97.62 ± 0.15 | 97.38 ± 0.16 | 99.69 ± 0.02 |
Infection considers both ground-glass opacity and consolidation. (The highest evaluation score is marked in bold. ↑ indicates that a higher number is better)
Evaluation on multi-class segmentation task
As an assistant diagnostic tool, our model is expected to provide more detailed information about the infected areas and the lung. Therefore, we extend our method to a multi-class segmentation task and compare it with the state-of-the-art domain adaptation based segmentation methods MinEnt [47], AdvEnt [47], and IntraDA [49]. It should be noted that the metrics for each category are calculated by taking the other categories as background. More specifically, even though the network is trained for a multi-class segmentation task, we employ two classes (the object and the background) when calculating the metrics.
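Under this one-vs-rest protocol, the multi-class prediction can be evaluated per category by binarizing it against each class in turn. A sketch reusing the dice_sen_spec helper from Section 4.1, with assumed integer class ids, is shown below.

```python
import numpy as np

def per_class_metrics(pred_labels, gt_labels, class_ids=(1, 2, 3)):
    """pred_labels/gt_labels: integer maps; class_ids: assumed ids for GGO, consolidation, lung."""
    return {c: dice_sen_spec(pred_labels == c, gt_labels == c) for c in class_ids}

def infection_metrics(pred_labels, gt_labels, ggo=1, cons=2):
    """'Infection' merges ground-glass opacity and consolidation into one foreground class."""
    infected_pred = np.isin(pred_labels, (ggo, cons))
    infected_gt = np.isin(gt_labels, (ggo, cons))
    return dice_sen_spec(infected_pred, infected_gt)
```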
Table 2 shows the quantitative results on real CT images from COVID-19 cases. Our proposed approach outperforms the compared methods across most metrics. Compared with the second-best method, AdvEnt [47], our proposal produces a 5.03% improvement in the dice similarity score for infection. When excluding the domain adaptation module of our network and only using the base feature extractor and classifier trained with the source data (source-only), we observe a significant drop in performance (the infection Dice falls from 86.15% to 76.98%), clearly illustrating the effectiveness of our domain adaptation strategy, which employs adversarial training to learn the true features of the infection from real COVID-19 CT images. It can also be observed that, even without access to the ground truth for the real CT images, our proposed method achieves results comparable to the target-only method trained with the target data in a supervised manner. Moreover, our proposal achieves the highest performance in lung segmentation, which shows that the proposed method is also suitable for segmenting large tissues and organs.
Table 2.
Methods | Dice (%) ↑ | Sen (%) ↑ | Spe (%) ↑ | Dice (%) ↑ | Sen (%) ↑ | Spe (%) ↑ |
---|---|---|---|---|---|---|
 | Ground-glass opacity | | | Consolidation | | |
Source-only | 79.16 ± 0.56 | 73.65 ± 0.41 | 99.81 ± 0.01 | 61.42 ± 0.45 | 57.54 ± 0.67 | 99.82 ± 0.01 |
MinEnt [47] | 79.72 ± 0.42 | 71.83 ± 0.48 | 99.87 ± 0.01 | 75.33 ± 0.41 | 67.23 ± 0.68 | 99.97 ± 0.01 |
AdvEnt [47] | 81.99 ± 0.38 | 76.68 ± 0.45 | 99.83 ± 0.01 | 64.07 ± 0.74 | 54.18 ± 0.98 | 99.95 ± 0.01 |
IntraDA [49] | 79.30 ± 0.34 | 69.17 ± 0.35 | 99.88 ± 0.01 | 62.33 ± 0.88 | 57.80 ± 1.00 | 99.97 ± 0.01 |
Ours | 86.31 ± 0.27 | 85.37 ± 0.26 | 99.81 ± 0.01 | 74.55 ± 0.30 | 67.44 ± 0.32 | 99.95 ± 0.01 |
Target-only | 87.54 ± 0.27 | 86.83 ± 0.34 | 99.82 ± 0.01 | 84.88 ± 0.42 | 82.79 ± 0.62 | 99.96 ± 0.01 |
 | Infection | | | Lung | | |
Source-only | 76.98 ± 0.30 | 70.92 ± 0.47 | 99.66 ± 0.01 | 88.54 ± 0.32 | 93.47 ± 0.16 | 97.41 ± 0.08 |
MinEnt [47] | 80.91 ± 0.27 | 72.61 ± 0.30 | 99.86 ± 0.01 | 95.55 ± 0.01 | 95.62 ± 0.01 | 99.33 ± 0.01 |
AdvEnt [47] | 81.12 ± 0.28 | 74.55 ± 0.35 | 99.82 ± 0.01 | 95.69 ± 0.06 | 95.41 ± 0.05 | 99.41 ± 0.01 |
IntraDA [49] | 77.34 ± 0.32 | 67.76 ± 0.43 | 99.89 ± 0.01 | 95.27 ± 0.07 | 95.01 ± 0.06 | 99.35 ± 0.01 |
Ours | 86.15 ± 0.29 | 84.29 ± 0.31 | 99.81 ± 0.01 | 96.13 ± 0.07 | 94.61 ± 0.09 | 99.67 ± 0.01 |
Target-only | 89.55 ± 0.35 | 88.57 ± 0.29 | 99.82 ± 0.01 | 97.12 ± 0.13 | 97.04 ± 0.18 | 99.59 ± 0.01 |
Infection considers both ground-glass opacity and consolidation. (The highest evaluation score is marked in bold. ↑ means a higher number is better)
Qualitative results
Figure 3 shows the qualitative results for two-class segmentation of real COVID-19 CT images. We train our method as a two-class segmenter for ground-glass opacity, consolidation, infection, and the lung, respectively. It is evident that the proposed domain adaptation based segmentation network learns discriminative features through adversarial training and thus segments the object areas accurately. Self-ensembling [51] can handle large-object segmentation such as in (a) and (d), but performs poorly on the relatively small consolidation shown in (b). SSL [28] relies on superpixel-based pseudo-labels for supervision during training, so it fails to capture the details of the ground-glass opacity in (a).
Figure 4 displays the qualitative results for multi-class segmentation of real COVID-19 CT cases. There are a large number of mis-segmented areas in the visualization results of the source-only (baseline) model, mainly due to the differences in texture and intensity between the synthetic data and the real COVID-19 CT images. We observe a significant improvement in performance when introducing cross-domain adversarial learning in our proposed approach, which confirms the importance of adversarial training based domain adaptation. MinEnt [47] attempts to overcome the domain gap by directly minimizing the entropy of the model output; however, compared with our proposed method, it fails to capture the fine-grained details of the infection in (a) and (b). AdvEnt [47] conducts adversarial training on the entropy map and is quite sensitive to the influence of irrelevant areas; for example, there is obvious noise in results (d) and (e). IntraDA [49] relies on pseudo-labels for training, and thus fails to separate the ground-glass opacity and consolidation in (a).
Our domain adaptation based segmentation network outperforms the baseline method and the other state-of-the-art methods. It produces results close to the ground truth with fewer mis-segmented infection areas, especially for consolidation, which is relatively small and challenging to segment. The success of the proposed method is attributed to our adversarial training scheme, through which our network can learn the true features of the target data under the constraint of the cross-domain adversarial loss. This scheme allows our network to distinguish the real features of ground-glass opacity and consolidation more clearly, even without access to ground truth annotations of the target data. In addition, our proposed method performs best in lung segmentation, which shows that it generalizes well: it can be used not only for COVID-19 infection segmentation, but also for other organs.
Ablation study
In order to assess the important settings and components of our method, we conduct ablation experiments following the multi-class experimental settings in Section 4.2. The evaluation criterion is the dice similarity coefficient.
Comparison of different feature map selection strategies
As described in Section 3.2, the generator takes three different feature maps from the feature extractor as the input and maps them to the image space. The output of the generator is then used to calculate the adversarial loss, which is crucial for domain-invariant feature learning. Therefore, the selection strategy for the feature maps affects the segmentation performance. Because there are four down-sampling operations in our feature extractor, there are a total of five sizes including the original image (1: 512 × 512, 2: 256 × 256, 3: 128 × 128, 4: 64 × 64, 5: 32 × 32). We conduct a series of experiments using different combinations. From Table 3, it can be observed that our network achieves the highest performance when the generator takes high-level feature maps as input. This indicates that high-level semantic information is more helpful for domain adaptation than the rich details in the low-level feature maps. We adopt this setting for our network.
Table 3.
1 | 2 | 3 | 4 | 5 | GGO Dice (%) ↑ | Consolidation Dice (%) ↑ | Infection Dice (%) ↑ | Lung Dice (%) ↑ |
---|---|---|---|---|---|---|---|---|
× | × | 83.43 ± 0.40 | 67.35 ± 0.36 | 83.30 ± 0.23 | 95.69 ± 0.07 | |||
× | × | 85.36 ± 0.28 | 74.83 ± 0.40 | 84.24 ± 0.16 | 96.11 ± 0.08 | |||
× | × | 86.31 ± 0.27 | 74.55 ± 0.30 | 86.15 ± 0.29 | 96.13 ± 0.07 |
(The highest evaluation score is marked in bold. ↑ means a higher number is better. GGO: ground-glass opacity)
Effect of different components in our network
Table 4 shows how the different components of our network influence the segmentation performance. The bold line corresponds to our proposed method, while the other configurations differ from our proposed approach in the following respects. (1) w/o adversarial training: the source-only baseline, corresponding to α = 0; the domain adaptation module is excluded and only the feature extractor and classifier are used. (2) w/o skip connections: the skip connections between the feature extractor and classifier, which are essential for preserving fine-grained details in the segmentation, are removed. (3) feature space training: the domain adaptation module is removed and a pixel-level adversarial loss is calculated directly on the feature maps, which is then used to update the feature extractor and classifier. The experimental results show that the domain adaptation module is critical to the excellent performance of our network. In addition, compared with feature space training, calculating the adversarial loss in image space is more effective.
Table 4.
Network configuration | GGO Dice (%) ↑ | Consolidation Dice (%) ↑ | Infection Dice (%) ↑ | Lung Dice (%) ↑ |
---|---|---|---|---|
w/o adversarial training | 79.16 ± 0.56 | 61.42 ± 0.45 | 79.98 ± 0.30 | 88.54 ± 0.32 |
w/o skip connections | 78.56 ± 0.48 | 61.71 ± 0.63 | 78.81 ± 0.24 | 94.11 ± 0.01 |
Feature space training | 81.87 ± 0.33 | 52.61 ± 0.43 | 79.52 ± 0.41 | 92.33 ± 0.22 |
Ours | 86.31 ± 0.27 | 74.55 ± 0.30 | 86.15 ± 0.29 | 96.13 ± 0.07 |
(The highest evaluation score is marked in bold. ↑ means a higher number is better. GGO: ground-glass opacity)
Conclusion
In this paper, we proposed a novel unsupervised domain adaptation based method for COVID-19 infection segmentation in CT images. We considered a challenging situation in which abundant annotated synthetic medical images are available, but no annotations are available for real COVID-19 lung CT images. We introduced unsupervised adversarial training to our network to correlate the features between real COVID-19 CT images and synthetic images. The cross-domain adversarial loss pulls the features learned by the feature extractor from the two domains closer together, so the network can learn the common representations of the two domains while retaining the diagnostic information (i.e., the features of COVID-19 infection). Experimental results on CT images of COVID-19 cases demonstrated that our proposal outperforms the baseline and state-of-the-art approaches. We also demonstrated the effectiveness of our network in lung segmentation. Our proposed method has great potential for use in diagnosing COVID-19 by quantifying the infected areas of the lung.
Biographies
Han Chen
is currently a Ph.D. candidate in the Department of Electronics and Computer Engineering, Korea University. She received the master’s degree in communication engineering from Harbin Engineering University, Harbin, China. Her research interests relate to intelligent signal process, image segmentation and object tracking.
Yifan Jiang
is currently a Ph.D. candidate in the Department of Electronics and Computer Engineering, Korea University. He received a master’s degree in electrical engineering from Korea University. His research interests relate to medical imaging and surveillance system.
Murray Loew
is Professor and Chair of Biomedical Engineering at George Washington University, Washington, D.C. USA. His teaching and research are in medical imaging, image analysis, and machine learning. He is a Fellow of IEEE, SPIE, and AIMBE and a Fulbright Distinguished Chair in Advanced Science and Technology (Australia, 2014).
Hanseok Ko
is Professor of Electrical and Computer Engineering at Korea University, Seoul, Korea, and has been serving as the Director of the Intelligent Signal Processing Lab. He has been actively engaged in research efforts developing solutions addressing multimodal technology issues, including human-machine interaction problems. He is currently serving as General Chair of IEEE ICASSP 2024 and Interspeech 2022.
Footnotes
This research work is supported by the Air Force Office of Scientific Research (award number FA2386-19-1-4001)
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Han Chen, Email: [email protected].
Yifan Jiang, Email: [email protected].
Murray Loew, Email: [email protected].
Hanseok Ko, Email: [email protected].
References
- 1.Ortiz-Ospina E, Ritchie H et al (2021) Mortality risk of covid-19. https://ourworldindata.org/mortality-risk-covid
- 2.Xu B, Xing Y, Peng J, Zheng Z, Tang W, Sun Y, Xu C, Peng F (2020) Chest ct for detecting covid-19: a systematic review and meta-analysis of diagnostic accuracy. Eur Radiol
- 3.Chung M, Bernheim A, Mei X, Zhang N, Huang M, Zeng X, Cui J, Xu W, Yang Y, ZA F, et al. Ct imaging features of 2019 novel coronavirus (2019-ncov) Radiology. 2020;295(1):202–207. doi: 10.1148/radiol.2020200230.
- 4.Gozes O, Frid-Adar M, Greenspan H, Browning PD, Zhang H, Ji W, Adam B, Eliot S (2020) Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis. arXiv:2003.05037
- 5.Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, Xue Z, Shi Y (2020) Lung infection quantification of covid-19 in ct images with deep learning. arXiv:2003.04655
- 6.Shen C, Yu N, Cai S, Zhou J, Sheng J, Liu K, Zhou H, Guo Y, Niu G (2020) Quantitative computed tomography analysis for stratifying the severity of coronavirus disease 2019. J Pharmaceut Anal
- 7.Shi H, Han X, Jiang N, Cao Y, Alwalid O, Gu J, Fan Y, Zheng C. Radiological findings from 81 patients with covid-19 pneumonia in wuhan, china: a descriptive study. Lancet Infect Diseas. 2020;20(4):425–434. doi: 10.1016/S1473-3099(20)30086-4.
- 8.Shi F, Wang J, Shi J, Wu Z, Wang Q, Tang Z, He K, Shi Y, Shen D (2020) Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19. IEEE Rev Biomed Eng
- 9.Adler-Milstein J, Jha AK. Sharing clinical data electronically: a critical challenge for fixing the health care system. JAMA. 2012;307(16):1695–1696. doi: 10.1001/jama.2012.525.
- 10.Sharma P, Shamout FE, Clifton DA (2019) Preserving patient privacy while training a predictive model of in-hospital mortality. arXiv:1912.00354
- 11.Dai C, Mo Y, Angelini E, Guo Y, Bai W (2019) Transfer learning from partial annotations for whole brain segmentation. In: Domain adaptation and representation transfer and medical image learning with less labels and imperfect data. Springer, pp 199–206
- 12.Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H, Liu W, Wang X (2020) Deep learning-based detection for covid-19 from chest ct using weak label medRxiv
- 13.Lu H, Han R, Ai T, Yu P, Kang H, Tao Q, Xia L. Serial quantitative chest ct assessment of covid-19: Deep-learning approach. Radiology Cardiothoracic. 2020;2(2):e200075. doi: 10.1148/ryct.2020200075.
- 14.Qi X, Jiang Z, Yu Q, Shao C, Zhang H, Yue H, Ma B, Wang Y, Liu C, Meng X et al (2020) Machine learning-based ct radiomics model for predicting hospital stay in patients with pneumonia associated with sars-cov-2 infection: A multicenter study medRxiv
- 15.Tongxue Z, Canu S, Ruan S (2020) An automatic covid-19 ct segmentation based on u-net with attention mechanism. arXiv:2004.06673
- 16.Elharrouss O, Subramanian N, Al-Maadeed S (2020) An encoder-decoder-based method for covid-19 lung infection segmentation. arXiv:2007.00861
- 17.Xu Z, Cao Y, Jin C, Shao G, Liu X, Zhou J, Shi H, Feng J (2020) Gasnet: Weakly-supervised framework for covid-19 lesion segmentation. arXiv:2010.09456
- 18.Ouyang X, Huo J, Xia L, Shan F, Liu J, Mo Z, Yan F, Ding Z, Yang Q, Song B, et al. Dual-sampling attention network for diagnosis of covid-19 from community acquired pneumonia. IEEE Trans Med Imaging. 2020;39(8):2595–2605. doi: 10.1109/TMI.2020.2995508.
- 19.Oulefki A, Agaian S, Trongtirakul T, Laouar AK. Automatic covid-19 lung infected region segmentation and measurement using ct-scans images. Pattern Recognition. 2021;114:107747. doi: 10.1016/j.patcog.2020.107747.
- 20.Fan D-P, Zhou T, Ji G-P, Yi Z, Chen G, Fu H, Shen J, Shao L. Inf-net: Automatic covid-19 lung infection segmentation from ct images. IEEE Trans Med Imaging. 2020;39(8):2626–2637. doi: 10.1109/TMI.2020.2996645.
- 21.Yu Q, Liu Y, Xu J (2020) Miniseg: An extremely minimum network for efficient covid-19 segmentation. arXiv:2004.09750
- 22.Laradji I, Rodriguez P, Manas O, Lensink K, Law M, Kurzman L, Parker W, Vazquez D, Nowrouzezahrai D (2020) A weakly supervised consistency-based learning method for covid-19 segmentation in ct images. arXiv:2007.02180
- 23.Liu S, Georgescu B, Xu Z, Yoo Y, Chabin G, Chaganti S, Grbic S, Piat S, Teixeira B, Balachandran A et al (2020) 3d tomographic pattern synthesis for enhancing the quantification of covid-19. arXiv:2005.01903
- 24.Jiang Y, Chen H, Loew MH, Ko H (2020) Covid-19 ct image synthesis with a conditional generative adversarial network. IEEE Journal of Biomedical and Health Informatics
- 25.Li H, Hu Y, Li S, Lin W, Liu P, Higashita R, Liu J (2020) Ct scan synthesis for promoting computer-aided diagnosis capacity of covid-19. In: International conference on intelligent computing. Springer, pp 413–422
- 26.Jiang Y, Chen H, Ko H, Han DK (2021) Few-shot learning for ct scan based covid-19 diagnosis. In: ICASSP 2021 - 2021 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 1045–1049
- 27.Jose A, Ravi S, Sambath M (2014) Brain tumor segmentation using k-means clustering and fuzzy c-means algorithms and its area calculation. Int J Innovat Res Comput Commun Eng 2(2)
- 28.Ouyang C, Biffi C, Chen C, Kart T, Qiu H, Rueckert D (2020) Self-supervision with superpixels: Training few-shot medical image segmentation without annotation. In: European conference on computer vision. Springer, pp 762–780
- 29.Shan S, Yan W, Guo X, Chang EI, Fan Y, Xu Y et al (2017) Unsupervised end-to-end learning for deformable medical image registration. arXiv:1711.08608
- 30.De Vos BD, Berendsen FF, Viergever MA, Sokooti H, Staring M, Išgum I. A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis. 2019;52:128–143. doi: 10.1016/j.media.2018.11.010.
- 31.Xu Z, Niethammer M (2019) Deepatlas: Joint semi-supervised learning of image registration and segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 420–429
- 32.Wang G, Liu X, Li C, Xu Z, Ruan J, Zhu H, Meng T, Li K, Huang N, Zhang S. A noise-robust framework for automatic segmentation of covid-19 pneumonia lesions from ct images. IEEE Trans Med Imaging. 2020;39(8):2653–2663. doi: 10.1109/TMI.2020.3000314.
- 33.Patel VM, Gopalan R, Li R, Chellappa R. Visual domain adaptation: a survey of recent advances. IEEE Signal Processing Magazine. 2015;32(3):53–69. doi: 10.1109/MSP.2014.2347059.
- 34.Wang M, Deng W. Deep visual domain adaptation: A survey. Neurocomputing. 2018;312:135–153. doi: 10.1016/j.neucom.2018.05.083.
- 35.Lee S, Park E, Yi H, Lee SH (2020) Strdan: Synthetic-to-real domain adaptation network for vehicle re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 608–609
- 36.Kim M, Byun H (2020) Learning texture invariant representation for domain adaptation of semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12975–12984
- 37.Shao Y, Li L, Ren W, Gao C, Sang N (2020) Domain adaptation for image dehazing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2808–2817
- 38.Yang C, Lim S-N (2020) One-shot domain adaptation for face generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5921–5930
- 39.Hsu H-K, Yao C-H, Tsai Y-H, Hung W-C, Tseng H-Y, Singh M, Yang M-H (2020) Progressive domain adaptation for object detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 749–757
- 40.Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A. A kernel two-sample test. J Mach Learn Res. 2012;13(1):723–773.
- 41.Ghifary M, Bastiaan Kleijn W, Zhang M (2014) Domain adaptive neural networks for object recognition. In: Pacific rim international conference on artificial intelligence. Springer, pp 898–904
- 42.Yan H, Ding Y, Li P, Wang Q, Xu Y, Zuo W (2017) Mind the class weight bias Weighted maximum mean discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2272–2281
- 43.Haeusser P, Frerix T, Mordvintsev A, Cremers D (2017) Associative domain adaptation. In: Proceedings of the IEEE international conference on computer vision, pp 2765–2773
- 44.Li H, Caragea D, Caragea C, Herndon N. Disaster response aided by tweet classification with a domain adaptation approach. Journal of Contingencies and Crisis Management. 2018;26(1):16–27. doi: 10.1111/1468-5973.12194.
- 45.Zou Y, Yu Z, Vijaya Kumar BVK, Wang J (2018) Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In: Proceedings of the european conference on computer vision (ECCV), pp 289–305
- 46.Spadotto T, Toldo M, Michieli U, Zanuttigh P (2021) Unsupervised domain adaptation with multiple domain discriminators and adaptive self-training. In: 2020 25Th international conference on pattern recognition (ICPR). IEEE, pp 2845–2852
- 47.Vu T-H, Jain H, Bucher M, Cord M, Pérez P. (2019) Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2517–2526
- 48.Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7167–7176
- 49.Pan F, Shin I, Rameau F, Lee S, In SK (2020) Unsupervised intra-domain adaptation for semantic segmentation through self-supervision. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3764–3773
- 50.Park J, Han DK, Ko H. Fusion of heterogeneous adversarial networks for single image dehazing. IEEE Trans Image Process. 2020;29:4721–4732. doi: 10.1109/TIP.2020.2975986.
- 51.Perone CS, Ballester P, Barros RC, Cohen-Adad J. Unsupervised domain adaptation for medical imaging segmentation with self-ensembling. NeuroImage. 2019;194:1–11. doi: 10.1016/j.neuroimage.2019.03.026.
- 52.Degel MA, Navab N, Albarqouni S (2018) Domain and geometry agnostic cnns for left atrium segmentation in 3d ultrasound. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 630–637
- 53.Zhang Y, Miao S, Mansi T, Liao R (2018) Task driven generative modeling for unsupervised domain adaptation Application to x-ray image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 599–607
- 54.Ren J, Hacihaliloglu I, Singer EA, Foran DJ, Qi X (2018) Adversarial domain adaptation for classification of prostate histopathology whole-slide images. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 201–209
- 55.Kamnitsas K, Baumgartner C, Ledig C, Newcombe V, Simpson J, Kane A, Menon D, Nori A, Criminisi A, Rueckert D et al (2017) Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In: International conference on information processing in medical imaging. Springer, pp 597–609
- 56.Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
- 57.He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
- 58.Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch. In: NIPS workshop
- 59.Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
- 60.Fenster A, Chiu B (2006) Evaluation of segmentation algorithms for medical imaging. In: 2005 IEEE Engineering in medicine and biology 27th annual conference. IEEE, pp 7186–7189
- 61.Milletari F, Navab N, Ahmadi S-A (2016) V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth international conference on 3d vision (3DV). IEEE, pp 565–571