Abstract
Attenuation correction (AC) is important for accurate interpretation and quantitative analysis of SPECT myocardial perfusion imaging (MPI). Dedicated cardiac SPECT systems are invaluable for the evaluation and risk stratification of patients with known or suspected cardiovascular disease. However, most dedicated cardiac SPECT systems are standalone, lacking a transmission imaging capability such as computed tomography (CT) for generating attenuation maps for AC. To address this problem, we propose a conditional generative adversarial network (cGAN) that generates attenuation-corrected SPECT images (SPECTGAN) directly from non-corrected SPECT images (SPECTNC) in the image domain as a one-step process, without requiring an additional intermediate step. The proposed network was trained and tested on 100 cardiac SPECT/CT datasets from a GE Discovery NM/CT 570c SPECT/CT system, collected retrospectively at Yale New Haven Hospital. The generated images were evaluated quantitatively through the normalized root mean square error (NRMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and statistically through joint histograms and error maps. In comparison to the reference CT-based correction (SPECTCTAC), NRMSEs were 0.2258 ± 0.0777 and 0.1410 ± 0.0768 (a 37.5% reduction in error); PSNRs 31.7712 ± 2.9965 and 36.3823 ± 3.7424 (a 14.5% improvement in signal-to-noise ratio); and SSIMs 0.9877 ± 0.0075 and 0.9949 ± 0.0043 (a 0.7% improvement in structural similarity) for SPECTNC and SPECTGAN, respectively. This work demonstrates that conditional adversarial training can achieve accurate CT-less attenuation correction for SPECT MPI that is quantitatively comparable to CTAC. Standalone dedicated cardiac SPECT scanners can benefit from the proposed GAN to reduce attenuation artifacts efficiently.
Keywords: SPECT, myocardial perfusion imaging (MPI), attenuation correction, deep learning, generative adversarial network
1. INTRODUCTION
Single-photon emission computed tomography (SPECT) is a well-validated noninvasive imaging technique that enables studying the function of organs or tissues after the injection of radioactive tracers. By capturing the gamma-ray photons emitted by the injected tracer, a SPECT gamma camera can clearly reflect the distribution of the tracer throughout the patient's body. SPECT has a wide spectrum of clinical applications; one of the most common is myocardial perfusion imaging (MPI) for the diagnosis and risk stratification of cardiovascular disease.1
Photon attenuation from photoelectric absorption and Compton scattering is a major factor in reducing the quantitative accuracy and quality of SPECT images. The diaphragm is the primary cause of attenuation artifacts in MPI, producing apparent perfusion defects in the inferior wall. Breast attenuation in women can also produce artifacts along the anterior wall of the left ventricle (LV). Several studies have demonstrated that attenuation correction can improve both the sensitivity and specificity of coronary artery disease (CAD) detection and yield a relatively uniform tracer distribution in patients with a low likelihood of CAD.2–4 Moreover, attenuation-corrected images can help physicians make a more accurate diagnosis, which in turn can reduce unnecessary invasive angiography procedures.5,6 Therefore, correcting tissue-dependent attenuation is essential for accurate quantitative analysis. Without accurate attenuation correction, severe artifacts may occur, which may prevent accurate clinical interpretation.
Integrated SPECT/CT systems were first introduced by Lang, Hasegawa, et al.7 By acquiring additional anatomical imaging, such systems can fuse physiologic and anatomic images in a registered format, facilitating visualization of the anatomic localization of tracer distribution in the body. Furthermore, integrated SPECT/CT systems enable attenuation correction using CT-derived attenuation maps (μ-maps), which is more efficient and practical than using radionuclide transmission sources. However, these integrated systems remain more expensive than standalone SPECT systems, which currently occupy the majority of the cardiac SPECT market share,8 and they increase the patient's radiation dose by performing an extra CT scan solely for AC. In addition, respiratory and cardiac motion may cause attenuation correction artifacts due to a mismatch between the SPECT and CT data. It is therefore important to be able to correct attenuation in standalone SPECT scanners with no transmission source or CT available, and to save the unnecessary extra CT radiation dose in integrated systems.
To address the problem of estimating μ-maps from SPECT emission data alone, different algorithms have been proposed, which fall into two categories: (1) segmentation-based algorithms and (2) model-based algorithms. In segmentation-based algorithms, μ-maps are estimated without transmission imaging (such as CT) by exploiting photopeak and scatter data: predefined attenuation coefficients are assigned to manually segmented regions in the SPECT images.9–11 The main assumption, that each tissue has a uniform attenuation coefficient, is not correct and does not generalize to all patients. Moreover, manual segmentation of different regions makes these approaches operator-dependent, time-consuming, and unsuitable for the clinical workflow. Model-based algorithms estimate the attenuation coefficients directly from the SPECT emission data.12–14 These approaches do not incorporate scattered photons in their models, which causes inaccurate estimation. In addition, they suffer from long computational times and can only be applied to individual image slices rather than the whole 3D volume.
In recent years, deep learning has demonstrated tremendous success in computer vision tasks. Due to this success and its proven potential, there has been growing interest in applying deep learning algorithms to medical images to provide insightful information to physicians during diagnosis and treatment. Attenuation correction is among the problems that have been studied using deep learning for both SPECT and PET data.15–18 Convolutional neural networks (CNNs) can be used to generate synthetic CT images from magnetic resonance imaging (MRI) in order to correct attenuation for PET in PET/MRI.18,19 However, these methods still require anatomical images, and our focus in this study is to develop an end-to-end approach for AC with no transmission source available.
In this study, we developed a 3D conditional generative adversarial network (cGAN) to correct attenuation in SPECT MPI directly from non-corrected SPECT images. We demonstrated the feasibility of attenuation correction in the image domain using only non-corrected SPECT images, without using CT data or generating pseudo-CT data as an intermediate step. The proposed approach is an end-to-end process that models the non-linear mapping directly from non-corrected to attenuation-corrected images, whereas conventional methods rely on CT values that must first be generated and then used for attenuation correction.
2. MATERIALS AND METHODS
2.1. Data Acquisition
In an IRB-approved study, 100 patients were scanned on a GE Discovery NM/CT 570c SPECT/CT scanner (GE D570c) at Yale New Haven Hospital for stress-only myocardial perfusion SPECT studies using 99mTc-tetrofosmin. The dataset consists of 42 female and 58 male subjects; other subject characteristics are not available due to data anonymization. CT data were acquired for each subject at 120 kVp and 50 mA with a rotation time of 0.4 s. Corresponding attenuation-corrected images for each non-corrected image were generated by rigid alignment of the CT and SPECTNC using the Attenuation Correction Quality Control package (GE ACQC). Both SPECTNC and SPECTCTAC were reconstructed using the one-step-late algorithm with a Green prior. SPECTCTAC images were reconstructed with 60 iterations and post-filtered with a Butterworth filter (cutoff 0.37 cm−1, order 7); SPECTNC images were reconstructed with 30 iterations and post-filtered with a Butterworth filter (cutoff 0.4 cm−1, order 10). All reconstruction parameters are those used clinically at Yale New Haven Hospital.
SPECTCTAC was used as the ground truth for comparing the proposed deep learning method to the standard CT-based correction. The original size of the reconstructed images was 70 × 70 × 50 with an isotropic voxel size of 4 mm. As a preprocessing step, the images were resized to 64 × 64 × 32 by removing edges, to enable clean upsampling and downsampling. Voxel values were normalized to a scale of 0 to 1 by the maximum value of each volumetric image.
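A minimal NumPy sketch of this preprocessing is shown below. The paper states that edges were removed and volumes were normalized by their maximum, but not the exact crop offsets; a symmetric center crop is assumed here for illustration.

```python
import numpy as np

def preprocess(volume: np.ndarray) -> np.ndarray:
    """Center-crop a (70, 70, 50) SPECT volume to (64, 64, 32) and scale to [0, 1].

    The center crop is an assumption; the paper only says edges were removed.
    """
    target = (64, 64, 32)
    starts = [(s - t) // 2 for s, t in zip(volume.shape, target)]
    cropped = volume[starts[0]:starts[0] + target[0],
                     starts[1]:starts[1] + target[1],
                     starts[2]:starts[2] + target[2]]
    # Normalize by the maximum voxel value of the volume, as described in the text.
    return cropped / cropped.max()
```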
2.2. Network Architecture and Training
A generative adversarial network (GAN) is a generative model first introduced by Ian Goodfellow in 2014.20 The framework comprises two neural networks, a generator (G) and a discriminator (D), which compete with each other in a zero-sum game. This adversarial style of training leads to more stable models than non-adversarial training and was formulated with deep convolutional architectures in 2015.21 GANs can generate plausible synthetic images following the data distribution of a given dataset from a latent space. The objective function of the original GAN can be expressed as follows.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{noise}(z)}\!\left[\log\big(1 - D(G(z))\big)\right] \tag{1}$$
where Pdata is the real data distribution and z is a random input drawn from a specific probability distribution (such as a Gaussian), z ~ Pnoise(z).
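In code, Eq. (1) is typically split into one loss per network. The PyTorch sketch below is illustrative, not the architecture used in this work; it assumes `D` ends in a sigmoid so its output can be read as a probability, and `G`, `D`, `real`, and `z` are placeholders.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, real, z):
    # D maximizes E[log D(x)] + E[log(1 - D(G(z)))]; equivalently, it minimizes
    # the binary cross-entropy below. detach() stops gradients into G.
    real_score = D(real)
    fake_score = D(G(z).detach())
    return (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
            + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))

def generator_loss(D, G, z):
    # Non-saturating form: G maximizes E[log D(G(z))] rather than
    # minimizing E[log(1 - D(G(z)))], the usual practical variant.
    fake_score = D(G(z))
    return F.binary_cross_entropy(fake_score, torch.ones_like(fake_score))
```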
Conditional generative adversarial networks can be used for image-to-image translation by conditioning the model on source images and learning a mapping between the source and target images, which in our case are non-corrected and attenuation-corrected images, respectively.22
Given a non-attenuation-corrected SPECT image x ~ PNC(x) and the corresponding CT-based attenuation-corrected image y ~ PCTAC(y), the objective function of a conditional GAN can be formulated as below.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x, y}\!\left[\log D(x, y)\right] + \mathbb{E}_{x}\!\left[\log\big(1 - D(x, G(x))\big)\right] + \lambda\, \mathbb{E}_{x, y}\!\left[\lVert y - G(x)\rVert_1\right] \tag{2}$$
The last term is an additional estimation-error loss that we added to the main objective function to ensure that the generated images are similar to the real attenuation-corrected images. In this case, the generator uses both the discriminator's feedback and its own error-loss information to deceive the discriminator while generating realistic-looking images. λ is a hyperparameter that tunes the weight between the original objective function and the added error loss.
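A hedged sketch of the conditional generator loss is given below. The error term is written as an L1 penalty (the Pix2Pix convention), which is an assumption; the paper describes it only as an estimation-error loss. The λ value of 0.005 is the one reported in Sec. 2.2.3.

```python
import torch
import torch.nn.functional as F

LAMBDA = 0.005  # reported weight between the adversarial and error losses

def cgan_generator_loss(D, G, x_nc, y_ctac):
    """Adversarial loss plus weighted error loss for one conditioned batch."""
    y_fake = G(x_nc)
    # Condition the discriminator by channel-concatenating source and output.
    score = D(torch.cat([x_nc, y_fake], dim=1))
    adv = F.binary_cross_entropy(score, torch.ones_like(score))
    err = F.l1_loss(y_fake, y_ctac)  # L1 is an assumption; see lead-in
    return adv + LAMBDA * err
```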
The conditional GAN architecture in this work was inspired by the Pix2Pix network.22 We used the Pix2Pix GAN framework as a baseline and modified the architecture described above for our application. The discriminator in this model is a deep convolutional neural network that performs a conditional-image classification task. The generator's task is to generate realistic images of the target domain, given source images as input, via adversarial training. The schematic conditional GAN architecture is shown in Fig. 1.
Below we explain the generator and discriminator architectures used in the proposed cGAN in more detail.
2.2.1. Generator
The modified generator has an encoder-decoder structure with symmetrical skip connections between corresponding stages, adopted from the original U-Net model.23 In a U-Net, the input is progressively downsampled through the encoder layers, and the process is reversed through the decoder layers; the network thereby extracts important features at multiple resolutions. The U-Net was originally introduced for segmentation and later modified for image synthesis tasks. The original U-Net has a 2D architecture, whereas in this work a 3D architecture was used: the generator takes a 3D volume (64 × 64 × 32) as input and uses 3D operations throughout.
Residual blocks were embedded between the last layer of the encoder and the first layer of the decoder to preserve important characteristics of the input images. Each residual block consists of two convolution layers, each followed by normalization and a rectified linear unit (ReLU) activation. Combining the U-Net architecture with residual blocks facilitates a smoother flow of information between the input and output of the network. Skip connections help regain some of the lost gradient information by combining hierarchical features, and they also improve model convergence and performance in image translation applications.22
The proposed generator consists of 4 contracting (encoder) stages and 4 expansive (decoder) stages with symmetrical skip connections. In each encoder stage we used a convolution (Conv) with kernel size 3, instance normalization (IN), and a leaky ReLU (LReLU) activation. The decoder uses the same structure, except with ReLU instead of LReLU (Fig. 2). A sketch of these building blocks is given below.
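The following PyTorch sketch shows plausible building blocks matching the description above (3D convolutions with kernel size 3, IN, LReLU in the encoder, ReLU in the decoder, and residual blocks at the bottleneck). The stride-2 downsampling/upsampling and the channel counts are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

def encoder_stage(in_ch, out_ch):
    # Conv + IN + LReLU; stride-2 downsampling is an assumption.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def decoder_stage(in_ch, out_ch):
    # Mirror of the encoder stage, with ReLU instead of LReLU.
    return nn.Sequential(
        nn.ConvTranspose3d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class ResidualBlock(nn.Module):
    """Two 3D convolutions with normalization, plus an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))
```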
2.2.2. Discriminator
The discriminator in this model is a deep convolutional neural network that performs a conditional-image classification task. It takes either a real AC SPECT image or a synthesized one (generated by the generator) as input and determines whether that input is real or fake.
The discriminator in our model consists of 6 convolutional layers with kernel size 3 and stride 2. The first convolution layer produces 16 feature maps, and this number is doubled at each subsequent layer. All convolutional layers are followed by normalization and a leaky ReLU activation with a slope of 0.2. A convolution with filter size 1 is applied after the last convolutional layer to score whether the input is real or fake (Fig. 3). A sketch follows.
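The sketch below follows that description; the instance normalization, the padding, and the 2-channel conditioned input (SPECTNC concatenated with the AC image) are assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Six Conv3d layers (kernel 3, stride 2), channels 16 -> 512, then a 1x1x1 conv."""
    def __init__(self, in_ch=2):  # 2 channels when conditioned on SPECT_NC (assumed)
        super().__init__()
        layers, ch = [], in_ch
        for i in range(6):
            out_ch = 16 * 2 ** i  # 16, 32, 64, 128, 256, 512
            layers += [nn.Conv3d(ch, out_ch, kernel_size=3, stride=2, padding=1),
                       nn.InstanceNorm3d(out_ch),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        layers += [nn.Conv3d(ch, 1, kernel_size=1)]  # filter size 1: real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return torch.sigmoid(self.net(x))

# Usage: a conditioned input of batch 1, two channels, 64 x 64 x 32 voxels.
score = Discriminator()(torch.randn(1, 2, 64, 64, 32))
```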
2.2.3. Training with CGAN
Following the standard adversarial training approach of the original GAN,20 the generator and discriminator were trained in an alternating manner: the discriminator was trained first and then the generator, using Adam optimization with an initial learning rate of 0.0002. The learning rate was decayed to 0 after the first 100 epochs, and 200 epochs were used for the entire training. Setting the estimation-error weight (λ) to 0.005 yielded the best evaluation performance. A 5-fold cross-validation scheme was used for training and testing the proposed approach.
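A minimal sketch of this schedule is shown below. The text says the learning rate is changed to 0 after the first 100 epochs; a linear decay over epochs 100–200 (the usual Pix2Pix recipe) is assumed here. `d_step` and `g_step` are placeholders for the loss functions sketched earlier.

```python
import torch

def train(G, D, loader, d_step, g_step, epochs=200, base_lr=2e-4):
    """Alternate updates: the discriminator first, then the generator."""
    opt_d = torch.optim.Adam(D.parameters(), lr=base_lr)
    opt_g = torch.optim.Adam(G.parameters(), lr=base_lr)
    for epoch in range(epochs):
        # Linear decay to 0 over the second half of training (assumption).
        scale = 1.0 if epoch < 100 else 1.0 - (epoch - 100) / 100.0
        for opt in (opt_d, opt_g):
            for group in opt.param_groups:
                group["lr"] = base_lr * scale
        for x_nc, y_ctac in loader:
            opt_d.zero_grad()
            d_step(D, G, x_nc, y_ctac).backward()  # discriminator loss
            opt_d.step()
            opt_g.zero_grad()
            g_step(D, G, x_nc, y_ctac).backward()  # generator loss (adv + λ·error)
            opt_g.step()
```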
2.2.4. Training with WGAN
Although GANs have been successful in many generative tasks, training a GAN model is challenging because stable training requires finding and maintaining an equilibrium between the capabilities of the generator and the discriminator. The Wasserstein GAN24 was proposed to enable training that is more stable and less sensitive to the model architecture. Mathematically, the Wasserstein (Earth Mover's, EM) distance, which measures the similarity between two probability distributions, is continuous and differentiable almost everywhere; this makes it a much more sensible cost function than commonly used alternatives such as the cross-entropy loss.
The problem with the cross-entropy loss is that the discriminator quickly learns to distinguish fake from real images and, after a while, provides no reliable gradient information to help the generator improve the quality of generated images (the vanishing gradient problem). One reason a GAN discriminator learns to separate real and fake samples perfectly is that both the real and the generated data lie on low-dimensional manifolds, due to restrictions on the characteristics of the images the model is trained on. When low-dimensional manifolds (image hyperplanes) are embedded in a high-dimensional space (the image dimension), the hyperplanes hardly overlap, allowing the discriminator to separate real and fake images easily. WGAN replaces the discriminator with a critic which, instead of performing binary classification, scores the realness of given data. The critic does not have the saturation problem and converges to a linear function that provides clean gradients everywhere. Therefore, Wasserstein-based training has the capacity to improve the stability of the optimization process. A minimal sketch of the critic objective is given below.
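This sketch shows the standard Wasserstein objective from Arjovsky et al.,24 adapted to the conditioned setting; the weight-clipping constant c = 0.01 comes from the WGAN paper, not from this work, and the channel-concatenated conditioning is the same assumption as before.

```python
import torch

def critic_loss(C, G, x_nc, y_ctac):
    # Critic maximizes E[C(real)] - E[C(fake)]; we minimize the negative.
    real = C(torch.cat([x_nc, y_ctac], dim=1)).mean()
    fake = C(torch.cat([x_nc, G(x_nc).detach()], dim=1)).mean()
    return -(real - fake)

def generator_wloss(C, G, x_nc):
    # Generator maximizes the critic's (unbounded) realness score on its outputs.
    return -C(torch.cat([x_nc, G(x_nc)], dim=1)).mean()

def clip_weights(C, c=0.01):
    # Original WGAN heuristic for the 1-Lipschitz constraint (c from ref. 24).
    for p in C.parameters():
        p.data.clamp_(-c, c)
```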
2.3. Evaluation metrics
To evaluate the reliability of the proposed method in correcting attenuation, we calculated the normalized root mean square error (NRMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) of non-corrected SPECT (SPECTNC) and cGAN-corrected SPECT (SPECTGAN) against the reference CT-based attenuation-corrected images (SPECTCTAC). A lower NRMSE and higher SSIM and PSNR indicate better similarity. The evaluation measures are defined below.
$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{v=1}^{V}\left[I(v) - I_{ref}(v)\right]^2}{\sum_{v=1}^{V}\left[I_{ref}(v)\right]^2}} \tag{3}$$

$$\mathrm{PSNR} = 20\,\log_{10}\!\left(\frac{\max(I_{ref})}{\sqrt{\frac{1}{V}\sum_{v=1}^{V}\left[I(v) - I_{ref}(v)\right]^2}}\right) \tag{4}$$

$$\mathrm{SSIM} = \frac{\left(2\mu_I\,\mu_{I_{ref}} + c_1\right)\left(2\sigma_{I I_{ref}} + c_2\right)}{\left(\mu_I^2 + \mu_{I_{ref}}^2 + c_1\right)\left(\sigma_I^2 + \sigma_{I_{ref}}^2 + c_2\right)} \tag{5}$$
where V is the number of voxels in the image volume, I is either SPECTNC or SPECTDL (the deep-learning-corrected image), and Iref is SPECTCTAC. μ and σ² denote the mean and variance of an image, σ_{I Iref} the covariance between I and Iref, and c1 and c2 are constants that stabilize the division when the denominator is weak.
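For concreteness, a NumPy implementation of Eqs. (3)–(5) as reconstructed above is sketched below. The global (whole-volume) SSIM form and the usual c1, c2 constants for a [0, 1] data range are assumptions about the exact variant used.

```python
import numpy as np

def nrmse(img, ref):
    # Eq. (3): RMSE normalized by the reference energy.
    return np.sqrt(np.sum((img - ref) ** 2) / np.sum(ref ** 2))

def psnr(img, ref):
    # Eq. (4): peak of the reference over the root mean square error, in dB.
    rmse = np.sqrt(np.mean((img - ref) ** 2))
    return 20 * np.log10(ref.max() / rmse)

def ssim(img, ref, c1=0.01 ** 2, c2=0.03 ** 2):
    # Eq. (5), evaluated globally over the volume (assumed variant).
    mu_i, mu_r = img.mean(), ref.mean()
    var_i, var_r = img.var(), ref.var()
    cov = ((img - mu_i) * (ref - mu_r)).mean()
    return ((2 * mu_i * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_i ** 2 + mu_r ** 2 + c1) * (var_i + var_r + c2))
```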
For statistical analysis, joint histograms were used to show the statistical distribution of the voxel-by-voxel correlation of SPECTNC and SPECTDL with the reference.
3. RESULTS
The results of the quantitative measures using 5-fold cross-validation are reported in Tab. 1. They demonstrate that the proposed DL approach improves the similarity of SPECTDL to the reference SPECTCTAC by 37.5% in NRMSE and 14.5% in PSNR, compared with that of SPECTNC to the reference. The joint histograms in Fig. 4 show how the proposed approach improves the similarity to the reference SPECTCTAC, consistent with Tab. 1, by narrowing the band of scattered correlation that SPECTNC exhibits against the reference.
Table 1. Quantitative comparison to the reference SPECTCTAC (mean ± standard deviation over 5-fold cross-validation).

| | NRMSE | PSNR (dB) | SSIM |
|---|---|---|---|
| SPECTNC | 0.2258 ± 0.0777 | 31.7712 ± 2.9965 | 0.9877 ± 0.0075 |
| SPECTCGAN | 0.1410 ± 0.0768 | 36.3823 ± 3.7424 | 0.9949 ± 0.0043 |
| SPECTUnbalCGAN | 0.1468 ± 0.0873 | 36.1304 ± 3.8735 | 0.9947 ± 0.0050 |
| SPECTWGAN | 0.1355 ± 0.0571 | 36.4537 ± 3.4182 | 0.9949 ± 0.0042 |
For qualitative analysis, SPECTCGAN and SPECTNC images were compared visually to the reference SPECTCTAC images. Fig. 5 shows the sample with the minimum NRMSE (0.06116) in transverse, coronal, and sagittal views; this case also has the highest PSNR (42.7727) and SSIM (0.99921). Regions where attenuation was improved by the cGAN-based approach are marked by arrows.
The performance of the conditional GAN depends heavily on tuning the parameters to slow the rate at which the discriminator learns relative to the generator. Using a Wasserstein loss makes training more stable and less sensitive to the choice of hyperparameters. For the results in Tab. 1, the parameters had to be tuned to obtain a balanced model, which in turn provides better performance. As can be seen in Tab. 1, the conditional GAN trained with a Wasserstein loss achieves better quantitative results than the unbalanced cGAN; its results are also slightly better than those of the cGAN itself, with no need to balance the learning capabilities of the generator and discriminator. Parameters such as the loss weight, the learning rate, and the momentum terms of the chosen optimizer affect the relative rates of learning and need to be tuned. The model trained with the Wasserstein loss is therefore more stable and achieves slightly better quantitative results than models trained with other loss functions. However, we found that with sufficient hyperparameter optimization to balance the generator and the discriminator, the cGAN and WGAN provide comparable results.
4. CONCLUSION AND FUTURE DIRECTIONS
SPECT MPI is susceptible to attenuation artifacts, and an efficient AC technique is thus desired for improving diagnostic accuracy; such a technique will especially benefit patients scanned on standalone dedicated cardiac SPECT systems. In this work, we proposed a direct DL-based AC technique for SPECT MPI using a cGAN. The preliminary results demonstrate the feasibility of producing accurate attenuation-corrected images directly from non-corrected data in an end-to-end manner using adversarial training. We further extended the proposed model to a Wasserstein cGAN and compared the WGAN and cGAN results from the standpoint of training stability. The proposed approach has the potential to offer a practical means of attenuation correction for dedicated myocardial SPECT systems, which require attenuation correction without generating attenuation maps as an intermediate step. Future work will include testing our algorithms on a larger number of clinical datasets, for robust training with diverse cases annotated by expert human readers, and extending the proposed cGAN model to a cycleGAN model.25
5. ACKNOWLEDGEMENT
The study was supported by the National Institutes of Health under Grants R01HL135490, R01EB026331, and R01HL123949, and by American Heart Association award 18PRE33990138.
REFERENCES
- [1] Rahmim A and Zaidi H, "PET versus SPECT: strengths, limitations and challenges," Nucl Med Commun 29, 193–207 (2008).
- [2] Ficaro EP, Fessler JA, Shreve PD, Kritzman JN, Rose PA, and Corbett JR, "Simultaneous transmission/emission myocardial perfusion tomography: diagnostic accuracy of attenuation-corrected 99mTc-sestamibi single-photon emission computed tomography," Circulation 93, 463–73 (1996).
- [3] Gallowitsch HJ, et al., "Attenuation-corrected thallium-201 single-photon emission tomography using a gadolinium-153 moving line source: clinical value and the impact of attenuation correction on the extent and severity of perfusion abnormalities," Eur J Nucl Med 25, 220–8 (1998).
- [4] Singh B, Bateman TM, Case JA, and Heller G, "Attenuation artifact, attenuation correction, and the future of myocardial perfusion SPECT," J Nucl Cardiol 14, 153 (2007).
- [5] van Dijk JD, Mouden M, Ottervanger JP, van Dalen JA, Knollema S, Slump CH, et al., "Value of attenuation correction in stress-only myocardial perfusion imaging using CZT-SPECT," J Nucl Cardiol 24, 395–401 (2017).
- [6] ND P, S P, A S, and EJ M, "Does improved technology in SPECT myocardial perfusion imaging reduce downstream costs? An observational study," Int J Radiol Imaging Technol (2017).
- [7] Lang TF, Hasegawa BH, Liew SC, et al., "Description of a prototype emission-transmission computed-tomography imaging system," J Nucl Med 33, 1881–1887 (1992).
- [8] "Global SPECT market 2017–2021," https://www.technavio.com/report/global-medical-imaging-global-spect-market-2017-2021.
- [9] Zaidi H and Hasegawa B, "Determination of the attenuation map in emission tomography," J Nucl Med 44, 291–315 (2003).
- [10] M N, V P, R V, F M, O A, and BF H, "Attenuation correction for lung SPECT: evidence of need and validation of an attenuation map derived from the emission data," Eur J Nucl Med Mol Imaging 36, 1076–89 (2009).
- [11] Pan T-S, King MA, de Vries DJ, and Ljungberg M, "Segmentation of the body and lungs from Compton scatter and photopeak window data in SPECT: a Monte-Carlo investigation," IEEE Trans Med Imaging 15, 13–24 (1996).
- [12] Jha AK, et al., "Fisher information analysis of list-mode SPECT emission data for joint estimation of activity and attenuation distribution," arXiv (2018).
- [13] Cade SC, Arridge S, Evans MJ, and Hutton BF, "Use of measured scatter data for the attenuation correction of single photon emission tomography without transmission scanning," Med Phys 40 (2013).
- [14] Gourion D, Noll D, Gantet P, Celler A, and Esquerré J-P, "Attenuation correction using SPECT emission data only," IEEE Trans Nucl Sci 49, 2172–9 (2002).
- [15] Yang J, Park D, Gullberg GT, and Seo Y, "Joint correction of attenuation and scatter in image space using deep convolutional neural networks for dedicated brain 18F-FDG PET," Phys Med Biol 64 (2019).
- [16] Shi L, Onofrey JA, Liu H, Liu Y-H, and Liu C, "Deep learning-based attenuation map generation for myocardial perfusion SPECT," Eur J Nucl Med Mol Imaging (2020).
- [17] Dong X, Lei Y, Wang T, et al., "Deep learning-based attenuation correction in the absence of structural information for whole-body positron emission tomography imaging," Phys Med Biol 65 (2020).
- [18] Hwang D, Kang SK, Kim KY, Seo S, Paeng JC, Lee DS, et al., "Generation of PET attenuation map for whole-body time-of-flight 18F-FDG PET/MRI using a deep neural network trained with simultaneously reconstructed activity and attenuation," J Nucl Med (2019).
- [19] Han X, "MR-based synthetic CT generation using a deep convolutional neural network method," Med Phys 44, 1408–19 (2017).
- [20] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, and Bengio Y, "Generative adversarial nets," Advances in Neural Information Processing Systems, 2672–2680 (2014).
- [21] Radford A, Metz L, and Chintala S, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv:1511.06434 (2015).
- [22] Isola P, Zhu J-Y, Zhou T, and Efros AA, "Image-to-image translation with conditional adversarial networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125–1134 (2017).
- [23] Ronneberger O, Fischer P, and Brox T, "U-Net: Convolutional networks for biomedical image segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241 (2015).
- [24] Arjovsky M, Chintala S, and Bottou L, "Wasserstein GAN," arXiv:1701.07875 (2017).
- [25] Zhu J-Y, Park T, Isola P, and Efros AA, "Unpaired image-to-image translation using cycle-consistent adversarial networks," IEEE International Conference on Computer Vision, 2223–2232 (2017).