Latent Diffusion Model for Medical Image Standardization and Enhancement

Md Selim, Jie Zhang, Faraneh Fathi, Michael A. Brooks, Ge Wang, Guoqiang Yu, and Jin Chen

This research is supported by NIH (grant no. R21 CA231911, R01 EB028792, R01 HD101508, R41 NS122722, R42 MH135825) and Kentucky Lung Cancer Research (grant no. KLCR-3048113817).

Md Selim is an ORISE Fellow with the US Food and Drug Administration. This work was conducted during his graduate studies at the Department of Computer Science and the Institute for Biomedical Informatics, University of Kentucky, Lexington, KY 40503 USA (e-mail: md.selim@uky.edu).
Jie Zhang is with the Department of Radiology, University of Kentucky, Lexington, KY, USA (e-mail: jnzh222@uky.edu).
Faraneh Fathi is with the Department of Biomedical Engineering, University of Kentucky, Lexington, KY, USA (e-mail: faraneh.fathi@uky.edu).
Michael A. Brooks is with the Department of Radiology, University of Kentucky, Lexington, KY, USA (e-mail: mabroo3@uky.edu).
Ge Wang is with the Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY, USA (e-mail: wangg6@rpi.edu).
Guoqiang Yu is with the Department of Biomedical Engineering, University of Kentucky, Lexington, KY, USA (e-mail: gyu2@uky.edu).
Jin Chen is with the Department of Medicine and Informatics Institute, University of Alabama at Birmingham, AL, USA (e-mail: jchen5@uab.edu).

Abstract

Computed tomography (CT) serves as an effective tool for lung cancer screening, diagnosis, treatment, and prognosis, providing a rich source of features to quantify temporal and spatial tumor changes. Nonetheless, the diversity of CT scanners and customized acquisition protocols can introduce significant inconsistencies in texture features, even when assessing the same patient. This variability poses a fundamental challenge for downstream research that relies on consistent image features. Existing CT image standardization models predominantly rely on GAN-based supervised or semi-supervised learning, and their performance remains limited. We present DiffusionCT, a score-based denoising diffusion probabilistic model (DDPM) that operates in the latent space to transform disparate non-standard distributions into a standardized form. The architecture comprises a U-Net-based encoder-decoder, augmented by a DDPM integrated at the bottleneck position. First, the encoder-decoder is trained independently, without the DDPM, to capture a latent representation of the input data. Second, the latent DDPM is trained while the encoder-decoder parameters are kept fixed. Finally, the decoder uses the transformed latent representation to generate a standardized CT image, providing a more consistent basis for downstream analysis. Empirical tests on patient CT images indicate notable improvements in image standardization using DiffusionCT. Additionally, the model significantly reduces image noise in SPAD images, further validating the effectiveness of DiffusionCT for advanced imaging tasks.

Index Terms:

CT imaging, image standardization, image synthesis, diffusion

I Introduction

Lung cancer is the leading cause of cancer death and is among the most prevalent types of cancer for both men and women in the United States[1]. The overall 5-year survival rate for non-small cell lung cancer (NSCLC) is approximately 19%. Computed tomography (CT) imaging plays a critical role in the early diagnosis of lung cancer and aids in defining tumor characteristics for better treatment outcomes[2, 3]. Texture features extracted from CT images may quantify spatial and temporal variations in tumor architecture and function, allowing for the determination of intra-tumor evolution[4, 5]. However, the use of CT scanners from different vendors, each with its own customized acquisition protocols, introduces significant variability in the texture features of images, even when observing the same patient. This inconsistency presents a substantial challenge for conducting large-scale studies across multiple sites[6]. The absence of standardized radiomics consequently hampers the reliability and effectiveness of downstream clinical tasks.

Inconsistency in radiomic features, including texture, shape, and intensity, is a known issue when images are captured using different scanners from various vendors, or even with different acquisition protocols on the same scanner[7, 8]. This inconsistency, both within a single scanner using various settings and across different scanners using similar settings, presents a persistent challenge that needs to be addressed. Figure 1 shows an example of the impact of non-standard CT imaging acquisition protocols on radiomic features. A lungman chest phantom, equipped with three artificial tumors, was scanned using Siemens CT scanners. The resulting images were reconstructed using two different Siemens reconstruction kernels, Bl64 and Br40. The visual characteristics and radiomic features of the tumors varied notably between images generated with the different reconstruction kernels.

Developing a universal CT image acquisition standard has been suggested as a potential solution. However, implementing this standard would require substantial modifications to existing CT imaging protocols, and could potentially narrow the scope of applications for the modality[9, 10]. Given these constraints, alternative approaches are needed to address the issue of radiomic feature discrepancies in CT images.

Recent advances have been made to address the CT radiomic feature variability problem. One promising solution is to develop a post-processing framework capable of standardizing and normalizing existing CT images while preserving anatomic details[11, 12, 13, 14, 15]. Our research indicates that this approach allows for the extraction of reliable and consistent features from standardized images, facilitating accurate downstream analysis and ultimately leading to improved diagnosis, treatment, and prognosis of lung cancer. Deep learning algorithms for image standardization are particularly promising for harmonizing CT images taken with diverse parameters on the same scanner[13]. It is, nevertheless, important to recognize that current solutions exhibit limitations, particularly in image texture synthesis and in maintaining structural integrity. These limitations can adversely affect the performance of subsequent analyses, thereby impeding the development of dependable and consistent features that are crucial for enhancing lung cancer diagnosis, treatment, and prognosis. Continued research is crucial for advancing algorithms to address these challenges and augment the performance of CT image standardization. Progress in this domain has the potential to substantially improve the quality of medical imaging, contributing to the development of more effective strategies for combating lung cancer.

[Figure 1: Impact of different Siemens reconstruction kernels (Bl64 vs. Br40) on phantom tumor images and their radiomic features.]
[Figure 2: Overview of the DiffusionCT architecture.]

Compared to state-of-the-art generative adversarial network (GAN) and variational auto-encoder (VAE) algorithms, score-based denoising diffusion probabilistic models (DDPM)[16] show superior performance in image standardization. A DDPM learns a Markov chain to gradually convert a simple distribution, such as an isotropic Gaussian, into a target data distribution. It consists of two processes: (1) a fixed forward diffusion process that gradually adds noise to an image while sequentially sampling latent variables of the same dimensionality, and (2) a learned reverse denoising diffusion process, in which a neural network (such as a U-Net) is trained to gradually denoise an image starting from a pure noise realization. DDPM and its variants have attracted a surge of attention since 2020, resulting in key advances in continuous data modeling, such as image generation[16], super-resolution[17], and image-to-image translation[18]. More recently, conditional DDPM has shown remarkable performance in conditional image generation[19]. In parallel, latent DDPM enables generating image embeddings in a low-dimensional latent space.
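To make the two processes concrete, the following minimal PyTorch sketch (ours, not the original DDPM code; the linear noise schedule and all names are illustrative assumptions) shows how the fixed forward process corrupts a sample in closed form and how the reverse process is trained via noise prediction:

```python
import torch

# A minimal sketch of the two DDPM processes (illustrative schedule and names).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # fixed forward noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal fraction

def forward_diffuse(x0, t):
    """Fixed forward process: sample x_t ~ q(x_t | x_0) in closed form."""
    noise = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    b = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + b * noise, noise

def diffusion_loss(denoiser, x0):
    """Learned reverse process: a network (e.g., a U-Net) is trained to
    predict the noise that was added at a random step t."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, noise = forward_diffuse(x0, t)
    return torch.nn.functional.mse_loss(denoiser(x_t, t), noise)
```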

Building on recent advancements in DDPM, this study introduces DiffusionCT, an innovative solution for CT image standardization. The architecture of DiffusionCT combines an encoder-decoder network with a latent conditional DDPM, as illustrated in Figure 2. The encoder-decoder network maps the input CT image to a low-dimensional latent representation. The DDPM then models the conditional probability distribution of the latent representation to synthesize a standard image. This framework aims to address the current limitations in CT image standardization and contribute to more reliable medical imaging for lung cancer management. Notably, DiffusionCT preserves the original structure of the CT image while effectively standardizing texture.

Additionally, we demonstrate that the capabilities of DiffusionCT can be extended beyond standardization to include effective noise reduction in medical images. As part of our case study, we applied DiffusionCT to the 2D mapping of cerebral blood flow (CBF) images at different depths of the head captured with the time-resolved laser speckle contrast imaging (TR-LSCI) technology. DiffusionCT successfully denoised the blurry depth images, thereby recovering high-quality CBF maps. This extended capability broadens the tool’s applicability across diverse medical imaging tasks and further solidifies its potential in enhancing diagnostic and treatment strategies across a range of medical conditions.

II Background

II-A CT Image Acquisition and Reconstruction Parameters

CT images are typically acquired by setting several parameters, such as kilovoltage peak (kVp), pitch, milliampere-seconds (mAs), reconstruction field of view (FOV), slice thickness, and reconstruction kernel. Varying the settings of CT image acquisition and reconstruction parameters, as well as the selection of different CT scanners, can alter the radiomic features extracted from the images. For instance, in Figure 1, the Br40 kernel produces a smoother image, while the Bl64 kernel results in a sharper image. These differing texture patterns yield distinct radiomic features, complicating subsequent clinical tasks.

II-B Radiomic Features

Radiology employs sophisticated non-invasive imaging technologies for the diagnosis and treatment of various diseases. Crucial to tumor characterization are the image features extracted from radiological images using mathematical and statistical models[20]. Among these features, radiomic features provide insight into cellular- and genetic-level phenotypic patterns hidden from the naked eye[21, 22, 20]. Radiomic features can be categorized into six classes: Gradient Oriented Histogram (GOH), Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Intensity Direct (ID), Intensity Histogram (IH), and Neighbor Intensity Difference (NID).

Utilizing radiomic features offers considerable potential to capture tumor heterogeneity and detailed phenotypic information. However, the efficacy of radiomic studies, especially in the context of extensive cross-institutional collaborations, is significantly hindered by the lack of standardization in medical image acquisition practices[8, 7].

II-C CT Image Standardization Approaches

In general, there are two types of CT image standardization approaches, each serving distinct purposes and contingent upon data availability. The first category, known as intra-scanner image standardization, requires paired image data[13]. In this scenario, two images reconstructed from the same scan but with different reconstruction kernels constitute an image pair, where the source image refers to the image reconstructed with the non-standard kernel (e.g., Siemens Br40), and the target image is reconstructed using the standard kernel (e.g., Siemens Bl64). Given paired image data as training data, a machine learning model is trained to convert source images to target images. The second category encompasses models devised for cross-scanner image standardization, which eliminates the need for paired image data[14]. In this setting, images are not required to be matched; rather, images acquired with different protocols are stored separately.

Acquiring paired training data is straightforward, though it is predominantly confined to a single scanner. In large-scale radiomic studies, the need for standardization is more pronounced in cross-vendor scenarios, which cannot be accomplished by utilizing models from the first category. To address the issue of cross-vendor image standardization, models in the second category mitigate the requirement for paired images, albeit at the cost of reduced performance.

Liang et al. [23] developed a CT image standardization model, denoted GANai, based on the conditional Generative Adversarial Network (cGAN)[24]. A new alternative training strategy was designed to effectively learn the data distribution. GANai achieved better performance than cGAN and the traditional histogram matching approach[25]. However, GANai primarily focuses on the less challenging task of image patch synthesis rather than addressing the entire DICOM image synthesis problem.

Selim et al. [13] introduced another cGAN-based CT image standardization model, denoted STAN-CT. In STAN-CT, a complete pipeline for systematic CT image standardization was constructed. A new loss function was also devised to account for two constraints, i.e., a latent space loss and a feature space loss. The latent space loss is adopted by the generator to establish a one-to-one mapping between standard and synthesized images. The feature space loss is utilized by the discriminator to critique the texture features of the standard and synthesized images. Nevertheless, STAN-CT was constrained by the limited availability of training data and was evaluated only at the image patch level, on a small number of texture features, using a single evaluation criterion.

RadiomicGAN, another GAN-based model, incorporates a transfer learning approach to address the data limitation issue[15]. The model is designed using a pre-trained VGG network. A novel training technique called window training is implemented to reconcile the pixel intensity disparity between the natural image domain and the CT imaging domain. Experimental results indicated that RadiomicGAN outperformed both STAN-CT and GANai.

For cross-scanner image standardization, a model termed CVH-CT was developed[14]. CVH-CT aims to standardize images between scanners from different manufacturers, such as Siemens and GE. The generator of CVH-CT employs a self-attention mechanism for learning scanner-related information. A VGG feature-based domain loss is utilized to extract texture properties from unpaired image data, enabling the learning of scanner-based texture distributions. Experimental results show that, compared to CycleGAN[26], CVH-CT reduced the feature discrepancy in the synthesized images, although its performance did not significantly improve upon that of models trained within the intra-scanner domain.

UDA-CT, a recently developed deep learning model for CT image standardization, departs from previous methods by incorporating both paired and unpaired images, rendering it more flexible and robust[27]. UDA-CT effectively learns a mapping from all non-standard distributions to the standard distribution, thereby enhancing the modeling of the global distribution of all non-standard images. Notably, UDA-CT demonstrates comparable performance in both within-scanner and cross-scanner settings.

The development of standardization models for CT images has provided a solid foundation for generating stable radiomic features in large-scale studies. However, recent advances in image synthesis using diffusion models have opened up new opportunities for investigating the CT image standardization problem. These models offer a powerful approach for generating high-quality, standardized images from diverse sources, which could greatly improve the accuracy and reliability of radiomic studies. By leveraging the strengths of both standardization and synthesis models, researchers may be able to unlock new insights into the relationship between CT images and disease outcomes.

III Method

The structure of DiffusionCT is shown in Figure 2, encompassing two major components: the image embedding component and a conditional DDPM in the latent space. The image embedding component employs an encoder-decoder network to translate input CT images into a low-dimensional latent representation. Subsequently, the conditional DDPM models the conditional probability distribution of the latent representation in order to synthesize a standard image. Importantly, DiffusionCT retains the original structure of the input image while effectively standardizing its texture.

DiffusionCT is trained sequentially in three steps. First, in the pre-processing step, the encoder-decoder network is trained with all CT images in the training set, irrespective of whether they are standard or non-standard and whether they were captured using GE or Siemens scanners. This step aims to encode each image into a 1-D latent vector from which the original image can be reconstructed with minimal information loss. Second, a latent conditional DDPM is trained with image pairs, each consisting of a non-standard image and its corresponding standard image. This step enables the DDPM to model the conditional probability distribution of the latent representation, thus facilitating the synthesis of standard images. Finally, all the trained neural networks are combined to standardize new images.

III-A Image encoding and decoding

The image embedding component of DiffusionCT comprises a customized U-Net-structured convolutional network, designed to learn a low-dimensional latent representation of input images. The encoder and decoder of the U-Net are asymmetric. The encoder is built from a pre-trained ResNet-18: the first convolutional block consists of the first three ResNet-18 layers; the second block consists of the fourth and fifth layers; and the third, fourth, and fifth blocks of the encoder consist of the corresponding fifth, sixth, and seventh ResNet-18 layers, respectively. The decoder encompasses a five-block convolutional network with up-sampling, followed by several 1D convolutional layers at the end. Skip connections are not used within the 1D convolutional layers.

This novel U-Net is trained with all available images in the training dataset, irrespective of whether they are standard or non-standard, in order to learn a global image encoding. An anatomic loss is adopted to facilitate the learning of structural information within the images. The trained encoder maps an input image into a low-dimensional latent representation, and the decoder takes a latent representation and reconstructs the input image. This step is applicable to both intra-scanner and cross-vendor image standardization. An L2-regularized loss function is adopted for model training.
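The following PyTorch sketch illustrates one plausible reading of this encoder-decoder. The exact grouping of ResNet-18 stages, the decoder channel widths, and the single-channel input stem are our assumptions, as the paper does not publish a layer-level configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class EncoderDecoder(nn.Module):
    """Sketch of the asymmetric encoder-decoder (assumed configuration)."""
    def __init__(self):
        super().__init__()
        r = resnet18(weights="IMAGENET1K_V1")
        # Single-channel CT input instead of RGB (an assumption).
        r.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Encoder blocks built from pre-trained ResNet-18 stages.
        self.encoder = nn.Sequential(
            nn.Sequential(r.conv1, r.bn1, r.relu),
            nn.Sequential(r.maxpool, r.layer1),
            r.layer2, r.layer3, r.layer4,
        )
        # Decoder: five up-sampling blocks back to a one-channel image.
        chans = [512, 256, 128, 64, 32, 16]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Upsample(scale_factor=2, mode="bilinear"),
                       nn.Conv2d(cin, cout, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.decoder = nn.Sequential(*layers, nn.Conv2d(16, 1, 1))

    def forward(self, x):
        z = self.encoder(x)        # low-dimensional latent representation
        return self.decoder(z), z  # reconstruction and latent embedding
```

An L2-regularized reconstruction objective, e.g. a mean-squared reconstruction error plus a small penalty on the latent code, would then match the training loss stated above.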

III-B Conditional latent DDPM

[Figure 3: Training of the conditional latent DDPM.]

In the context of intra-scanner image standardization, paired image data are provided, consisting of a non-standard image $A$ and the corresponding standard image $B$. Using the previously described trained encoder, latent embeddings $Z_A$ and $Z_B$ are generated from the non-standard ($A$) and standard ($B$) images, respectively. As $Z_A$ and $Z_B$ adhere to distinct distributions, a conditional latent DDPM is designed to map the non-standard latent distribution to the standard latent distribution. The encoder-decoder network remains unaltered during diffusion training. A well-trained conditional latent DDPM preserves the anatomic details in $Z_A$ while mapping the texture details of $Z_A$ to those of $Z_B$.

Structure-wise, the conditional latent DDPM performs multiple small diffusion steps in each training iteration. In every individual diffusion step, Gaussian noise $\eta$ is added to the latent embedding $Z_B$. All the corrupted $Z_B$, conditioned on $Z_A$, are used to train the conditional latent DDPM described in Figure 3. For a sufficiently large $T$, where $T$ represents the total number of diffusion steps, $\prod_{t=1}^{T}(Z_{B_t}+\eta)$ converges to an isotropic Gaussian distribution.

The network structure of the conditional latent DDPM is a U-Net, which is trained to predict the added noise $\eta$ from $\prod_{t=1}^{T}(Z_{B_t}+\eta)$. In addition to the standard diffusion loss function (see details in Ho et al. [16]), an L1-loss between the predicted and the added noise, $\mathcal{L}=\mathbb{E}_{t\sim[1,T]}\left[\,\lvert \eta_t - p_\theta(Z_A, Z_{B_t}) \rvert\,\right]$, is used to update the diffusion model. After training, for each non-standard embedding $Z_A$, the model synthesizes a latent standardized embedding $Z_{A'}$.
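A single training step of the conditional latent DDPM might look like the sketch below. Conditioning by concatenating $Z_A$ with the corrupted $Z_B$, and the equal weighting of the two loss terms, are our assumptions; `alphas_bar` follows the schedule in the earlier DDPM sketch:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(eps_net, z_A, z_B, alphas_bar, T=1000):
    """One conditional latent DDPM training step (sketch).
    Conditioning by concatenation of z_A is an assumption."""
    t = torch.randint(0, T, (z_B.shape[0],), device=z_B.device)
    eta = torch.randn_like(z_B)                    # Gaussian noise eta
    a = alphas_bar[t].sqrt().unsqueeze(-1)
    b = (1.0 - alphas_bar[t]).sqrt().unsqueeze(-1)
    z_B_t = a * z_B + b * eta                      # corrupted Z_B at step t
    eta_hat = eps_net(torch.cat([z_A, z_B_t], dim=-1), t)
    # Standard noise-prediction loss plus the L1 term stated above.
    return F.mse_loss(eta_hat, eta) + F.l1_loss(eta_hat, eta)
```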

III-C Model training

To ensure effective training, we adopt a two-step strategy, i.e., representation learning followed by latent diffusion training. In representation learning, we train the customized U-Net with all training images to learn the latent low-dimensional representation. Specifically, the network introduced in Section III-A is trained to learn the global data representation of all training images in the latent space. After the encoder and decoder are well trained, they remain fixed, and the latent diffusion model training starts. In the latent diffusion training process, we train the proposed conditional latent DDPM introduced in Section III-B to map the latent representation of non-standard images to the standard image domain.

The trained encoder-decoder network and conditional latent DDPM are integrated for image standardization. A non-standard image $A$ is passed through the trained encoder to obtain a latent representation $Z_A$. Then, $Z_A$ is passed through the trained conditional latent DDPM to generate $Z_{A'}$, which falls into the standard embedding domain. Finally, $Z_{A'}$ is passed through the trained decoder to synthesize an image $A'$ in the standard image domain $B$.
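Putting the pieces together, inference could proceed as in the following sketch. We assume standard DDPM ancestral sampling in the latent space, since the paper does not specify the sampler:

```python
import torch

@torch.no_grad()
def standardize(encoder, decoder, eps_net, A, betas):
    """Standardize a non-standard image A (sketch; ancestral sampling assumed)."""
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    z_A = encoder(A)                     # latent of the non-standard image
    z = torch.randn_like(z_A)            # start the reverse chain from pure noise
    for t in reversed(range(len(betas))):
        t_b = torch.full((z.shape[0],), t, device=z.device)
        eta_hat = eps_net(torch.cat([z_A, z], dim=-1), t_b)
        # DDPM ancestral update toward the standardized latent Z_A'.
        z = (z - betas[t] / (1.0 - alphas_bar[t]).sqrt() * eta_hat) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return decoder(z)                    # synthesized standardized image A'
```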

IV Experimental Results

DiffusionCT was built using the PyTorch framework. The network weights were randomly initialized. The learning rate was set to $10^{-4}$ with the Adam optimizer. The encoder-decoder network was trained for 20 epochs, followed by an additional 20 epochs dedicated to training the diffusion network. In total, the model required about 20 hours to train from scratch. Once fully trained, it took about 30 seconds to process and synthesize a standardized slice of a DICOM CT image.
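The stated two-stage schedule can be captured as follows (a sketch; the Linear modules are placeholders standing in for the real networks):

```python
import torch
import torch.nn as nn

# Stated configuration: Adam, learning rate 1e-4, 20 epochs per stage.
enc_dec = nn.Linear(256, 256)   # placeholder for the encoder-decoder (Sec. III-A)
eps_net = nn.Linear(512, 256)   # placeholder for the latent DDPM U-Net (Sec. III-B)

opt_stage1 = torch.optim.Adam(enc_dec.parameters(), lr=1e-4)
# ... stage 1: train enc_dec for 20 epochs, then freeze it:
for p in enc_dec.parameters():
    p.requires_grad_(False)
opt_stage2 = torch.optim.Adam(eps_net.parameters(), lr=1e-4)
# ... stage 2: train eps_net for 20 more epochs with enc_dec fixed.
```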

We compared DiffusionCT with five recently developed CT image standardization models, namely GANai[23], STAN-CT[13], RadiomicGAN[15], CVH-CT[14], and UDA-CT[27], as well as with the original DDPM and the encoder-decoder network alone. Model performance was measured using two metrics: the concordance correlation coefficient (CCC) and the error rate. These metrics allow for a quantitative evaluation of the effectiveness of the proposed method in achieving CT image standardization while preserving the original texture and structure of the images.

IV-A Experimental Data

The training data consist of a total of 9,886 CT image slices from 14 lung cancer patients, captured using two different kernels (Br40 and Bl64) and a 1 mm slice thickness on a Siemens SOMATOM Force scanner at the University of Kentucky Albert B. Chandler Hospital. The training data also contain an additional 9,900 image slices from a lungman chest phantom scan with three synthetic tumors inserted. The phantom was scanned using the same two kernels (Br40 and Bl64) and two slice thicknesses (1.5 mm and 3 mm) on the same scanner. In total, 19,786 CT image slices were used to train DiffusionCT. To prepare the testing data, the identical lungman chest phantom was used. The testing data comprise 126 CT image slices acquired using the two kernels (Br40 and Bl64) with the Siemens SOMATOM Force scanner. Notably, although the same phantom was used for both the training and testing data, the test data were acquired with a 5 mm slice thickness, making the training and testing sets disjoint. In this experiment, for demonstration purposes, Siemens Bl64 is considered the standard protocol, while Siemens Br40 is regarded as non-standard. Our standardization experiments focus on mitigating reconstruction kernel-related variability.

[Figure 4: Cumulative number of preserved radiomic features as a function of the error rate threshold.]

IV-B Evaluation Metric

Model performance was evaluated based on the lung tumors in the CT images. For each tumor, a total of 1,401 radiomic features from six feature classes (GOH, GLCM, GLRLM, ID, IH, NID) were extracted using IBEX[28]. Based on these radiomic features, we evaluated DiffusionCT and all the baseline models using two evaluation metrics: one for one-to-one feature comparison and one for group-wise comparison.

First, the error rate, defined as the relative difference between a synthesized image and its corresponding standard image with respect to a radiomic feature, was used to measure the linear distance between the standard and synthesized images for each individual radiomic feature. The error rate ranges from 0 to 1, and lower is better:

$ErrorRate(s,t)=\dfrac{|f_t-f_s|}{f_t}\times 100\%$  (1)

where $f_t$ and $f_s$ are the radiomic feature values of the standard and the synthesized image, respectively, and $s$ and $t$ denote the synthesized and the standard image, respectively.

Usually, a radiomic feature is considered reproducible if the synthesized image is more than 85% similar to the corresponding standard image[29, 30]. Mathematically, a radiomic feature is considered reproducible if and only if $ErrorRate(s,t)<15\%$.
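The per-feature computation is straightforward; the snippet below (our illustration) applies Eq. (1) and the 15% reproducibility test:

```python
def error_rate(f_t, f_s):
    """Relative difference (Eq. 1) between the standard feature value f_t
    and the synthesized feature value f_s, expressed in percent."""
    return abs(f_t - f_s) / f_t * 100.0

def is_reproducible(f_t, f_s, threshold=15.0):
    """A radiomic feature is reproducible iff ErrorRate(s, t) < 15%."""
    return error_rate(f_t, f_s) < threshold

# Example: f_t = 0.92 (standard) and f_s = 0.85 (synthesized) give an
# error rate of about 7.6%, so this feature counts as reproducible.
print(error_rate(0.92, 0.85))       # ~7.61
print(is_reproducible(0.92, 0.85))  # True
```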

The Concordance Correlation Coefficient (CCC)[31] was employed to measure the level of similarity between two feature groups[30]. Mathematically, CCC represents the correlation between the standard and the non-standard (or synthesized) image features within a radiomic feature class $r$. CCC ranges from -1 to 1, and higher is better:

$CCC(s,t,r)=\dfrac{2\rho_{s,t,r}\,\sigma_s\sigma_t}{\sigma_s^2+\sigma_t^2+(\mu_s-\mu_t)^2}$  (2)

where s𝑠sitalic_s and t𝑡titalic_t stand for the standard and the synthesized images, respectively; μssubscript𝜇𝑠\mu_{s}italic_μ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT and σssubscript𝜎𝑠\sigma_{s}italic_σ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT (or μtsubscript𝜇𝑡\mu_{t}italic_μ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and σtsubscript𝜎𝑡\sigma_{t}italic_σ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT) are the mean and standard deviation of the radiomic features belonging to the same feature class R𝑅Ritalic_R in a synthesized (or standard) image, respectively; and ρs,t,rsubscript𝜌𝑠𝑡𝑟\rho_{s,t,r}italic_ρ start_POSTSUBSCRIPT italic_s , italic_t , italic_r end_POSTSUBSCRIPT is the Pearson correlation coefficient between s𝑠sitalic_s and t𝑡titalic_t regarding a feature class r𝑟ritalic_r.

IV-C Results and Discussion

TABLE I: CCC scores (mean ± standard deviation) of six classes of radiomic features.

Model            | GOH         | GLCM        | GLRLM       | ID          | IH          | NID
Baseline         | 0.90 ± 0.05 | 0.20 ± 0.13 | 0.59 ± 0.13 | 0.33 ± 0.16 | 0.35 ± 0.12 | 0.28 ± 0.15
GANai            | 0.95 ± 0.05 | 0.50 ± 0.08 | 0.63 ± 0.12 | 0.59 ± 0.03 | 0.44 ± 0.08 | 0.65 ± 0.10
STAN-CT          | 0.95 ± 0.05 | 0.70 ± 0.10 | 0.72 ± 0.15 | 0.75 ± 0.16 | 0.61 ± 0.11 | 0.71 ± 0.05
RadiomicGAN      | 1.00 ± 0.00 | 0.80 ± 0.12 | 0.75 ± 0.11 | 0.82 ± 0.08 | 0.72 ± 0.09 | 0.73 ± 0.12
Encoder-Decoder  | 1.00 ± 0.00 | 0.38 ± 0.19 | 0.61 ± 0.15 | 0.52 ± 0.11 | 0.39 ± 0.25 | 0.33 ± 0.09
DDPM             | 1.00 ± 0.00 | 0.81 ± 0.23 | 0.80 ± 0.18 | 0.85 ± 0.15 | 0.77 ± 0.12 | 0.82 ± 0.13
DiffusionCT      | 1.00 ± 0.00 | 0.85 ± 0.14 | 0.79 ± 0.21 | 0.89 ± 0.28 | 0.41 ± 0.05 | 0.86 ± 0.18

In Figure 4, each point on a line represents the total number of radiomic features (y-axis) whose error rate is equal to or smaller than the value specified on the x-axis. The red line represents the direct comparison between the input images and the corresponding standard images, without any standardization algorithm. The green, blue, and black lines represent the performance of the encoder-decoder network, DDPM, and DiffusionCT, respectively. In the literature, the compared models' performances were reported at a 15% error rate. At $ErrorRate \leq 0.15$, DiffusionCT preserved 64% and DDPM preserved 58% more radiomic features than the baseline, compared to 20% for GANai, 32% for STAN-CT, and 51% for RadiomicGAN.

Table I shows the CCC scores of the six classes of radiomic features. The performance of the baseline was measured using the input images. In four out of six feature classes, DiffusionCT achieved $CCC \geq 0.85$, clearly outperforming all the compared models. Nevertheless, DDPM outperformed DiffusionCT and the other compared models in the two remaining feature classes. Notably, GLCM and GLRLM together account for almost 50% of the total number of radiomic features, and both DDPM and DiffusionCT achieved significant performance gains on them. Also, DDPM had the highest variation on GLCM, indicating that the conditional DDPM could be more suitable for the image standardization task.

[Figure 5: Visual comparison of all compared models on a sample tumor.]

Figure 5 visualizes the results of all compared models on a sample tumor. The input tumor image is observably different from the standard image in both visual appearance and radiomic features. The DiffusionCT-generated image has the highest GLCM CCC values with respect to the standard image and is visually more similar to the standard image than those generated by the GAN-based models and the vanilla DDPM.

IV-D Case study on TR-LSCI image denoising

[Figure 6: TR-LSCI imaging of the 'UK' logo phantom.]
[Figure 7: DiffusionCT denoising results on the UK logo phantom.]

Tissue-simulating phantoms with empty channels bearing the University of Kentucky logo ('UK') were used to illustrate the fundamental concept of TR-LSCI (Figure 6a-6c). The UK phantom consisted of water, Intralipid particles, and India ink (Black India, MA), while the solid phantom was prepared with resin, India ink, and titanium dioxide (TiO2). TR-LSCI illuminates the phantom with picosecond-pulsed, coherent, widefield near-infrared light (785 nm) and synchronizes a gated single-photon avalanche diode (SPAD) camera to image flow distributions at different depths. See details of the TR-LSCI principle and the design of the UK phantom in Fathi et al. [32].

The SPAD camera's raw intensity images were taken at a depth of 1 mm with different gate numbers. The gated intensity images were then converted to speckle contrast images based on LSCI analysis: $K_s = \sigma_s / \langle I \rangle$, where $K_s$ is defined as the ratio of the standard deviation to the mean intensity within a 3x3 pixel window (Figure 6d). A blood flow index can be approximated as the inverse square of the speckle contrast: $BFI \sim 1/K_s^2$. Figure 6e-6f show the results of using TR-LSCI to image the UK logo phantoms. These results are expected, as deeper penetration and a thicker top layer result in fewer diffused photons being detected.
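The speckle contrast computation itself is simple to express. The following NumPy/SciPy sketch (ours) computes $K_s$ with 3x3 box filters and the approximate flow index $BFI \sim 1/K_s^2$:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def speckle_contrast(I, win=3):
    """Per-pixel spatial speckle contrast K_s = sigma_s / <I> over a
    win x win window, computed with box filters (standard LSCI practice)."""
    mean = uniform_filter(I, size=win)
    mean_sq = uniform_filter(I * I, size=win)
    var = np.clip(mean_sq - mean ** 2, 0.0, None)   # local variance
    return np.sqrt(var) / np.maximum(mean, 1e-8)

def blood_flow_index(I, win=3):
    """Relative blood flow index, approximated as BFI ~ 1 / K_s^2."""
    K = speckle_contrast(I, win)
    return 1.0 / np.maximum(K, 1e-8) ** 2
```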

DiffusionCT was trained to reduce TR-LSCI image noise. Each high-noise image obtained using TR-LSCI was paired with the corresponding phantom shape image (Figure 6c), which was considered the ground truth. The left half of the phantom image (n=7,201) was used to train the DiffusionCT model, and the right half (n=7,201) was used to test model performance.

Results on the UK logo phantom are shown in Figure 7c-7e. The resulting image preserves the structural information and contains much less noise than the input. The results were evaluated using the structural similarity index measure (SSIM), the concordance correlation coefficient (CCC), and the peak signal-to-noise ratio (PSNR). Compared to the input image (Figure 7c), the synthesized image (Figure 7d) improved the SSIM from 0.44 to 0.77, the PSNR from 12.50 to 23.75, and the CCC from -0.01 to 0.86, with all measurements computed in reference to the ground truth (Figure 7e).

V Conclusion

Image standardization reduces texture feature variation and improves the reliability of radiomic features in CT imaging. Existing CT image standardization models were mainly developed based on GANs. This article assesses the application of the DDPM approach to the CT image standardization task. Both the image space and the latent space have been investigated in relation to DDPM. The experimental results indicate that DDPM-based models are significantly better than GAN-based models, and that DDPM achieves comparable performance in image space and latent space. Owing to its relatively compact size, DiffusionCT is well suited for creating more abstract embeddings in the target domain.

In this study, we adopted a ResNet-18-based encoder, as it is a widely used CNN architecture. Future research directions include comparison with other available architectures, e.g., VGG and the vanilla U-Net. Besides network architecture, the future scope of this study includes experiments with larger patient datasets.

References

  • [1] J. Collins, "Letter from the editor: Lung cancer screening facts," in Seminars in Roentgenology, vol. 52, no. 3, 2017, pp. 121–122.
  • [2] H. J. De Koning, R. Meza, S. K. Plevritis, K. Ten Haaf, V. N. Munshi, J. Jeon, S. A. Erdogan, C. Y. Kong, S. S. Han, J. Van Rosmalen et al., "Benefits and harms of computed tomography lung cancer screening strategies: a comparative modeling study for the US Preventive Services Task Force," Annals of Internal Medicine, vol. 160, no. 5, pp. 311–320, 2014.
  • [3] M. Ravanelli, D. Farina, M. Morassi, E. Roca, G. Cavalleri, G. Tassi, and R. Maroldi, "Texture analysis of advanced non-small cell lung cancer (NSCLC) on contrast-enhanced computed tomography: prediction of the response to the first-line chemotherapy," European Radiology, vol. 23, pp. 3450–3455, 2013.
  • [4] D. Ardila, A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, D. Tse, M. Etemadi, W. Ye, G. Corrado et al., "End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography," Nature Medicine, vol. 25, no. 6, pp. 954–961, 2019.
  • [5] Q. Song, L. Zhao, X. Luo, and X. Dou, "Using deep learning for classification of lung nodules on computed tomography images," Journal of Healthcare Engineering, vol. 2017, 2017.
  • [6] M. T. Lu, V. K. Raghu, T. Mayrhofer, H. J. Aerts, and U. Hoffmann, "Deep learning using chest radiographs to identify high-risk smokers for lung cancer screening computed tomography: development and validation of a prediction model," Annals of Internal Medicine, vol. 173, no. 9, pp. 704–713, 2020.
  • [7] R. Berenguer, M. d. R. Pastor-Juan, J. Canales-Vázquez, M. Castro-García, M. V. Villas, F. M. Legorburo, and S. Sabater, "Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters," Radiology, p. 172361, 2018.
  • [8] L. A. Hunter, S. Krafft, F. Stingo, H. Choi, M. K. Martel, S. F. Kry, and L. E. Court, "High quality machine-robust image features: identification in nonsmall cell lung cancer computed tomography images," Medical Physics, vol. 40, no. 12, 2013.
  • [9] J. Paul, B. Krauss, R. Banckwitz et al., "Relationships of clinical protocols and reconstruction kernels with image quality and radiation dose in a 128-slice CT scanner: study with an anthropomorphic and water phantom," European Journal of Radiology, vol. 81, no. 5, pp. e699–e703, 2012.
  • [10] D. S. Gierada, A. J. Bierhals, C. K. Choong, S. T. Bartel, J. H. Ritter, N. A. Das, C. Hong, T. K. Pilgram, K. T. Bae, B. R. Whiting et al., "Effects of CT section thickness and reconstruction kernel on emphysema quantification: relationship to the magnitude of the CT emphysema index," Academic Radiology, vol. 17, no. 2, pp. 146–156, 2010.
  • [11] G. Liang, J. Zhang, M. Brooks, J. Howard, and J. Chen, "Radiomic features of lung cancer and their dependency on CT image acquisition parameters," Medical Physics, vol. 44, no. 6, p. 3024, 2017.
  • [12] M. F. Cohen and J. R. Wallace, Radiosity and Realistic Image Synthesis. Elsevier, 2012.
  • [13] M. Selim, J. Zhang, B. Fei, G.-Q. Zhang, and J. Chen, "STAN-CT: Standardizing CT image using generative adversarial network," in AMIA Annual Symposium Proceedings, vol. 2020. American Medical Informatics Association, 2020.
  • [14] M. Selim, J. Zhang, B. Fei, et al., "Cross-vendor CT image data harmonization using CVH-CT," in AMIA Annual Symposium Proceedings, vol. 2021. American Medical Informatics Association, 2021, p. 1099.
  • [15] M. Selim, J. Zhang, et al., "CT image harmonization for enhancing radiomics studies," in 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021, pp. 1057–1062.
  • [16] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
  • [17] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, "Image super-resolution via iterative refinement," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • [18] C. Saharia, W. Chan, H. Chang, C. Lee, J. Ho, T. Salimans, D. Fleet, and M. Norouzi, "Palette: Image-to-image diffusion models," in ACM SIGGRAPH 2022 Conference Proceedings, 2022, pp. 1–10.
  • [19] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
  • [20] S. S. Yip and H. J. Aerts, "Applications and limitations of radiomics," Physics in Medicine & Biology, vol. 61, no. 13, p. R150, 2016.
  • [21] X. Yang and M. V. Knopp, "Quantifying tumor vascular heterogeneity with dynamic contrast-enhanced magnetic resonance imaging: a review," BioMed Research International, vol. 2011, 2011.
  • [22] S. Basu, T. C. Kwee, R. Gatenby, B. Saboury, D. A. Torigian, and A. Alavi, "Evolving role of molecular imaging with PET in detecting and characterizing heterogeneity of cancer tissue at the primary and metastatic sites, a plausible explanation for failed attempts to cure malignant disorders," 2011.
  • [23] G. Liang, S. Fouladvand, J. Zhang, M. A. Brooks, N. Jacobs, and J. Chen, "GANai: Standardizing CT images using generative adversarial network with alternative improvement," in 2019 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2019, pp. 1–11.
  • [24] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Computer Vision and Pattern Recognition (CVPR), 2017.
  • [25] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Upper Saddle River, NJ: Prentice Hall, 2012.
  • [26] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2242–2251.
  • [27] M. Selim, J. Zhang, et al., "UDA-CT: A general framework for CT image standardization," in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022.
  • [28] L. Zhang, D. V. Fried, X. J. Fave, et al., "IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics," Medical Physics, vol. 42, no. 3, pp. 1341–1353, 2015.
  • [29] B. Zhao, Y. Tan, W.-Y. Tsai, J. Qi, C. Xie, L. Lu, and L. H. Schwartz, "Reproducibility of radiomics for deciphering tumor phenotype with imaging," Scientific Reports, vol. 6, no. 1, pp. 1–7, 2016.
  • [30] J. Choe, S. M. Lee, K.-H. Do, G. Lee, J.-G. Lee, S. M. Lee, and J. B. Seo, "Deep learning-based image conversion of CT reconstruction kernels improves radiomics reproducibility for pulmonary nodules or masses," Radiology, vol. 292, no. 2, pp. 365–373, 2019.
  • [31] I. Lawrence and K. Lin, "A concordance correlation coefficient to evaluate reproducibility," Biometrics, pp. 255–268, 1989.
  • [32] F. Fathi, S. Mazdeyasna, D. Singh, C. Huang, M. Mohtasebi, X. Liu, S. R. Haratbar, M. Zhao, L. Chen, A. C. Ulku et al., "Time-resolved laser speckle contrast imaging (TR-LSCI) of cerebral blood flow," arXiv preprint arXiv:2309.13527, 2023.