CBCT correction using a cycle-consistent generative adversarial network and unpaired training to enable photon and proton dose calculation

Published 15 November 2019 © 2019 Institute of Physics and Engineering in Medicine
Citation: Christopher Kurz et al 2019 Phys. Med. Biol. 64 225004, DOI 10.1088/1361-6560/ab4d8c

0031-9155/64/22/225004

Abstract

In the presence of inter-fractional anatomical changes, clinical benefits are anticipated from image-guided adaptive radiotherapy. Nowadays, cone-beam CT (CBCT) imaging is mostly utilized during pre-treatment imaging for position verification. Due to various artifacts, image quality is typically not sufficient for photon or proton dose calculation, thus demanding accurate CBCT correction, as potentially provided by deep learning techniques.

This work aimed at investigating the feasibility of utilizing a cycle-consistent generative adversarial network (cycleGAN) for prostate CBCT correction using unpaired training. Thirty-three patients were included. The network was trained to translate uncorrected, original CBCT images (CBCTorg) into planning CT equivalent images (CBCTcycleGAN). HU accuracy was determined by comparison to a previously validated CBCT correction technique (CBCTcor). Dosimetric accuracy was inferred for volumetric-modulated arc photon therapy (VMAT) and opposing single-field uniform dose (OSFUD) proton plans, optimized on CBCTcor and recalculated on CBCTcycleGAN. Single-sided SFUD proton plans were utilized to assess proton range accuracy.

The mean HU error of CBCTcycleGAN with respect to CBCTcor decreased from 24 HU for CBCTorg to  −6 HU. Dose calculation accuracy was high for VMAT, with average pass-rates of 100%/89% for a 2%/1% dose difference criterion. For proton OSFUD plans, the average pass-rate for a 2% dose difference criterion was 80%. Using a (2%, 2 mm) gamma criterion, the pass-rate was 96%. 93% of all analyzed SFUD profiles had a range agreement better than 3 mm. CBCT correction time was reduced from 6–10 min for CBCTcor to 10 s for CBCTcycleGAN.

Our study demonstrated the feasibility of utilizing a cycleGAN for CBCT correction, achieving high dose calculation accuracy for VMAT. For proton therapy, further improvements may be required. Due to unpaired training, the approach does not rely on anatomically consistent training data or potentially inaccurate deformable image registration. The substantial speed-up for CBCT correction renders the method particularly interesting for adaptive radiotherapy.

1. Introduction

Modern external beam radiotherapy techniques like intensity-modulated photon radiotherapy (IMRT), volumetric-modulated arc therapy (VMAT) or intensity-modulated proton therapy (IMPT) using pencil-beam scanning (PBS) promise highly accurate dose delivery to the target volume while concurrently sparing adjacent organs-at-risk (OAR). However, in many cases the potential of these techniques cannot be fully exploited due to the presence of inter-fractional anatomical changes occurring between acquisition of the planning imaging data and the actual treatment day. Currently, such anatomical changes are managed by the introduction of safety margins around the actual target volume (van Herk 2004). On the one hand, the use of safety margins ensures homogeneous delivery of the prescribed dose to the tumor in the presence of inter-fractional motion. On the other hand, it substantially increases the dose burden to neighboring OARs, which eventually limits the dose that can be applied to the tumor without the risk of severe side-effects.

Various approaches have been described in the literature for CBCT intensity correction, aiming at making the acquired pre-treatment imaging data, which is nowadays solely used for accurate patient alignment, suitable for accurate dose calculation and treatment adaptation. This could eventually enable accurate tailoring of the applied dose to the daily anatomy and shrinking of the applied safety margins. CBCT correction techniques range from simple look-up-table or histogram-matching based solutions (Kurz et al 2015, Kidar and Azizi 2018), over the use of planning CT (pCT)-to-CBCT deformable image registration (DIR), yielding a so-called virtual CT (vCT) (Peroni et al 2012, Landry et al 2014, 2015, Veiga et al 2014, 2015, 2016, Wang et al 2016), to the application of Monte-Carlo (MC) based methods for estimating and correcting the detected scatter contribution (Mainegra-Hing and Kawrakow 2010, Thing et al 2016, Zöllner et al 2017). Correction methods have been proposed and investigated in the context of photon (Ding et al 2007, Fotina et al 2012, Niu et al 2012, Veiga et al 2014) as well as in the context of proton radiotherapy (Landry et al 2015, Veiga et al 2015, 2016, Kurz et al 2016a) and for various treatment sites, from head and neck to the lung or the prostate. While most publications report promising results allowing for reasonably accurate CBCT based dose calculation, they also discuss miscellaneous short-comings. For example, while DIR based methods yielded promising results for the head and neck region for photons and protons (Kurz et al 2015, Landry et al 2015), accuracy was limited in the pelvis due to more pronounced anatomical changes from fraction to fraction. In such scenarios, improved results were obtained when using the vCT only as prior to estimate detected low frequency deviations, such as scatter and beam hardening, and to perform a projection based CBCT correction (Niu et al 2010, 2012, Park et al 2015, Kurz et al 2016b). 
While the latter method demonstrated reduced sensitivity to DIR inaccuracies, generating a corrected CBCT image typically takes on the order of several minutes. In the scope of an adaptive workflow, this would not be acceptable since, e.g. for the prostate, anatomical changes can occur on shorter time scales of only a few minutes (Langen et al 2008, McPartlin et al 2016). For the same reason, MC based scatter correction techniques, taking on the order of hours, are not suitable for clinical application in adaptive radiotherapy despite their high accuracy.

A considerable increase in speed for CBCT intensity correction has recently been achieved by the application of deep convolutional neural networks (CNN) (Lecun et al 1998). The favored network structure utilized for CBCT correction in the literature is a U-shaped CNN (U-Net), which features various convolutional and deconvolutional layers in an encoding and decoding branch, as initially suggested by Ronneberger et al (2015). U-Nets have been applied to a variety of image-to-image translation tasks and were successfully applied for CBCT correction in various scenarios: in Kida et al (2018), a U-Net was used for translating the CBCT into a pCT equivalent image by training with CBCT and vCT imaging data as input and target, respectively. Maier et al (2018, 2019) trained a U-Net for fast prediction of the detected scatter by utilizing MC simulated scatter distributions for training. The network prediction of the scatter can then be used for projection based CBCT correction in a second step. Projection based CBCT correction was also performed by Hansen et al (2018) using corrected projections retrieved with a previously validated algorithm based on a vCT prior. The U-Net was then trained to translate measured projections into corrected projections. In a follow-up study, the same U-Net was not only successfully trained for projection based correction, but also for translating the reconstructed uncorrected CBCT image into a vCT equivalent image or into a reference corrected CBCT (Landry et al 2019). In all scenarios, the latter study found high dose calculation accuracy for photon and proton treatment plans. Besides U-Nets, generative adversarial networks (GAN) (Goodfellow et al 2014) have shown promising results for image-to-image translation tasks. Conceptually, GANs consist of a generator and a discriminator network, which are trained jointly with an adversarial loss function. 
While the generator (typically a U-shaped CNN) aims at generating images realistic enough to fool the discriminator, the discriminator (typically a classification CNN) aims at distinguishing the generated fake images from real (training) images. In the context of medical imaging, GANs have, among others, been applied to generate synthetic CT images from MRI data (Wolterink et al 2017, Maspero et al 2018, Lei et al 2019) and, more recently, to translate CBCTs into pCT-like images of the brain and the pelvis using paired training (Harms et al 2019), as well as of the head and neck region using unpaired training (Liang et al 2019). The latter work also included dosimetric analysis of the GAN based corrected CBCT images, reporting high dose calculation accuracy for photon therapy but not investigating accuracy in the scope of proton therapy.

In this work, we aimed at investigating for the first time the feasibility of utilizing a so-called cycle-consistent GAN (cycleGAN) (Zhu et al 2017) for prostate CBCT correction in photon and proton radiotherapy. The network aimed at translating an uncorrected original CBCT (CBCTorg) into a pCT equivalent image, referred to as CBCTcycleGAN. The dedicated design of the cycleGAN, in combination with the applied loss function, enabled the use of unpaired training data, in this case the daily CBCT and the initial pCT. While previous approaches utilizing U-Nets rely on a precise geometric relation between input and target data, which are compared voxel-by-voxel in the loss functions, this is not the case for the cycleGAN. Thus, it is feasible to use data which feature substantial inter-scan differences or were even obtained from different patient cohorts without previous DIR for matching the data. The possibility of using unpaired training data is deemed particularly beneficial in situations where these data are considerably affected by inter-fractional anatomical changes, e.g. in the prostate region, since any anatomical mismatch between input and target data can affect the accuracy of the trained network model. Accuracy of CBCTcycleGAN was quantified in terms of Hounsfield unit (HU) mean (absolute) error in comparison to a previously validated CBCT correction strategy (CBCTcor). Dose calculation accuracy for photon and proton radiotherapy was evaluated by means of various dose difference metrics and clinically relevant dose-volume-histogram (DVH) parameters, comparing CBCTcycleGAN to CBCTcor. The proton range on both CBCT data-sets was compared.

2. Materials and methods

2.1. Patient data

pCT and CBCT imaging data of 33 prostate cancer patients originally treated with VMAT to a total dose of 70 Gy–76 Gy in 2 Gy fractions at the Department of Radiation Oncology of the University Hospital of the LMU Munich were included in this study. All pCTs were acquired with a Toshiba Aquilion LB CT scanner (Canon Medical Systems, Japan). Tube voltage was set to 120 kV. An image grid of 1.1 mm $\times$ 1.1 mm $\times$ 3.0 mm was used in combination with a 55 cm lateral field of view (FOV). For each patient, a CBCT image was acquired in treatment position using the XVI system (version 5.0.2) of a Synergy medical linear accelerator (Elekta, Sweden). All CBCTs were acquired using a scan protocol with 120 kV tube voltage and an exposure time of 20 ms at an x-ray tube current of 20 mA per projection. These parameters were chosen in order to avoid saturation of the detector panel and enable accurate determination of the patient outline. A bow-tie filter was used in combination with a shifted detector panel (M position) to enlarge the lateral FOV. Patient data suffering from severe lateral truncation despite the enlarged FOV were not considered in this study. Between 346 and 357 projections were acquired in a 360° scan. The uncorrected CBCT (CBCTorg) was reconstructed using the FDK (Feldkamp–Davis–Kress) implementation of RTK (Reconstruction ToolKit, Rit et al (2014)) on a 1.0 mm $\times$ 1.0 mm $\times$ 1.0 mm grid with 410 $\times$ 410 $\times$ 264 voxels. For training, all CBCTorg were re-binned to a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm grid to match the resolution of the pCT in superior–inferior direction. pCT and CBCTorg of each patient were aligned using a rigid translation.
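
The re-binning step in superior–inferior direction can be sketched as follows. This is a minimal illustration that assumes three consecutive 1.0 mm slices are averaged into one 3.0 mm slice; the averaging, the function name `rebin_slices` and the (slice, row, column) axis order are assumptions and are not stated in the text.

```python
import numpy as np

def rebin_slices(volume, factor=3):
    """Average groups of `factor` consecutive slices along axis 0.

    Assumption: plain averaging of adjacent slices; the text only states
    that the volume was re-binned from 1.0 mm to 3.0 mm slice thickness.
    """
    n_slices = volume.shape[0]
    if n_slices % factor != 0:
        raise ValueError("number of slices must be divisible by factor")
    grouped = volume.reshape(n_slices // factor, factor, *volume.shape[1:])
    return grouped.mean(axis=1)

# Small stand-in volume (the study used 264 slices of 410 x 410 voxels).
cbct = np.arange(264 * 4 * 4, dtype=np.float32).reshape(264, 4, 4)
rebinned = rebin_slices(cbct)
print(rebinned.shape)  # (88, 4, 4)
```

With 264 reconstructed slices, this yields the 88 slices per CBCT that are processed by the network.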

OARs and target structures were delineated by a trained physician on the pCTs. Positioning errors and inter-fractional anatomical changes in the prostate region were accounted for by applying a 7 mm clinical target volume (CTV) to planning target volume (PTV) margin. All contours were transferred to the CBCT via DIR for data analysis.

2.2. cycleGAN network design and training

For CBCT intensity correction, a cycleGAN was investigated in this work. The network architecture was initially proposed and implemented by Zhu et al (2017). The cycleGAN was trained to translate CBCTorg (input) into a pCT equivalent image (output) which, in contrast to CBCTorg, should exhibit accurate quantitative HU values. Training was performed in 2D, i.e. slice-by-slice in the transverse plane. The architecture of the network is illustrated in figure 1. It consists of two GANs that compose a forward and a backward cycle. In the forward cycle, the generator GpCT starts from a given slice of CBCTorg and tries to generate an image (GpCT(CBCTorg)) which looks equivalent to a pCT. The discriminator network DpCT, on the other hand, tries to distinguish the generated synthetic pCT image (GpCT(CBCTorg)) from the real pCT images. DpCT outputs the label 0 for synthetic pCT data and 1 for real pCT data. The generator and the discriminator network are trained using an adversarial loss function:

Equation (1)

Figure 1. Illustration of the cycleGAN network design. The forward cycle (top) consists of a generator (GpCT) and discriminator (DpCT) network, which are trained using a joint adversarial loss function ($\mathcal{L}_{{\rm pCT}}$ ). In the backward cycle (bottom) pCT and CBCTorg are swapped and GCBCT and DCBCT are jointly trained using $\mathcal{L}_{{\rm CBCT}}$ . Cycle consistency is ensured by the L1 loss functions $\mathcal{L}^{\mathrm{fw}}_{{\rm cyc}}$ and $\mathcal{L}^{\mathrm{bw}}_{{\rm cyc}}$ during training. Eventually, the trained GpCT network was used to translate CBCTorg into a pCT equivalent image (CBCTcycleGAN).

During training, GpCT aims at maximizing the expectation value (over all pCTs ($\mathbb{E}_{{\rm pCT}}$ ) and CBCTorg ($\mathbb{E}_{{\rm CBCT}_{{\rm org}}}$ )) of $\mathcal{L}_{{\rm pCT}}$ by generating as realistic as possible synthetic pCT images from CBCTorg, while DpCT tries to minimize $\mathcal{L}_{{\rm pCT}}$ by becoming as good as possible in distinguishing real and synthetic pCT image slices. In the backward cycle, pCT and CBCTorg are swapped and the adversarial loss function is:

Equation (2)

Similar to the forward cycle, GCBCT tries to maximize $\mathcal{L}_{{\rm CBCT}}$ , while DCBCT tries to minimize it.

Besides the adversarial loss functions, a cycle consistency loss, as suggested in Zhu et al (2017), is used during training to enforce consistency of the two mappings from pCT to CBCT space and vice-versa ($\mathrm{G}_{{\rm pCT}}$ and $\mathrm{G}_{{\rm CBCT}}$ ). For this purpose, a loss term $\mathcal{L}_{{\rm cyc}}$ is introduced. In the forward cycle, it compares the output of GCBCT, with a synthetic pCT generated from CBCTorg as input, to the initial CBCTorg using an L1 norm:

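$$\mathcal{L}^{\mathrm{fw}}_{{\rm cyc}} = \mathbb{E}_{{\rm CBCT}_{{\rm org}}}\left[\left\| \mathrm{G}_{{\rm CBCT}}\left(\mathrm{G}_{{\rm pCT}}({\rm CBCT}_{{\rm org}})\right) - {\rm CBCT}_{{\rm org}} \right\|_1\right] \qquad (3)$$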

In the backward cycle, the roles of CBCTorg and pCT are again swapped and the corresponding cycle consistency loss function is:

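$$\mathcal{L}^{\mathrm{bw}}_{{\rm cyc}} = \mathbb{E}_{{\rm pCT}}\left[\left\| \mathrm{G}_{{\rm pCT}}\left(\mathrm{G}_{{\rm CBCT}}({\rm pCT})\right) - {\rm pCT} \right\|_1\right] \qquad (4)$$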
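
The two cycle consistency terms described above can be illustrated with a short numerical sketch. It assumes that the L1 norm is normalized by the number of pixels (a mean absolute difference). The generators are replaced by hypothetical toy functions, since the trained networks are not available here.

```python
import numpy as np

def cycle_consistency_losses(cbct, pct, g_pct, g_cbct):
    """Return (forward, backward) cycle losses as mean absolute differences.

    Assumption: the L1 norm is normalized by the number of pixels.
    """
    forward = np.mean(np.abs(g_cbct(g_pct(cbct)) - cbct))
    backward = np.mean(np.abs(g_pct(g_cbct(pct)) - pct))
    return forward, backward

# Hypothetical stand-ins for the generators: a constant HU offset and an
# imperfect inverse, so that a residual of 5 HU remains after each cycle.
g_pct = lambda img: img - 30.0
g_cbct = lambda img: img + 25.0

rng = np.random.default_rng(0)
cbct = rng.uniform(-1000, 1000, size=(256, 256))
pct = rng.uniform(-1000, 1000, size=(256, 256))
fw, bw = cycle_consistency_losses(cbct, pct, g_pct, g_cbct)
```

Both terms vanish only if each generator inverts the other, which is the constraint that makes training on unpaired slices possible.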

During training, both generator and discriminator networks are optimized in parallel using the combined loss function $\mathcal{L}_{{\rm cycleGAN}}$ :

Equation (5)

where $\lambda$ is a hyperparameter that was set to 25 in this study. It should be noted that the applied loss function does not rely on a pixel-by-pixel comparison of the respective generator output and a reference training image, which enables training using unpaired data as performed in this study.

Regarding the design of the generator and discriminator networks, we followed the original implementation by Zhu et al (2017), with the only difference that the input was adapted to be a 16-bit grey-scale image. More specifically, for the generators, a nine-blocks residual network (Johnson et al 2016) was employed, consisting of downsampling from 256 $\times$ 256 to 32 $\times$ 32 with a series of three 2D convolutional layers (stride two, kernel size four) followed by linear rectifiers (scalar multiplier of 0.2) and upsampling from 32 $\times$ 32 to 256 $\times$ 256 with a series of three 2D deconvolutional layers (stride two, kernel size four) with linear rectifiers. The residual blocks were composed by a repeated series of padding (reflect mode) and 2D convolutional layers without up or downsampling. For the discriminator, again following the original implementation by Zhu et al, a 70 $\times$ 70 patchGAN was employed with a downsampling scheme from 256 $\times$ 256 to 32 $\times$ 32 by applying four series of leaky rectified linear units (with scalar multiplier of 0.2) followed by 2D convolutional layers. A scalar in the range [0;1] was obtained for each pixel in the final layer of the discriminator (having a receptive field of 70 $\times$ 70 in the input image) and was used for patch-wise classification of the input image as real or fake. Instance normalization (Ulyanov et al 2016) was applied after each 2D convolutional layer. The networks were implemented in Tensorflow (v1.3.0).
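
The 70 $\times$ 70 receptive field quoted above can be checked with a short calculation. The sketch assumes the layer layout commonly used for the 70 $\times$ 70 patchGAN (three stride-two and two stride-one convolutions, all with kernel size four); this layout is an assumption and is not spelled out in the text.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) tuples ordered from the
    input to the output of the network.
    """
    rf = 1
    # Walk backwards from the output: each layer widens the field.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed layer layout: three stride-2 and two stride-1 convolutions,
# all with kernel size 4.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # 70
```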

For training the network, the registered pCT image slices were resampled to 410 $\times$ 410 pixels to match the resolution of CBCTorg. The treatment table was removed from pCT and CBCTorg, and all pixels outside the body contour (retrieved from a combination of erosion, thresholding, region growing, dilation and filling operations) were set to −1000 HU. All images were resampled to 286 $\times$ 286 pixels, followed by cropping to 256 $\times$ 256 pixels from a randomly chosen starting pixel (between 0 and 29) in left–right and anterior–posterior direction as data augmentation during training. Additionally, images were randomly flipped in left–right direction. The HU range of the images was clipped to [−1000, 2071] and image intensity was linearly rescaled to 16-bit. Stochastic gradient descent was performed using an Adam optimizer (Kingma and Ba 2014) with momentum parameters $\beta_1 = 0.5$ and $\beta_2 = 0.999$ . The batch size was set to 1. At each optimization step, slices of different patients were randomly selected (unpaired training). All networks were trained from scratch with a learning rate of 0.0002. This learning rate was kept constant for the first 100 epochs, then the rate was linearly decreased to zero over the next 100 epochs, following the approach of Zhu et al (2017).
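
The intensity rescaling, the augmentation and the learning rate schedule described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the resampling to 286 $\times$ 286 pixels is omitted, and the exact epoch at which the linear decay starts is an assumption.

```python
import numpy as np

HU_MIN, HU_MAX = -1000, 2071

def hu_to_uint16(slice_hu):
    """Clip to [HU_MIN, HU_MAX] and rescale linearly to the 16-bit range."""
    clipped = np.clip(slice_hu, HU_MIN, HU_MAX)
    scaled = (clipped - HU_MIN) / (HU_MAX - HU_MIN) * 65535.0
    return np.round(scaled).astype(np.uint16)

def random_crop_and_flip(image, rng, crop=256):
    """Random crop of a square image plus a random left-right flip."""
    max_offset = image.shape[0] - crop   # 30 for a 286 x 286 input
    row0 = rng.integers(0, max_offset)   # upper bound exclusive: 0..29
    col0 = rng.integers(0, max_offset)
    patch = image[row0:row0 + crop, col0:col0 + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]           # flip along the column axis
    return patch

def learning_rate(epoch, base_lr=2e-4, n_const=100, n_decay=100):
    """Constant rate for n_const epochs, then linear decay to zero."""
    if epoch < n_const:
        return base_lr
    return base_lr * max(0.0, 1.0 - (epoch - n_const) / n_decay)

rng = np.random.default_rng(42)
slice_hu = rng.uniform(-1000, 2071, size=(286, 286))
patch = random_crop_and_flip(hu_to_uint16(slice_hu), rng)
print(patch.shape, patch.dtype)  # (256, 256) uint16
```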

Training was performed on a subset of 25 patients using four single folds, each containing 18 patients. Once training was finished, only the generator network GpCT was used for CBCT intensity correction by means of converting a given CBCTorg slice-by-slice into a pCT equivalent image. Since four different folds were used for training the cycleGAN, four different models for GpCT were obtained and applied to the eight remaining test patients, not seen by the network during training. Then the pixel-wise median of the four models' outputs was calculated for generating the final corrected CBCT image, referred to as CBCTcycleGAN in the following. All CBCTcycleGAN were upsampled to the initial CBCT grid of 410 $\times$ 410 pixels with 1 mm $\times$ 1 mm size. In SI direction, the voxel size was 3 mm, similar to the pCT.
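
The pixel-wise median over the four model outputs can be sketched as below. The function name `median_ensemble` is illustrative. With an even number of models, `np.median` averages the two central values; whether the study handled the even case in the same way is an assumption.

```python
import numpy as np

def median_ensemble(model_outputs):
    """Pixel-wise median over a list of equally shaped images."""
    return np.median(np.stack(model_outputs, axis=0), axis=0)

# Four toy "model outputs": constant images with one outlier model.
outputs = [np.full((2, 2), v) for v in (10.0, 20.0, 30.0, 100.0)]
combined = median_ensemble(outputs)
print(combined[0, 0])  # 25.0
```

In this toy case the outlying value of 100 does not affect the result, whereas a pixel-wise mean would be pulled up to 40.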

2.3. Reference CBCT correction

Since there were substantial anatomical differences between pCT and CBCTorg due to changes in bladder and rectum filling, as well as in patient positioning, the obtained CBCTcycleGAN was not directly compared to the pCT for inferring the accuracy of CBCTcycleGAN. Instead, a reference CBCT correction strategy which had been validated in several previous studies (Park et al 2015, Kurz et al 2016b) was utilized and served as reference for evaluating CBCTcycleGAN for the eight test cases. The reference correction strategy was described in detail in the original publications (Niu et al 2010, 2012), the follow-up studies in a proton specific setting (Park et al 2015, Kurz et al 2016b) and also in the latest publications by our group (Hansen et al 2018, Landry et al 2019). Hence, only the main ideas will be outlined here: the method relies on a vCT prior, obtained from pCT-to-CBCTorg DIR, which is then forward projected according to the geometry of the used CBCT scanner. These forward projections are free of scatter and other low frequency deviations, such as beam hardening. Thus, the contribution of these low frequency deviations in the actually measured CBCTorg projections can be estimated as the difference between the vCT forward projection and the scaled measured projections, followed by generous 2D smoothing to account for their low spatial frequency. The estimated low frequency deviations can then be subtracted from the measured projections and the retrieved corrected projections can be reconstructed using the same FDK algorithm and settings as used for reconstructing CBCTorg. The result, in the following referred to as CBCTcor, is a shading corrected CBCT with HU values equivalent to the pCT, but with the same anatomy as CBCTcycleGAN.
Similar to CBCTorg, CBCTcor was reconstructed on a 1.0 mm $\times$ 1.0 mm $\times$ 1.0 mm grid with 410 $\times$ 410 $\times$ 264 voxels and then re-binned to a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm grid to match the resolution of CBCTcycleGAN in superior–inferior direction.
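
The projection-domain correction outlined above can be illustrated for a single projection. The sketch makes several assumptions that are not taken from the text: a simple moving-average filter stands in for the "generous" smoothing, the deviation is defined as scaled measurement minus vCT forward projection, and the names `box_smooth` and `shading_correct` are hypothetical.

```python
import numpy as np

def box_smooth(image, size):
    """Separable moving-average filter with edge padding (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    kernel = np.ones(size) / size
    padded = np.pad(image, size // 2, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def shading_correct(measured, vct_projection, scale, smooth_size=25):
    """Subtract the smoothed low-frequency deviation from one projection."""
    scaled = scale * measured
    # Low-frequency deviation (scatter, beam hardening), assumed smooth.
    deviation = box_smooth(scaled - vct_projection, smooth_size)
    return scaled - deviation

# Toy projection: a "primary" signal plus a constant scatter offset.
rng = np.random.default_rng(0)
primary = rng.uniform(100.0, 200.0, size=(32, 32))
measured = primary + 5.0
corrected = shading_correct(measured, primary, scale=1.0)
```

Because only the smoothed difference is subtracted, high-frequency anatomical detail of the measured projection is retained even where the vCT prior is locally inaccurate.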

2.4. Treatment planning

The CBCTcor of all eight test patients were imported to a research version of a commercial treatment planning system (TPS, RayStation, version 4.99, RaySearch, Sweden). As mentioned before, contours were transferred from pCT to CBCTcor via DIR. The retrieved structures were not further adapted since the accuracy of the derived structures was not critical for our study. VMAT plans were generated for each patient on a 3.0 mm $\times$ 3.0 mm $\times$ 3.0 mm dose grid using one full arc and a collapsed-cone dose engine. The generic Elekta Synergy beam model implemented in the TPS was used. Moreover, PBS proton plans were optimized on the same dose grid, using the pencil beam dose engine and the implemented IBA_Dedicated beam model. Beams from the left and from the right of the patient (90° and 270° gantry angle) were applied to generate opposing single-field uniform dose (OSFUD) plans. Each field was optimized individually to exhibit a homogeneous target dose for improved robustness of the proton plans. VMAT and OSFUD plans aimed at a median CTV dose of 74 Gy in 37 treatment fractions. CTV V$_{95\%}$ was 100% for all plans, PTV V$_{95\%}$ was above 98% in all cases. If feasible, dose to OARs, in particular the bladder and the rectum, was below the recommendations of the QUANTEC report (Marks et al 2010).

Additional single-field uniform dose (SFUD) PBS proton plans were generated at 90° and 270° gantry angle on a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm dose grid to investigate proton range accuracy. The finer dose grid in left–right and anterior–posterior direction was chosen along with the resolution of CBCTcor to enable accurate range probing in transverse planes.

For all dose calculations, the same generic CT number to electron density (photons) or relative stopping power (protons) conversion tables were used for CBCTcor and CBCTcycleGAN.

2.5. Data evaluation

CT number accuracy of CBCTorg and CBCTcycleGAN was inferred for the eight test cases by comparison to CBCTcor in terms of the mean error (ME) and the mean absolute error (MAE) in HU. Only voxels within the joint body outline of CBCTcor and CBCTcycleGAN/CBCTorg were included. Moreover, the first and the last ten image slices in superior–inferior direction were excluded since they did not exhibit the full lateral FOV after reconstruction. The body outline of the patient, as detected on CBCTorg, CBCTcor and CBCTcycleGAN, was compared in the slice of the treatment plan iso-center as a function of the gantry angle to identify potential geometric inaccuracies. The gantry angle was sampled in 1° steps and for each angle the distance of the body outline to the iso-center was determined for each CBCT.
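
The ME and MAE evaluation can be sketched as follows. The sketch assumes that the "joint body outline" is the intersection of the two body masks and that volumes are stored in (slice, row, column) order; the function name `hu_errors` is illustrative.

```python
import numpy as np

def hu_errors(image, reference, body_image, body_reference, skip=10):
    """Return (ME, MAE) in HU inside the joint body outline.

    The first and last `skip` slices along axis 0 are excluded.
    Assumption: "joint" means the intersection of both body masks.
    """
    joint = body_image & body_reference
    if skip > 0:
        joint[:skip] = False
        joint[-skip:] = False
    diff = image[joint].astype(float) - reference[joint].astype(float)
    return diff.mean(), np.abs(diff).mean()

# Toy volumes: +10 HU in one half, -10 HU in the other half, and
# strongly deviating values in the excluded edge slices.
reference = np.zeros((30, 8, 8))
image = np.zeros((30, 8, 8))
image[:, :, :4] = 10.0
image[:, :, 4:] = -10.0
image[:10] = 500.0
body = np.ones((30, 8, 8), dtype=bool)
me, mae = hu_errors(image, reference, body, body)
print(me, mae)  # 0.0 10.0
```

The toy case shows why both metrics are reported: deviations of opposite sign cancel in the ME but not in the MAE.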

Dosimetric accuracy was inferred by recalculating the VMAT and OSFUD plans, optimized on CBCTcor, on CBCTcycleGAN. The dose distributions on both CBCTs were then compared by means of a 1% (only VMAT), 2% and 3% (only OSFUD) dose difference criterion. Only voxels with at least 50% of the prescribed dose were considered. Moreover, for the OSFUD plans, the pass-rates for (2%, 2 mm) and (3%, 3 mm) 3D global gamma-criteria were determined using the same dose cut-off. Additional evaluation with a gamma-criterion for the OSFUD plans was performed to allow for slight proton range differences between both CBCT data-sets. The VMAT and OSFUD dose distributions for CBCTcor and CBCTcycleGAN were also compared in terms of clinically relevant target and OAR DVH parameters. For this, CTV and PTV D$_{98\%}$ , D$_{2\%}$ and V$_{95\%}$ , as well as PTV D$_{50\%}$ were considered. For the rectum, V$_{50/60/65\,\mathrm{Gy}}$ and for the bladder, V$_{60/65\,\mathrm{Gy}}$ were analyzed.
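
A dose difference pass-rate with a 50% dose cut-off can be computed as in the following sketch. Two choices are assumptions: the dose difference is normalized to the prescribed dose, and the cut-off is applied to the reference dose distribution. The function name `dd_pass_rate` is illustrative.

```python
import numpy as np

def dd_pass_rate(dose_eval, dose_ref, prescription, tolerance, cutoff=0.5):
    """Percentage of considered voxels with a dose difference within tolerance.

    Assumptions: differences are relative to the prescribed dose and the
    cut-off is evaluated on the reference dose.
    """
    considered = dose_ref >= cutoff * prescription
    diff = np.abs(dose_eval[considered] - dose_ref[considered]) / prescription
    return 100.0 * np.mean(diff <= tolerance)

# Toy example with a 74 Gy prescription; the last voxel lies below the
# 50% cut-off (37 Gy) and is therefore ignored.
dose_ref = np.array([74.0, 74.0, 74.0, 74.0, 10.0])
dose_eval = np.array([74.5, 75.0, 76.0, 73.0, 20.0])
print(dd_pass_rate(dose_eval, dose_ref, 74.0, 0.02))  # 75.0
print(dd_pass_rate(dose_eval, dose_ref, 74.0, 0.01))  # 25.0
```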

Relative proton range differences between CBCTcor and CBCTcycleGAN were inferred in beam's eye view (BEV) from the SFUD plans by comparing the distance between the patient outline and the 80% iso-dose line on both CBCTs. The absolute position of the 80% iso-dose line (absolute proton range) was also compared.
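
For a single depth-dose profile in BEV, the range extraction can be sketched as below. The sketch assumes that the depth axis starts at the patient outline and that the 80% level is taken relative to the profile maximum on the distal side, with linear interpolation between samples; the function name `distal_range` is illustrative.

```python
import numpy as np

def distal_range(depth_mm, dose, level=0.8):
    """Depth at which the dose falls to level x maximum on the distal side."""
    depth_mm = np.asarray(depth_mm, dtype=float)
    dose = np.asarray(dose, dtype=float)
    threshold = level * dose.max()
    i_max = int(np.argmax(dose))
    below = np.nonzero(dose[i_max:] < threshold)[0]
    if below.size == 0:
        raise ValueError("profile never falls below the requested level")
    j = i_max + below[0]  # first distal sample below the threshold
    # Linear interpolation between samples j - 1 and j.
    frac = (dose[j - 1] - threshold) / (dose[j - 1] - dose[j])
    return depth_mm[j - 1] + frac * (depth_mm[j] - depth_mm[j - 1])

# Two toy profiles on a 1 mm grid; the second one reaches 1 mm deeper.
depth = np.arange(10.0)
profile_ref = [50, 60, 100, 100, 100, 100, 60, 10, 0, 0]
profile_eval = [50, 60, 100, 100, 100, 100, 100, 60, 10, 0]
range_ref = distal_range(depth, profile_ref)
range_eval = distal_range(depth, profile_eval)
print(range_ref, range_eval, range_eval - range_ref)  # 5.5 6.5 1.0
```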

3. Results

3.1. Network training time

The cycleGAN network was trained for 200 epochs on a Tesla P100 (NVIDIA, California, USA) graphics processing unit (GPU). Training took approximately 48 h in total. Once the model was trained, a full 3D CBCTorg with 88 slices could be converted into CBCTcycleGAN within about 10 s.

3.2. Image analysis

The corrected CBCT images obtained from the four trained network models are shown in figure 2 (panels (a)–(d)), together with the calculated median image (panel (e)) and the pixel-wise difference between maximum and minimum HU values (panel (f)) for exemplary patient 7. Deviations between the four different models were most pronounced at the edges of the bony anatomy, as well as at the patient body outline. In the following analysis, only the median image was considered.

Figure 2. Output of the four trained network models (panels (a)–(d)), median CBCTcycleGAN (panel (e)) and pixel-wise 'maximum minus minimum' (panel (f)) for exemplary test patient 7.

Figure 3 illustrates the pCT (panel (d)) as well as the different CBCT data-sets that were used in the scope of this study for patient 7. Panels (a)–(c) show the uncorrected CBCTorg, the reference corrected CBCTcor and the (median) CBCTcycleGAN. CBCTorg is affected by the typical cupping and scatter artifacts and features inaccurate, spatially varying HU values. This is corrected for in CBCTcor, which, however, exhibits an increased noise level due to the low mAs (20 mA $\times$ 20 ms) setting used for CBCT acquisition. In comparison, CBCTcycleGAN has a smoother appearance that is much closer to the diagnostic quality pCT, as expected. The difference images in panels (e) and (f) illustrate that CBCTcycleGAN has a better agreement in terms of HU values to the reference CBCTcor than CBCTorg. In particular, no spatially varying HU deviations and cupping artifacts can be observed. Remaining differences were mostly related to the different noise properties of CBCTcor and CBCTcycleGAN. Similar observations were made for the other patients.

Figure 3. pCT (d) and CBCT data for exemplary test patient 7: CBCTorg (a), CBCTcor (b) and median CBCTcycleGAN (c). Differences of CBCTorg and CBCTcycleGAN with respect to the reference CBCTcor are illustrated in panels (e) and (f), respectively.

The difference maps also indicate slight differences in the patient body outline between CBCTcor and CBCTcycleGAN. For the depicted patient, those can be found mainly close to the treatment couch on the left and on the right side of the patient, as well as at the patient belly, where also the largest differences between the four trained network models were found. The results of the systematic analysis of body outline differences as a function of the gantry angle in the iso-center plane are shown in figure 4. Body outline differences were sampled at 1° angular steps and plotted as boxplots for each test case in panel (a). Median deviations of up to 2 mm were found when comparing CBCTcycleGAN to CBCTcor, with CBCTcycleGAN having a larger patient contour in all cases. The 5th/95th percentiles indicate deviations of up to 4 mm for certain patients and gantry angles. In comparison, a high agreement of the body outline was found for CBCTcor and CBCTorg, with median deviations below 0.5 mm, thus supporting the accuracy of the CBCTcor body contour. In panel (b), the differences in body outline are plotted as a function of the gantry angle, averaged over all patients. Again, high agreement of CBCTorg and CBCTcor was found, while CBCTcycleGAN revealed systematic deviations of up to 4 mm depending on the gantry angle. Deviations were most pronounced at the belly (about 0° to 60° and 300° to 360° gantry angle) and close to the couch on the left and right edge of the patient (gantry angles about 110° and 250°).

Figure 4. Differences in the patient body outline for CBCTcycleGAN and CBCTorg in comparison to CBCTcor. Panel (a) shows the deviations for each test patient as a boxplot, where each data-point corresponds to a certain gantry angle, sampled in 1° steps. For each angle the difference in distance to the iso-center was recorded. The whiskers range from the 5th to the 95th percentile. Panel (b) depicts the mean deviation as a function of the gantry angle averaged over all patients. The shaded region corresponds to $\pm1\sigma$ . Panel (c) schematically illustrates the position of the gantry angles with respect to the body outline in the transverse iso-center plane.

The MAE determined for CBCTcycleGAN and CBCTorg in comparison to CBCTcor are illustrated in the top row of figure 5. Comparable MAE values were found for CBCTorg and CBCTcycleGAN. This might be related to the fact that CBCTcycleGAN is a smoother (pCT equivalent) image in comparison to CBCTorg and CBCTcor which not only show an enhanced, but probably also correlated noise pattern. For CBCTorg, an average (over all test patients) MAE of 103 HU was found, with values ranging from 93 HU to 124 HU. CBCTcycleGAN had a slightly lower average MAE of 87 HU, ranging from 79 HU to 106 HU. In terms of the ME, improved results were obtained for CBCTcycleGAN in comparison to CBCTorg. The average ME decreased from 24 HU to  −6 HU, and the maximum ME shrank from 45 HU to 16 HU.

Figure 5. MAE (top row) and ME (bottom row) of CBCTorg ((a) and (c)) and CBCTcycleGAN ((b) and (d)) in comparison to CBCTcor for all test patients.

3.3. Dosimetric analysis

Figure 6 shows the dose distribution optimized on CBCTcor and recalculated on CBCTcycleGAN for the VMAT, the OSFUD and the SFUD plan at 90° gantry angle for an exemplary patient (left and central column). Dose differences are illustrated in the right column. For VMAT, only minor (below 1%) dose differences between CBCTcycleGAN and CBCTcor were found, with the largest differences close to the patient surface due to slight deviations in the patient body outline. Due to the high sensitivity of the proton range on the CT numbers, deviations were more pronounced for the OSFUD and SFUD plans. Deviations were mostly found at the edge of the high dose region, next to the PTV, where the steepest dose gradients appeared. Still, for the depicted patient and the illustrated 90° SFUD plan, a median relative proton range difference of only 0.1 mm (smaller range for CBCTcycleGAN) was determined.


Figure 6. Dose distributions of exemplary test patient 4 for the VMAT ((a)–(c)), OSFUD ((d)–(f)) and SFUD (90° gantry angle, (g)–(i)) treatment plans. Dose optimized on CBCTcor ((a), (d) and (g)) and recalculated on CBCTcycleGAN ((b), (e) and (h)) is shown together with the differences ((c), (f) and (i)). For improved visibility, deviations below 0.4% are not displayed in the difference plots.


The quantitative results of the dosimetric analysis of the VMAT and OSFUD plans are presented in table 1 for all test cases and all investigated dose difference (DD) and gamma criteria. For VMAT, the 2% DD pass-rate comparing CBCTcycleGAN to CBCTcor was 99% or higher for all test patients, indicating a high agreement of CBCTcycleGAN with the reference. The 1% DD pass-rate was 73% or higher for all patients. The lower pass-rates for some patients could be attributed mainly to deviations in the body outline, leading to slightly lower (by less than 1%) median dose values in the target region. As expected, pass-rates were overall lower for the OSFUD plans. For the 2% DD criterion, an average pass-rate of 80% was determined. The average 3% DD pass-rate was 86%. When a gamma criterion was used, which is slightly less sensitive to range shifts, average pass-rates of 96% (2%, 2 mm) and 100% (3%, 3 mm) were obtained.

Table 1. Dose difference (DD) and gamma pass-rates of all test patients for the VMAT and OSFUD plans. All values in percent.

                 VMAT            OSFUD
Patient number   1% DD   2% DD   2% DD   3% DD   (2%, 2 mm)   (3%, 3 mm)
1                   79     100      83      88           95           99
2                   99     100      84      90           99          100
3                  100     100      71      83           95          100
4                   73     100      74      80           93          100
5                   99     100      81      87           95           99
6                   88     100      77      83           95          100
7                   88     100      84      89           96           99
8                   84      99      86      91           99          100
Average             89     100      80      86           96          100
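As a minimal sketch of how a dose difference pass-rate such as those in table 1 can be computed, the following example counts the voxels whose dose deviation stays within a given tolerance. Two choices here are assumptions for illustration and are not taken from this section: the difference is normalized to the prescription dose, and only voxels receiving at least a fraction `threshold_frac` of the prescription in the reference dose are evaluated. The function name `dd_pass_rate` is hypothetical.

```python
import numpy as np

def dd_pass_rate(dose_eval, dose_ref, tolerance_pct, prescription, threshold_frac=0.5):
    """Percentage of evaluated voxels with |dose difference| <= tolerance.

    Assumptions (not from the source): differences are expressed in percent
    of the prescription dose, and only voxels with a reference dose of at
    least threshold_frac * prescription are evaluated.
    """
    roi = dose_ref >= threshold_frac * prescription
    if not np.any(roi):
        raise ValueError("no voxel exceeds the dose threshold")
    rel_diff = 100.0 * np.abs(dose_eval[roi] - dose_ref[roi]) / prescription
    return 100.0 * np.count_nonzero(rel_diff <= tolerance_pct) / rel_diff.size

# Toy dose arrays in percent of the prescription; the first voxel is below
# the threshold and therefore ignored.
dose_ref = np.array([20.0, 60.0, 80.0, 100.0, 100.0])
dose_eval = np.array([25.0, 60.5, 81.5, 97.0, 100.0])

print(dd_pass_rate(dose_eval, dose_ref, 1.0, 100.0))  # 1% DD criterion
print(dd_pass_rate(dose_eval, dose_ref, 2.0, 100.0))  # 2% DD criterion
```

A gamma criterion additionally searches a spatial neighborhood (e.g. 2 mm) for a matching dose value, which is why it is less sensitive to the small range shifts seen in the proton plans.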

The impact of the dosimetric differences on clinically relevant DVH parameters is illustrated in figure 7 for VMAT and in figure 8 for OSFUD plans. For VMAT, differences in target and OAR DVH parameters were mostly below 1 Gy/1%. Only for PTV V$_{95\%}$ were slightly larger deviations observed. A general trend of smaller values for the investigated DVH parameters on CBCTcycleGAN was observed. This can be attributed to the observed deviations in the body outline, which was generally larger for CBCTcycleGAN, resulting in a slightly lower dose level in the central region, where CTV, PTV and rectum are located.


Figure 7. Differences between CBCTcor and CBCTcycleGAN in clinically relevant target (a) and OAR (b) DVH parameters for the VMAT plans. Each test patient is represented by a single data point. Whiskers correspond to the 5th–95th percentile. All dose values correspond to the total dose of the fractionated treatment.


Figure 8. Differences between CBCTcor and CBCTcycleGAN in clinically relevant target (a) and OAR (b) DVH parameters for the OSFUD plans. Each test patient is represented by a single data point. Whiskers correspond to the 5th–95th percentile. All dose values correspond to the total dose of the fractionated treatment.


For the OSFUD plans, overall smaller DVH parameter differences between CBCTcor and CBCTcycleGAN were determined. This is likely because the dosimetric deviations were related to differences in the proton range and appeared mainly left/right of the PTV, in the region of the steep dose gradients in BEV. Thus, DVH parameters related to the coverage of the PTV (in particular D$_{98\%}$) show the most pronounced differences, while the DVH parameters for the CTV or for the rectum and the bladder, both located lateral to the beam, are only slightly affected. Differences were below 1 Gy/1% for all investigated parameters and patients.

3.4. Proton range analysis

Results of the proton range analysis on the basis of the SFUD plans at 90° and 270° gantry angle are shown in figure 9. Figures 9(a) and (b) show the obtained differences in relative range between CBCTcor and CBCTcycleGAN for all analyzed BEV profiles per test patient as boxplots. The median range differences were below 3 mm for all patients. For most cases the 5th and 95th percentile of the range difference distribution, as indicated by the whiskers, were also within 3 mm. Still, for one case these percentiles extended to more than 5 mm. Figure 9(c) summarizes the results for all BEV profiles of all patients and for both gantry angles. The median range difference of about 19 000 analyzed profiles was only 0.2 mm, with a slightly smaller proton range for CBCTcycleGAN. The 5th and 95th percentile of this cumulative range difference distribution were within $\pm3$ mm (dashed lines).


Figure 9. Differences between CBCTcor and CBCTcycleGAN in terms of the relative proton range for 90° (a) and 270° (b) gantry angles and all test patients. Panel (c) illustrates the cumulative distribution of range differences for both angles and all analyzed BEV profiles as violin plot. The whiskers range from the 5th–95th percentile.


When analyzing the absolute position of the 80% distal dose fall-off in BEV (absolute proton range), a median range difference of about 1.1 mm was found, with a shorter absolute range for CBCTcycleGAN. The shift of the absolute range with respect to the relative range by about 1 mm can be attributed to similar differences in the body outline of CBCTcor and CBCTcycleGAN at gantry angles of 90° (outline differences about 0.5 mm) and 270° (outline differences about 1.5 mm, see figure 4(a)).
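The absolute range defined above as the position of the 80% distal dose fall-off can be extracted from a single BEV depth-dose profile as sketched below. This is an illustrative sketch, not the implementation used in this work: the function name `distal_range`, the normalization to the profile maximum and the linear interpolation between samples are assumptions.

```python
import numpy as np

def distal_range(depth_mm, dose, level=0.8):
    """Depth at which the dose falls to level * max on the distal side.

    Assumptions (not from the source): the level is taken relative to the
    profile maximum and the crossing is found by linear interpolation.
    """
    depth_mm = np.asarray(depth_mm, dtype=float)
    dose = np.asarray(dose, dtype=float)
    target = level * dose.max()
    i = np.nonzero(dose >= target)[0][-1]  # last sample at or above the level
    if i == dose.size - 1:
        raise ValueError("profile does not fall below the requested level")
    # dose[i] >= target > dose[i + 1], so the denominator is positive
    frac = (dose[i] - target) / (dose[i] - dose[i + 1])
    return depth_mm[i] + frac * (depth_mm[i + 1] - depth_mm[i])

# Toy profiles on a 1 mm grid; the second falls off 1 mm earlier.
depth = np.arange(6.0)
dose_ref = np.array([0.5, 0.6, 1.0, 1.0, 0.6, 0.0])
dose_new = np.array([0.5, 0.6, 1.0, 0.6, 0.0, 0.0])

range_diff = distal_range(depth, dose_ref) - distal_range(depth, dose_new)
print(f"absolute range difference: {range_diff:.2f} mm")
```

Under the same assumptions, a relative range could be obtained by subtracting the depth of the body surface along each profile, which removes the contribution of body outline differences discussed above.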

4. Discussion

In this contribution we have successfully applied a cycleGAN for prostate CBCT correction using unpaired training data. The network was trained to translate a given uncorrected CBCTorg into a pCT equivalent image (CBCTcycleGAN). No previous matching of the data used for training, i.e. CBCTorg and pCT, by means of DIR was performed. To assess the accuracy of CBCTcycleGAN, it was compared to a previously validated CBCT correction method (CBCTcor). Using CBCTcor as the reference, a substantial improvement in HU accuracy was found for CBCTcycleGAN relative to CBCTorg.

In terms of dose calculation accuracy, good results were achieved for VMAT when comparing CBCTcycleGAN and CBCTcor: for a 2% dose difference criterion, pass-rates of 99% or higher were determined for all test patients. In Liang et al (2019), a similar cycleGAN architecture was utilized for head and neck CBCT correction in the scope of photon radiotherapy and comparable pass-rates were found for a (2%, 2 mm) gamma criterion. Due to the high sensitivity of the proton range to CT numbers, overall lower pass-rates were obtained for proton OSFUD plans. However, when applying the widely used (3%, 3 mm) gamma criterion, which is less sensitive to small range shifts, an average pass-rate close to 100% was found. Proton range analysis also showed that more than 93% of all analyzed BEV dose profiles had a range agreement better than 3 mm. Compared to the total proton range of about 200 mm for prostate patients, this corresponds to a relative range inaccuracy of only 1.5%. In line with this, for most cases a good agreement of CBCTcor and CBCTcycleGAN in terms of clinically relevant DVH parameters was achieved. For OARs, differences were below 1% for all test cases, and for the target structures deviations were also below 1 Gy/1%. For VMAT, a trend of slightly underestimated dose on CBCTcycleGAN was found for centrally located structures, i.e. the PTV, CTV and the rectum. Still, deviations were below 1 Gy/1% in most cases.

The overall lower DVH parameter values for VMAT are likely attributable, besides other imperfections in terms of HU values, to the observed deviations in the patient body outline, which was consistently larger for CBCTcycleGAN. Comparing the output of the four different trained cycleGAN models, it was also found that the region close to the body outline shows the most pronounced variability between the individual models. We believe these inaccuracies are related to the approach of performing unpaired training with input (CBCTorg) and target data (pCT) that feature considerable anatomical differences. Although the introduced cycle-consistency loss potentially improves consistency in the mapping between pCT and CBCT space, it does not directly enforce geometric consistency of input and output of the generators. In Hiasa et al (2018) it was recently reported that, in the context of MR-to-CT synthesis for the pelvis, improved geometric consistency of the input and output of the cycleGAN might be achieved by an increased number of training data-sets or the introduction of a gradient-consistency loss. In other works, an additional structure- or shape-consistency loss was suggested to improve data consistency for cycleGAN networks (Yang et al 2018, Cai et al 2019). Future studies may investigate whether these techniques can also lead to improved results for CBCT-to-CT image translation. In this context, the geometric fidelity of the images generated by the cycleGAN might be further analyzed, e.g. in terms of the accuracy of important structures, such as the bladder or the prostate. This was deemed beyond the scope of this first proof-of-principle study. However, visual inspection of CBCTcycleGAN did not show remarkable anatomical deviations with respect to CBCTorg beyond the previously described discrepancies in the body outline.
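To make the argument about geometric consistency concrete, the cycle-consistency term can be written in the standard form of the original cycleGAN formulation. The generator symbols $G_{\mathrm{CT}}$ (CBCT to CT) and $G_{\mathrm{CBCT}}$ (CT to CBCT), as well as the choice of the L1 norm, are notation assumed here for illustration; the exact loss and weighting used in this work may differ:

$$\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{x}\left[\lVert G_{\mathrm{CBCT}}(G_{\mathrm{CT}}(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G_{\mathrm{CT}}(G_{\mathrm{CBCT}}(y)) - y \rVert_1\right]$$

where $x$ denotes a CBCTorg image and $y$ a pCT image. In this form, $x$ is only compared with its round-trip reconstruction and never directly with the translated image $G_{\mathrm{CT}}(x)$. A geometric change introduced by the first generator, e.g. a slightly enlarged body outline, is therefore not penalized as long as the second generator reverses it.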

It should be noted that in this work, CBCTcor was employed as reference for validating CBCTcycleGAN. While the accuracy of CBCTcor was shown in several previous studies (Park et al 2015, Kurz et al 2016b), it was also pointed out that CBCTcor can still be slightly affected by remaining geometric inaccuracies in the vCT prior, originating from inaccuracies in the underlying pCT-to-CBCT DIR. Moreover, for a few patients we have observed moderate reconstruction artifacts close to the patient skin for the low mAs CBCT protocol used, where increased HU values were found. These artifacts were, however, confined to an extent of only a few voxels, such that the impact, e.g. on the proton range, is estimated to be less than 0.5 mm. Generally, due to the pronounced inter-scan differences between pCT and CBCT, it is deemed difficult to obtain a more accurate reference image of diagnostic quality with the same anatomy as exhibited by the CBCT. A potential alternative to CBCTcor might be the vCT, which, on the other hand, can similarly be affected by DIR inaccuracies in the pelvic region (Kurz et al 2016b).

Due to potential DIR uncertainties, the feasibility of utilizing unpaired data for training is generally considered an advantage of CBCTcycleGAN. It makes the methodology particularly interesting for situations where it is difficult to achieve accurate matching of the input and target data used for training by means of DIR. For previously investigated U-Net based deep learning approaches for CBCT correction (Hansen et al 2018, Kida et al 2018, Landry et al 2019), generation of the training data explicitly utilized DIR. In Hansen et al (2018), it was used for generating CBCTcor projections which were used for training. In Kida et al (2018) and Landry et al (2019), DIR was used to generate a vCT which was then used for U-Net training. Although it was reported that DIR inaccuracies only have a minor impact on the final model and the obtained corrected CBCT images, each DIR inaccuracy might lead to inconsistencies in the training data, and thus negatively impact training of the network and the retrieved final model, since the loss function is based on a voxel-by-voxel comparison of input and target data. Nevertheless, it should be noted that in Landry et al (2019), based on a similar patient cohort, slightly improved results were reported in terms of dose difference (1% dose difference pass-rate above 98% for VMAT) when training a U-Net to translate CBCTorg to CBCTcor. This might, however, be partially attributed to the fact that the network was directly trained to generate a CBCTcor equivalent image, and that CBCTcor itself was used as reference for evaluation. A more detailed comparison of the performance of U-Net and cycleGAN based CBCT correction might be the subject of future studies.

Besides independence from prior information derived from potentially inaccurate DIR, a major advantage of CBCTcycleGAN with respect to the reference method (CBCTcor) is the enhancement in speed. While, in the current implementation, generation of CBCTcor takes about 6–10 min, generation of a full 3D CBCTcycleGAN image takes only on the order of 10 s. This reduction in correction time is particularly important when considering the intended application of CBCTcycleGAN in the scope of adaptive radiotherapy. Here, it is of utmost importance to perform plan optimization as fast as possible on the basis of the acquired imaging data with the patient in treatment position. Besides improved patient comfort, fast adaptation is particularly important in scenarios where anatomical changes can happen on a time scale of minutes, e.g. in the prostate region.

In this study, CBCT correction was only investigated utilizing prostate cancer patients. However, it is likely that the same trained model can be applied to further diseases in the pelvic region, as has been shown in a study for MR-to-CT translation, which was based on a comparable GAN design (Maspero et al 2018). One potential issue that might arise in this region of the body is truncation of the patient due to the limited lateral CBCT FOV, which hinders the correct restoration of the patient outline. In this study, an enlarged lateral FOV obtained by scanning with a shifted detector panel has been used, but a few patients still had to be excluded due to severe lateral truncation. Nevertheless, a translation to further treatment sites beyond the pelvis is deemed straightforward since it can be realized by interchanging the training data given to the network. Moreover, from an imaging physics perspective, the pelvic region is one of the most challenging regions in the body due to the comparably large patient extent. This typically leads to a higher amount of detected scattered photons, e.g. in comparison to the head and neck region, and typically to more severe artifacts in the original CBCT image. It might also be beneficial for training the network that for other treatment sites, such as head and neck, there are less pronounced inter-scan anatomical variations between input (CBCTorg) and target (pCT) data. Although unpaired training was utilized in this study, improved consistency in the input data might lead to a more accurate representation of the body outline.

5. Conclusion

A cycle-consistent generative adversarial network was successfully trained to perform CBCT-to-CT image translation for CBCT intensity correction using unpaired training data. The obtained CBCTcycleGAN resembles a diagnostic quality planning CT, but features the anatomy of the daily CBCTorg. In terms of dose calculation accuracy, good results were obtained for VMAT plans. Although clinically relevant DVH parameters were accurately predicted for proton OSFUD plans using CBCTcycleGAN, dosimetric analysis showed reduced accuracy with respect to VMAT, as expected. Further improvements in dose calculation accuracy might be achieved by a larger training cohort, or the introduction of a gradient consistency loss to improve accuracy of the patient outline. With respect to the reference technique (CBCTcor), CBCTcycleGAN enables considerably faster image correction and, due to the use of unpaired training data, is not dependent on accurate DIR. Thus, CBCTcycleGAN is considered a promising approach to realize fast image correction in a CBCT based online adaptive radiotherapy treatment workflow.

Acknowledgments

Christopher Kurz received funding from the German Cancer Aid. The work was supported by the German Research Foundation (DFG) Cluster of Excellence Munich Center for Advanced Photonics (MAP) and by the ZonMw IMDI Programme (project number 1040030).

We thank Nadine Spahr, Christoph Brachmann and Florian Weiler for the pCT-to-CBCT DIR used in our study. We thank Erik Traneus and his colleagues from RaySearch Laboratories for support on the research version of the TPS used. We acknowledge the help of Jan Hofmaier in implementing the reference CBCT shading correction method. Moreover, the support of this project by Bas Raaymakers is thankfully acknowledged.

There are no conflicts of interest.
