CBCT correction using a cycle-consistent generative adversarial network and unpaired training to enable photon and proton dose calculation

Christopher Kurz; Matteo Maspero; Mark H F Savenije; Guillaume Landry; Florian Kamp; Marco Pinto; Minglun Li; Katia Parodi; Claus Belka; Cornelis A T van den Berg

doi:10.1088/1361-6560/ab4d8c

1. Introduction

Modern external beam radiotherapy techniques like intensity-modulated photon radiotherapy (IMRT), volumetric-modulated arc therapy (VMAT) or intensity-modulated proton therapy (IMPT) using pencil-beam scanning (PBS) promise highly accurate dose delivery to the target volume while concurrently sparing adjacent organs-at-risk (OAR). However, in many cases the potential of these techniques cannot be fully exploited due to inter-fractional anatomical changes occurring between the acquisition of the planning imaging data and the actual treatment day. Currently, such anatomical changes are managed by introducing safety margins around the actual target volume (van Herk 2004). On the one hand, safety margins ensure homogeneous delivery of the prescribed dose to the tumor in the presence of inter-fractional motion; on the other hand, they lead to a substantial increase in the dose burden to neighboring OARs, which eventually limits the dose that can be delivered to the tumor without the risk of severe side-effects.

Various approaches have been described in the literature for CBCT intensity correction, aiming to make the pre-treatment imaging data, which are nowadays used solely for accurate patient alignment, suitable for accurate dose calculation and treatment adaptation. This could eventually enable tailoring the applied dose to the daily anatomy and shrinking the applied safety margins. CBCT correction techniques range from simple look-up-table or histogram-matching based solutions (Kurz et al 2015, Kidar and Azizi 2018), through planning CT (pCT)-to-CBCT deformable image registration (DIR), yielding a so-called virtual CT (vCT) (Peroni et al 2012, Landry et al 2014, 2015, Veiga et al 2014, 2015, 2016, Wang et al 2016), to Monte-Carlo (MC) based methods for estimating and correcting the detected scatter contribution (Mainegra-Hing and Kawrakow 2010, Thing et al 2016, Zöllner et al 2017). Correction methods have been proposed and investigated in the context of photon (Ding et al 2007, Fotina et al 2012, Niu et al 2012, Veiga et al 2014) as well as proton radiotherapy (Landry et al 2015, Veiga et al 2015, 2016, Kurz et al 2016a) and for various treatment sites, from head and neck to the lung or the prostate. While most publications report promising results allowing for reasonably accurate CBCT based dose calculation, they also discuss various shortcomings. For example, while DIR based methods yielded promising results for the head and neck region for photons and protons (Kurz et al 2015, Landry et al 2015), accuracy was limited in the pelvis due to more pronounced anatomical changes from fraction to fraction. In such scenarios, improved results were obtained when using the vCT only as a prior to estimate low frequency deviations in the detected signal, such as scatter and beam hardening, and to perform a projection based CBCT correction (Niu et al 2010, 2012, Park et al 2015, Kurz et al 2016b).
While the latter method demonstrated reduced sensitivity to DIR inaccuracies, generating a corrected CBCT image typically takes on the order of several minutes. In an adaptive workflow, this would not be acceptable since, e.g. for the prostate, anatomical changes can occur on time scales of only a few minutes (Langen et al 2008, McPartlin et al 2016). For the same reason, MC based scatter correction techniques, which take on the order of hours, are not suitable for clinical application in adaptive radiotherapy despite their high accuracy.

A considerable increase in speed for CBCT intensity correction has recently been achieved by the application of deep convolutional neural networks (CNN) (LeCun et al 1998). The network structure most commonly used for CBCT correction in the literature is a U-shaped CNN (U-Net), which features convolutional and deconvolutional layers in an encoding and a decoding branch, as initially suggested by Ronneberger et al (2015). U-Nets have been applied to a variety of image-to-image translation tasks, including CBCT correction in various scenarios: in Kida et al (2018), a U-Net was used for translating the CBCT into a pCT equivalent image by training with CBCT and vCT imaging data as input and target, respectively. Maier et al (2018, 2019) trained a U-Net for fast prediction of the detected scatter by utilizing MC simulated scatter distributions for training. The network prediction of the scatter can then be used for projection based CBCT correction in a second step. Projection based CBCT correction was also performed by Hansen et al (2018), who trained a U-Net to translate measured projections into corrected projections retrieved with a previously validated algorithm based on a vCT prior. In a follow-up study, the same U-Net was successfully trained not only for projection based correction but also for translating the reconstructed uncorrected CBCT image into a vCT equivalent image or into a reference corrected CBCT (Landry et al 2019). In all scenarios, the latter study found high dose calculation accuracy for photon and proton treatment plans. Besides U-Nets, generative adversarial networks (GAN) (Goodfellow et al 2014) have shown promising results for image-to-image translation tasks. Conceptually, GANs consist of a generator and a discriminator network, which are trained jointly with an adversarial loss function.
While the generator (typically a U-shaped CNN) aims at generating images realistic enough to fool the discriminator, the discriminator (typically a classification CNN) aims at distinguishing the generated fake images from real (training) images. In the context of medical imaging, GANs have, among others, been applied to generate synthetic CT images from MRI data (Wolterink et al 2017, Maspero et al 2018, Lei et al 2019) and, more recently, to translate CBCTs into pCT-like images of the brain and the pelvis using paired training (Harms et al 2019), as well as of the head and neck region using unpaired training (Liang et al 2019). The latter work also included dosimetric analysis of the GAN based corrected CBCT images, reporting high dose calculation accuracy for photon therapy but not investigating accuracy in the scope of proton therapy.

In this work, we investigated for the first time the feasibility of utilizing a so-called cycle-consistent GAN (cycleGAN) (Zhu et al 2017) for prostate CBCT correction in photon and proton radiotherapy. The network was trained to translate an uncorrected original CBCT (CBCT_org) into a pCT equivalent image, referred to as CBCT_cycleGAN. The dedicated design of the cycleGAN, in combination with the applied loss function, enabled the use of unpaired training data, in this case the daily CBCT and the initial pCT. While previous approaches utilizing U-Nets rely on a precise geometric relation between input and target data, which are compared voxel-by-voxel in the loss functions, this is not the case for the cycleGAN. Thus, it is feasible to use data that feature substantial inter-scan differences, or were even obtained from different patient cohorts, without prior DIR to match the data. The possibility of using unpaired training data is deemed particularly beneficial in situations where the data are considerably affected by inter-fractional anatomical changes, e.g. in the prostate region, since any anatomical mismatch between input and target data can affect the accuracy of the trained network model. The accuracy of CBCT_cycleGAN was quantified in terms of the Hounsfield unit (HU) mean (absolute) error in comparison to a previously validated CBCT correction strategy (CBCT_cor). Dose calculation accuracy for photon and proton radiotherapy was evaluated by means of various dose difference metrics and clinically relevant dose-volume-histogram (DVH) parameters, comparing CBCT_cycleGAN to CBCT_cor. In addition, the proton ranges on both CBCT data-sets were compared.

2. Materials and methods

2.1. Patient data

pCT and CBCT imaging data of 33 prostate cancer patients originally treated with VMAT to a total dose of 70 Gy–76 Gy in 2 Gy fractions at the Department of Radiation Oncology of the University Hospital of the LMU Munich were included in this study. All pCTs were acquired with a Toshiba Aquilion LB CT scanner (Canon Medical Systems, Japan) at a tube voltage of 120 kV. A voxel grid of 1.1 mm $\times$ 1.1 mm $\times$ 3.0 mm was used in combination with a 55 cm lateral field of view (FOV). For each patient, a CBCT image was acquired in treatment position using the XVI system (version 5.0.2) of a Synergy medical linear accelerator (Elekta, Sweden). All CBCTs were acquired using a scan protocol with 120 kV tube voltage and an exposure time of 20 ms at an x-ray tube current of 20 mA per projection. These parameters were chosen to avoid saturation of the detector panel and to enable accurate determination of the patient outline. A bow-tie filter was used in combination with a shifted detector panel (M position) to enlarge the lateral FOV. Patient data suffering from severe lateral truncation despite the enlarged FOV were not considered in this study. Between 346 and 357 projections were acquired in a 360° scan. The uncorrected CBCT (CBCT_org) was reconstructed using the FDK (Feldkamp–Davis–Kress) implementation of RTK (Reconstruction ToolKit, Rit et al (2014)) on a 1.0 mm $\times$ 1.0 mm $\times$ 1.0 mm grid with 410 $\times$ 410 $\times$ 264 voxels. For training, all CBCT_org were re-binned to a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm grid to match the resolution of the pCT in superior–inferior direction. pCT and CBCT_org of each patient were aligned using a rigid translation.
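The re-binning of CBCT_org from 1 mm to 3 mm slices along the superior–inferior axis can be illustrated with the minimal NumPy sketch below. It assumes that re-binning means averaging groups of three adjacent slices, that axis 0 of the array is the superior–inferior axis, and that an incomplete trailing group is dropped; the function name `rebin_si` is hypothetical.

```python
import numpy as np

def rebin_si(volume: np.ndarray, factor: int = 3) -> np.ndarray:
    """Average groups of `factor` adjacent slices along axis 0 (assumed superior-inferior)."""
    n_full = (volume.shape[0] // factor) * factor  # drop an incomplete trailing group
    grouped = volume[:n_full].reshape(n_full // factor, factor, *volume.shape[1:])
    return grouped.mean(axis=1)                    # one averaged slice per group

cbct_1mm = np.zeros((264, 16, 16), dtype=np.float32)  # reduced in-plane size for the example
print(rebin_si(cbct_1mm).shape)                        # (88, 16, 16)
```

With 264 reconstructed slices this yields 88 slices of 3 mm; the reduced in-plane size only keeps the example light.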

OARs and target structures were delineated by a trained physician on the pCTs. Positioning errors and inter-fractional anatomical changes in the prostate region were accounted for by applying a 7 mm clinical target volume (CTV) to planning target volume (PTV) margin. All contours were transferred to the CBCT via DIR for data analysis.

2.2. cycleGAN network design and training

For CBCT intensity correction, a cycleGAN was investigated in this work. The network architecture was initially proposed and implemented by Zhu et al (2017). The cycleGAN was trained to translate CBCT_org (input) into a pCT equivalent image (output) which, in contrast to CBCT_org, should exhibit accurate quantitative HU values. Training was performed in 2D, i.e. slice-by-slice in the transverse plane. The architecture of the network is illustrated in figure 1. It consists of two GANs that compose a forward and a backward cycle. In the forward cycle, the generator G_pCT starts from a given slice of CBCT_org and tries to generate an image (G_pCT(CBCT_org)) which looks equivalent to a pCT. The discriminator network D_pCT, on the other hand, tries to distinguish the generated synthetic pCT image (G_pCT(CBCT_org)) from the real pCT images. D_pCT outputs the label 0 for synthetic pCT data and 1 for real pCT data. The generator and the discriminator network are trained using an adversarial loss function:

$$\mathcal{L}_{\rm pCT} = \mathbb{E}_{\rm pCT}\left[\left(1-\mathrm{D}_{\rm pCT}(\mathrm{pCT})\right)^2\right] + \mathbb{E}_{{\rm CBCT}_{\rm org}}\left[\mathrm{D}_{\rm pCT}\left(\mathrm{G}_{\rm pCT}(\mathrm{CBCT}_{\rm org})\right)^2\right]. \tag{1}$$

**Figure 1.** Illustration of the cycleGAN network design. The forward cycle (top) consists of a generator (G_pCT) and a discriminator (D_pCT) network, which are trained using a joint adversarial loss function ($\mathcal{L}_{{\rm pCT}}$). In the backward cycle (bottom), pCT and CBCT_org are swapped and G_CBCT and D_CBCT are jointly trained using $\mathcal{L}_{{\rm CBCT}}$. Cycle consistency is ensured by the L1 loss functions $\mathcal{L}^{\mathrm{fw}}_{{\rm cyc}}$ and $\mathcal{L}^{\mathrm{bw}}_{{\rm cyc}}$ during training. Eventually, the trained G_pCT network was used to translate CBCT_org into a pCT equivalent image (CBCT_cycleGAN).

During training, G_pCT aims at maximizing $\mathcal{L}_{{\rm pCT}}$, in which the expectation values are taken over all pCT slices ($\mathbb{E}_{{\rm pCT}}$) and all CBCT_org slices ($\mathbb{E}_{{\rm CBCT}_{{\rm org}}}$), by generating synthetic pCT images from CBCT_org that are as realistic as possible. D_pCT, in turn, tries to minimize $\mathcal{L}_{{\rm pCT}}$ by becoming as good as possible at distinguishing real from synthetic pCT image slices. In the backward cycle, pCT and CBCT_org are swapped and the adversarial loss function is:

$$\mathcal{L}_{\rm CBCT} = \mathbb{E}_{{\rm CBCT}_{\rm org}}\left[\left(1-\mathrm{D}_{\rm CBCT}(\mathrm{CBCT}_{\rm org})\right)^2\right] + \mathbb{E}_{\rm pCT}\left[\mathrm{D}_{\rm CBCT}\left(\mathrm{G}_{\rm CBCT}(\mathrm{pCT})\right)^2\right]. \tag{2}$$

Similar to the forward cycle, G_CBCT tries to maximize $\mathcal{L}_{{\rm CBCT}}$ , while D_CBCT tries to minimize it.
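The least-squares adversarial losses of equations (1) and (2) can be evaluated as sketched below, assuming that the discriminator outputs a map of patch scores and that the expectations are approximated by averages over all scores. The function name `lsgan_loss` is hypothetical, and this is not the authors' TensorFlow code.

```python
import numpy as np

def lsgan_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Least-squares adversarial loss, e.g. L_pCT of equation (1).

    d_real: discriminator scores for real images of the target domain, e.g. D_pCT(pCT).
    d_fake: discriminator scores for generated images, e.g. D_pCT(G_pCT(CBCT_org)).
    Expectations are approximated by means over all patch scores (assumption).
    """
    real_term = np.mean((1.0 - d_real) ** 2)  # small when real images score close to 1
    fake_term = np.mean(d_fake ** 2)          # small when generated images score close to 0
    return float(real_term + fake_term)

ones, zeros, halves = np.ones((32, 32)), np.zeros((32, 32)), np.full((32, 32), 0.5)
print(lsgan_loss(ones, zeros))     # perfect discriminator: 0.0
print(lsgan_loss(ones, ones))      # discriminator fooled on generated images: 1.0
print(lsgan_loss(halves, halves))  # undecided discriminator: 0.5
```

The discriminator lowers the loss by pushing real scores toward 1 and generated scores toward 0, whereas the generator raises it by pushing generated scores toward 1. Many implementations instead let the generator minimize the mean of (1 − D(G(x)))², which pursues the same goal.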

Besides the adversarial loss functions, a cycle consistency loss, as suggested in Zhu et al (2017), is used during training to enforce consistency of the two mappings from CBCT to pCT space and vice versa ($\mathrm{G}_{{\rm pCT}}$ and $\mathrm{G}_{{\rm CBCT}}$, respectively). For this purpose, a loss term $\mathcal{L}_{{\rm cyc}}$ is introduced. In the forward cycle, it uses an L1 norm to compare the initial CBCT_org with the output of G_CBCT obtained when the synthetic pCT generated from CBCT_org is given as input:

$$\mathcal{L}^{\rm fw}_{\rm cyc} = \mathbb{E}_{{\rm CBCT}_{\rm org}}\left[\left\| \mathrm{CBCT}_{\rm org} - \mathrm{G}_{\rm CBCT}\left(\mathrm{G}_{\rm pCT}(\mathrm{CBCT}_{\rm org})\right) \right\|_1\right]. \tag{3}$$

In the backward cycle, the roles of CBCT_org and pCT are again swapped and the corresponding cycle consistency loss function is:

$$\mathcal{L}^{\rm bw}_{\rm cyc} = \mathbb{E}_{\rm pCT}\left[\left\| \mathrm{pCT} - \mathrm{G}_{\rm pCT}\left(\mathrm{G}_{\rm CBCT}(\mathrm{pCT})\right) \right\|_1\right]. \tag{4}$$

During training, both generator and discriminator networks are optimized in parallel using the combined loss function $\mathcal{L}_{{\rm cycleGAN}}$ :

$$\mathcal{L}_{\rm cycleGAN} = \mathcal{L}_{\rm pCT} + \mathcal{L}_{\rm CBCT} + \lambda\left(\mathcal{L}^{\rm fw}_{\rm cyc} + \mathcal{L}^{\rm bw}_{\rm cyc}\right), \tag{5}$$

where $\lambda$ is a hyperparameter that was set to 25 in this study. Note that the loss function does not rely on a pixel-by-pixel comparison between a generator output and a paired reference image: the cycle consistency terms only compare each image with its own reconstruction. This enables training with unpaired data, as performed in this study.
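Equations (3)–(5) can be illustrated with the NumPy sketch below. It assumes that the L1 norm is averaged per pixel (as is common in cycleGAN implementations) and uses two hypothetical toy generators, a constant HU offset and a slightly wrong inverse, in place of the trained networks; the adversarial terms are passed in as plain numbers.

```python
import numpy as np

def l1_cycle_loss(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Cycle consistency term of equations (3)/(4), averaged per pixel (assumption)."""
    return float(np.mean(np.abs(original - reconstructed)))

def cyclegan_objective(l_pct: float, l_cbct: float, l_cyc_fw: float,
                       l_cyc_bw: float, lam: float = 25.0) -> float:
    """Combined objective of equation (5); lam = 25 as in this study."""
    return l_pct + l_cbct + lam * (l_cyc_fw + l_cyc_bw)

# Hypothetical toy "generators": a constant HU offset and its slightly wrong inverse.
def toy_g_pct(cbct: np.ndarray) -> np.ndarray:
    return cbct + 20.0

def toy_g_cbct(pct: np.ndarray) -> np.ndarray:
    return pct - 18.0

cbct_slice = np.zeros((4, 4))
pct_slice = np.full((4, 4), 40.0)  # unpaired: no pixel correspondence to cbct_slice is needed

fw = l1_cycle_loss(cbct_slice, toy_g_cbct(toy_g_pct(cbct_slice)))  # |0 - 2| = 2
bw = l1_cycle_loss(pct_slice, toy_g_pct(toy_g_cbct(pct_slice)))    # |40 - 42| = 2
total = cyclegan_objective(0.5, 0.5, fw, bw)                        # 1 + 25 * 4 = 101
print(fw, bw, total)  # 2.0 2.0 101.0
```

Each cycle term compares an image only with its own reconstruction, which is why the two domains need not be paired. With λ = 25, even a 2 HU cycle error dominates the adversarial terms in this toy example.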

Regarding the design of the generator and discriminator networks, we followed the original implementation by Zhu et al (2017), with the only difference that the input was adapted to be a 16-bit grey-scale image. More specifically, for the generators, a nine-block residual network (Johnson et al 2016) was employed, consisting of downsampling from 256 $\times$ 256 to 32 $\times$ 32 with a series of three 2D convolutional layers (stride two, kernel size four) followed by linear rectifiers (scalar multiplier of 0.2) and upsampling from 32 $\times$ 32 to 256 $\times$ 256 with a series of three 2D deconvolutional layers (stride two, kernel size four) with linear rectifiers. The residual blocks were composed of a repeated series of padding (reflect mode) and 2D convolutional layers without up- or downsampling. For the discriminator, again following the original implementation by Zhu et al, a 70 $\times$ 70 patchGAN was employed, downsampling from 256 $\times$ 256 to 32 $\times$ 32 by applying four series of leaky rectified linear units (with scalar multiplier of 0.2) followed by 2D convolutional layers. A scalar in the range [0, 1] was obtained for each pixel of the final discriminator layer (each pixel having a receptive field of 70 $\times$ 70 pixels in the input image) and was used for patch-wise classification of the input image as real or fake. Instance normalization (Ulyanov et al 2016) was applied after each 2D convolutional layer. The networks were implemented in TensorFlow (v1.3.0).
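The spatial sizes and the 70 $\times$ 70 receptive field quoted above follow from standard convolution arithmetic. The sketch below assumes a padding of one pixel for the 4 $\times$ 4 convolutions and, for the discriminator, the layer sequence of the PatchGAN in the public pix2pix/cycleGAN reference code (three stride-two and two stride-one 4 $\times$ 4 convolutions); neither detail is stated in this paper, and the helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field of one output pixel for (kernel, stride) layers listed from
    input to output, computed by walking backwards from the output."""
    field = 1
    for kernel, stride in reversed(layers):
        field = (field - 1) * stride + kernel
    return field

# Generator encoder: three stride-two 4x4 convolutions, 256 -> 128 -> 64 -> 32.
size = 256
for _ in range(3):
    size = conv_output_size(size)
print(size)  # 32

# PatchGAN-style discriminator (assumed layer list): 70x70 receptive field per output pixel.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # 70
```

Each stride-two layer halves the image size, so three of them reduce 256 to 32, while the two final stride-one layers enlarge the receptive field without further downsampling.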

For training the network, the registered pCT image slices were resampled to 410 $\times$ 410 pixels to match the resolution of CBCT_org. The treatment table was removed from pCT and CBCT_org, and all pixels outside the body contour (retrieved from a combination of erosion, thresholding, region growing, dilation and filling operations) were set to −1000 HU. All images were resampled to 286 $\times$ 286 pixels, followed by cropping to 256 $\times$ 256 pixels from a randomly chosen starting pixel (between 0 and 29) in left–right and anterior–posterior direction as data augmentation during training. Additionally, images were randomly flipped in left–right direction. The HU range of the images was clipped to [−1000, 2071] and the image intensities were linearly rescaled to the 16-bit range. Optimization was performed by stochastic gradient descent using the Adam optimizer (Kingma and Ba 2014) with momentum parameters $\beta_1 = 0.5$ and $\beta_2 = 0.999$. The batch size was set to 1. At each optimization step, slices of different patients were randomly selected (unpaired training). All networks were trained from scratch with a learning rate of 0.0002, which was kept constant for the first 100 epochs and then linearly decreased to zero over the next 100 epochs, following the approach of Zhu et al (2017).
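The intensity normalization, augmentation and learning-rate schedule described above can be sketched as follows. This is a NumPy approximation under stated assumptions, not the authors' TensorFlow pipeline: the 410 → 286 resampling step is omitted, the crop offset is drawn uniformly from 0 to 29 in both directions, flips occur with probability 0.5, image columns are assumed to run left–right, and epochs are counted from zero. All function names are hypothetical.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 2071.0  # clipping window from the text

def to_uint16(slice_hu: np.ndarray) -> np.ndarray:
    """Clip HU values and rescale them linearly to the full 16-bit range."""
    clipped = np.clip(slice_hu, HU_MIN, HU_MAX)
    scaled = (clipped - HU_MIN) / (HU_MAX - HU_MIN) * 65535.0
    return np.round(scaled).astype(np.uint16)

def random_crop_flip(img: np.ndarray, rng: np.random.Generator,
                     crop: int = 256, max_offset: int = 29) -> np.ndarray:
    """Random 256 x 256 crop from a 286 x 286 slice plus a random left-right flip."""
    r0, c0 = rng.integers(0, max_offset + 1, size=2)  # offsets 0..29 (inclusive)
    patch = img[r0:r0 + crop, c0:c0 + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # flip along the (assumed) left-right axis
    return patch

def learning_rate(epoch: int, base_lr: float = 2e-4,
                  n_constant: int = 100, n_decay: int = 100) -> float:
    """Constant for n_constant epochs, then linear decay to zero over n_decay epochs."""
    if epoch < n_constant:
        return base_lr
    return base_lr * max(0.0, 1.0 - (epoch - n_constant) / n_decay)

rng = np.random.default_rng(0)
slice_286 = rng.uniform(-1200.0, 3000.0, size=(286, 286))
patch = random_crop_flip(to_uint16(slice_286), rng)
print(patch.shape, patch.dtype)                                   # (256, 256) uint16
print(learning_rate(50), learning_rate(150), learning_rate(200))  # 0.0002 0.0001 0.0
```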

Training was performed on a subset of 25 patients using four folds, each containing 18 patients. Once training was finished, only the generator network G_pCT was used for CBCT intensity correction, converting a given CBCT_org slice-by-slice into a pCT equivalent image. Since four different folds were used for training the cycleGAN, four different G_pCT models were obtained and applied to the eight remaining test patients, which the networks had not seen during training. The pixel-wise median of the four models' outputs was then calculated to generate the final corrected CBCT image, referred to as CBCT_cycleGAN in the following. All CBCT_cycleGAN were upsampled to the initial CBCT grid of 410 $\times$ 410 pixels with a pixel size of 1 mm $\times$ 1 mm. In the superior–inferior (SI) direction, the voxel size was 3 mm, as for the pCT.
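A minimal sketch of the ensembling step, assuming the four G_pCT outputs for a slice are available as equally shaped NumPy arrays in HU; `ensemble_median` is a hypothetical helper that also returns the 'maximum minus minimum' spread shown later in figure 2.

```python
import numpy as np

def ensemble_median(predictions: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Pixel-wise median of several model outputs and their max-minus-min spread."""
    stack = np.stack(predictions, axis=0)           # shape: (n_models, rows, cols)
    median = np.median(stack, axis=0)               # with 4 models: mean of the 2 middle values
    spread = stack.max(axis=0) - stack.min(axis=0)  # pixel-wise 'maximum minus minimum'
    return median, spread

outputs = [np.full((2, 2), hu) for hu in (10.0, 20.0, 30.0, 100.0)]
median, spread = ensemble_median(outputs)
print(median[0, 0], spread[0, 0])  # 25.0 90.0
```

With an even number of models the median is the mean of the two middle values, so a single outlying model (100 HU here) does not pull the result the way it would pull a plain mean (40 HU).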

2.3. Reference CBCT correction

Since there were substantial anatomical differences between pCT and CBCT_org due to changes in bladder and rectum filling, as well as in patient positioning, CBCT_cycleGAN was not compared directly to the pCT to infer its accuracy. Instead, a CBCT correction strategy that had been validated in several previous studies (Park et al 2015, Kurz et al 2016b) served as reference for evaluating CBCT_cycleGAN for the eight test cases. This strategy has been described in detail in the original publications (Niu et al 2010, 2012), in follow-up studies in a proton specific setting (Park et al 2015, Kurz et al 2016b) and in the latest publications by our group (Hansen et al 2018, Landry et al 2019). Hence, only the main ideas are outlined here: the method relies on a vCT prior, obtained from pCT-to-CBCT_org DIR, which is forward projected according to the geometry of the CBCT scanner. These forward projections are free of scatter and other low frequency deviations, such as beam hardening. Thus, the contribution of these low frequency deviations to the measured CBCT_org projections can be estimated as the difference between the scaled measured projections and the vCT forward projections, followed by generous 2D smoothing to account for their low spatial frequency. The estimated low frequency deviations are then subtracted from the measured projections, and the corrected projections are reconstructed using the same FDK algorithm and settings as used for CBCT_org. The result, referred to as CBCT_cor in the following, is a shading corrected CBCT with HU values equivalent to the pCT but with the same anatomy as CBCT_org, and hence as CBCT_cycleGAN.
Similar to CBCT_org, CBCT_cor was reconstructed on a 1.0 mm $\times$ 1.0 mm $\times$ 1.0 mm grid with 410 $\times$ 410 $\times$ 264 voxels and then re-binned to a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm grid to match the resolution of CBCT_cycleGAN in superior–inferior direction.
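The estimate-and-subtract idea of the reference correction can be sketched for a single 2D projection as below. The sketch assumes that the measured projection has already been scaled to the vCT forward projection and that both are given in the same domain and geometry. A separable Gaussian filter with a hypothetical width of 5 detector pixels stands in for the generous 2D smoothing, whose actual kernel is specified in the cited publications rather than here; function names are hypothetical.

```python
import numpy as np

def gaussian_smooth_2d(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian low-pass filter with reflective borders (NumPy only)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                       # normalized: constants are preserved
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def correct_projection(measured: np.ndarray, vct_forward: np.ndarray,
                       sigma_px: float = 5.0) -> np.ndarray:
    """Prior-based shading correction of one projection (sketch)."""
    residual = measured - vct_forward                  # low-frequency deviations + mismatch + noise
    low_freq = gaussian_smooth_2d(residual, sigma_px)  # keep only the low-frequency part
    return measured - low_freq                         # subtract the estimated deviation

yy, xx = np.mgrid[0:64, 0:64].astype(float)
vct_fp = np.sin(xx / 3.0)       # fine detail present in the prior
shading = 0.3 + 0.001 * yy      # smooth, low-frequency deviation
measured = vct_fp + shading
corrected = correct_projection(measured, vct_fp)
print(np.allclose(corrected[15:49], vct_fp[15:49]))  # True
```

Away from the borders the smooth deviation is removed exactly while the fine detail of the measured projection is kept; near the edges the reflective padding causes small residual errors. In real data the residual also contains anatomical mismatch and noise, which the smoothing is meant to suppress.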

2.4. Treatment planning

The CBCT_cor of all eight test patients were imported to a research version of a commercial treatment planning system (TPS, RayStation, version 4.99, RaySearch, Sweden). As mentioned before, contours were transferred from pCT to CBCT_cor via DIR. The retrieved structures were not further adapted since the accuracy of the derived structures was not critical for our study. VMAT plans were generated for each patient on a 3.0 mm $\times$ 3.0 mm $\times$ 3.0 mm dose grid using one full arc and a collapsed-cone dose engine. The generic Elekta Synergy beam model implemented in the TPS was used. Moreover, PBS proton plans were optimized on the same dose grid, using the pencil beam dose engine and the implemented IBA_Dedicated beam model. Beams from the left and from the right of the patient (90° and 270° gantry angle) were applied to generate opposing single-field uniform dose (OSFUD) plans. Each field was optimized individually to exhibit a homogeneous target dose for improved robustness of the proton plans. VMAT and OSFUD plans aimed at a median CTV dose of 74 Gy in 37 treatment fractions. CTV V $_{95\%}$ was 100% for all plans, PTV V $_{95\%}$ was above 98% in all cases. If feasible, dose to OARs, in particular the bladder and the rectum, was below the recommendations of the QUANTEC report (Marks Lawrence et al 2010).

Additional single-field uniform dose (SFUD) PBS proton plans were generated at 90° and 270° gantry angle on a 1.0 mm $\times$ 1.0 mm $\times$ 3.0 mm dose grid to investigate proton range accuracy. The finer dose grid in left–right and anterior–posterior direction was chosen to match the resolution of CBCT_cor and to enable accurate range probing in transverse planes.

For all dose calculations, the same generic CT number to electron density (photons) or relative stopping power (protons) conversion tables were used for CBCT_cor and CBCT_cycleGAN.

2.5. Data evaluation

CT number accuracy of CBCT_org and CBCT_cycleGAN was assessed for the eight test cases by comparison to CBCT_cor in terms of the mean absolute error (MAE) and the mean error (ME) in HU. Only voxels within the joint body outline of CBCT_cor and CBCT_cycleGAN/CBCT_org were included. Moreover, the first and the last ten image slices in superior–inferior direction were excluded since they did not exhibit the full lateral FOV after reconstruction. To identify potential geometric inaccuracies, the body outline of the patient as detected on CBCT_org, CBCT_cor and CBCT_cycleGAN was compared in the slice containing the treatment plan iso-center as a function of the gantry angle. The gantry angle was sampled in 1° steps, and for each angle the distance of the body outline to the iso-center was determined for each CBCT.
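The two image metrics can be sketched in NumPy as follows. Assumptions not stated in the text: axis 0 of each volume is the superior–inferior axis, the error is defined as test minus reference, the body outlines are given as boolean masks, and in the transverse slice a gantry angle of 0° points toward decreasing row index (anterior) while 90° points toward increasing column index. The outline distance is found by marching along a ray from the iso-center and keeping the outermost in-body sample. Function names are hypothetical.

```python
import numpy as np

def hu_errors(test, reference, body_test, body_ref, n_edge=10):
    """MAE and ME (HU) within the joint body outline, excluding n_edge slices at each SI end."""
    mask = body_test & body_ref  # joint body outline (new array, inputs untouched)
    mask[:n_edge] = False        # first slices (incomplete lateral FOV)
    mask[-n_edge:] = False       # last slices
    diff = test[mask] - reference[mask]
    return float(np.mean(np.abs(diff))), float(np.mean(diff))

def outline_distance(body_slice, iso_rc, angle_deg, step=0.25, pixel_mm=1.0):
    """Distance (mm) from the iso-center to the body outline along one gantry angle."""
    theta = np.deg2rad(angle_deg)
    d_row, d_col = -np.cos(theta), np.sin(theta)  # assumed angle convention
    n_rows, n_cols = body_slice.shape
    r, last_inside = 0.0, 0.0
    while True:
        row = int(round(iso_rc[0] + r * d_row))
        col = int(round(iso_rc[1] + r * d_col))
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            break                                 # the ray has left the image
        if body_slice[row, col]:
            last_inside = r                       # outermost in-body sample so far
        r += step
    return last_inside * pixel_mm

# Usage 1: edge slices with large errors are excluded from MAE/ME.
ref_vol = np.zeros((30, 4, 4))
test_vol = ref_vol + 5.0
test_vol[:10] += 1000.0
test_vol[-10:] -= 1000.0
body = np.ones_like(ref_vol, dtype=bool)
print(hu_errors(test_vol, ref_vol, body, body))  # (5.0, 5.0)

# Usage 2: a disc-shaped "body" of radius 50 pixels around the iso-center, sampled in 1° steps.
rr, cc = np.mgrid[0:201, 0:201]
disc = (rr - 100) ** 2 + (cc - 100) ** 2 <= 50 ** 2
radii = [outline_distance(disc, (100, 100), a) for a in range(0, 360)]
```

Comparing the `radii` lists obtained from two CBCTs angle by angle gives body outline differences of the kind summarized in figure 4; the ray-marching resolution (here a quarter pixel) limits the precision to roughly one pixel.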

Dosimetric accuracy was assessed by recalculating the VMAT and OSFUD plans, optimized on CBCT_cor, on CBCT_cycleGAN. The dose distributions on both CBCTs were then compared using dose difference criteria of 1% (VMAT only), 2% (VMAT and OSFUD) and 3% (OSFUD only). Only voxels receiving at least 50% of the prescribed dose were considered. Moreover, to allow for slight proton range differences between both CBCT data-sets, the pass-rates for (2%, 2 mm) and (3%, 3 mm) 3D global gamma criteria were determined for the OSFUD plans using the same dose cut-off. The VMAT and OSFUD dose distributions for CBCT_cor and CBCT_cycleGAN were also compared in terms of clinically relevant target and OAR DVH parameters. For this, CTV and PTV D $_{98\%}$ , D $_{2\%}$ and V $_{95\%}$ , as well as PTV D $_{50\%}$ were considered. For the rectum, V $_{50/60/65\,\mathrm{Gy}}$ and for the bladder, V $_{60/65\,\mathrm{Gy}}$ were analyzed.
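A dose difference pass-rate with the 50% dose cut-off can be sketched as follows. The text does not specify the normalization; this sketch assumes a global criterion expressed as a fraction of the prescribed dose and applies the cut-off to the reference (CBCT_cor) dose. `dd_pass_rate` is a hypothetical helper, and gamma evaluation is not covered here.

```python
import numpy as np

def dd_pass_rate(dose_eval, dose_ref, prescribed, criterion=0.02, cutoff=0.5):
    """Percentage of considered voxels (reference dose >= cutoff * prescribed) whose
    absolute dose difference does not exceed criterion * prescribed."""
    dose_eval = np.asarray(dose_eval, dtype=float)
    dose_ref = np.asarray(dose_ref, dtype=float)
    considered = dose_ref >= cutoff * prescribed
    diff = np.abs(dose_eval[considered] - dose_ref[considered])
    return 100.0 * float(np.mean(diff <= criterion * prescribed))

dose_cor = np.array([74.0] * 10 + [10.0] * 5)  # last five voxels fall below the 50% cut-off
dose_gan = dose_cor.copy()
dose_gan[:3] += 2.0                             # 2 Gy is about 2.7% of 74 Gy
dose_gan[10:] += 20.0                           # ignored because of the cut-off
print(dd_pass_rate(dose_gan, dose_cor, 74.0, 0.02))  # 70.0
print(dd_pass_rate(dose_gan, dose_cor, 74.0, 0.03))  # 100.0
```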

Relative proton range differences between CBCT_cor and CBCT_cycleGAN were inferred in beam's eye view (BEV) from the SFUD plans by comparing the distance between the patient outline and the 80% iso-dose line on both CBCTs. The absolute position of the 80% iso-dose line (absolute proton range) was also compared.
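The range metrics can be illustrated on a single beam's-eye-view depth-dose profile, as in the sketch below. It assumes that the 80% level refers to the maximum of the profile and locates the distal 80% point by linear interpolation between samples; the absolute range is that position, and the relative range is its distance from the body-outline entry point. The function names and the synthetic profile are hypothetical.

```python
import numpy as np

def distal_80(depth_mm: np.ndarray, dose: np.ndarray) -> float:
    """Position (mm) of the distal 80% iso-dose point on one depth-dose profile."""
    dose = np.asarray(dose, dtype=float)
    level = 0.8 * dose.max()
    i_max = int(np.argmax(dose))
    below = np.nonzero(dose[i_max:] < level)[0]
    if below.size == 0:
        raise ValueError("profile does not fall below the 80% level distally")
    j = i_max + int(below[0])      # first sample below 80% beyond the maximum
    d0, d1 = dose[j - 1], dose[j]  # bracketing samples: d0 >= level > d1
    z0, z1 = depth_mm[j - 1], depth_mm[j]
    return float(z0 + (d0 - level) / (d0 - d1) * (z1 - z0))

def relative_range(depth_mm, dose, outline_mm):
    """Distance between the body outline (beam entry) and the distal 80% point."""
    return distal_80(depth_mm, dose) - outline_mm

depth = np.arange(0.0, 101.0, 1.0)
profile = np.clip(1.0 - (depth - 50.0) / 10.0, 0.0, 1.0)  # flat dose, linear fall-off 50-60 mm
r80 = distal_80(depth, profile)                           # absolute range
rel = relative_range(depth, profile, outline_mm=10.0)     # relative range
print(round(r80, 3), round(rel, 3))  # 52.0 42.0
```

Using the relative range separates the effect of HU differences inside the patient from shifts of the body outline, which affect the absolute position of the 80% iso-dose line.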

3. Results

3.1. Network training time

The cycleGAN network was trained for 200 epochs on a Tesla P100 (NVIDIA, California, USA) graphics processing unit (GPU). Training took approximately 48 h in total. Once the model was trained, a full 3D CBCT_org with 88 slices could be converted into CBCT_cycleGAN within about 10 s.

3.2. Image analysis

The corrected CBCT images obtained from the four trained network models are shown in figure 2 (panels (a)–(d)), together with the calculated median image (panel (e)) and the pixel-wise difference between maximum and minimum HU values (panel (f)) for exemplary patient 7. Deviations between the four different models were most pronounced at the edges of the bony anatomy, as well as at the patient body outline. In the following analysis, only the median image was considered.

**Figure 2.** Output of the four trained network models (panels (a)–(d)), median CBCT_cycleGAN (panel (e)) and pixel-wise 'maximum minus minimum' (panel (f)) for exemplary test patient 7.

Figure 3 illustrates the pCT (panel (d)) as well as the different CBCT data-sets that were used in this study for patient 7. Panels (a)–(c) show the uncorrected CBCT_org, the reference corrected CBCT_cor and the (median) CBCT_cycleGAN. CBCT_org is affected by the typical cupping and scatter artifacts and features inaccurate, spatially varying HU values. This is corrected for in CBCT_cor, which, however, exhibits an increased noise level due to the low mAs setting (20 mA $\times$ 20 ms) used for CBCT acquisition. In comparison, CBCT_cycleGAN has a smoother appearance that is much closer to the diagnostic quality pCT, as expected. The difference images in panels (e) and (f) illustrate that CBCT_cycleGAN agrees better with the reference CBCT_cor in terms of HU values than CBCT_org does. In particular, no spatially varying HU deviations or cupping artifacts can be observed. Remaining differences were mostly related to the different noise properties of CBCT_cor and CBCT_cycleGAN. Similar observations were made for the other patients.

**Figure 3.** pCT (d) and CBCT data for exemplary test patient 7: CBCT_org (a), CBCT_cor (b) and median CBCT_cycleGAN (c). Differences of CBCT_org and CBCT_cycleGAN with respect to the reference CBCT_cor are illustrated in panels (e) and (f), respectively.

The difference maps also indicate slight differences in the patient body outline between CBCT_cor and CBCT_cycleGAN. For the depicted patient, these are found mainly close to the treatment couch on the left and right side of the patient, as well as at the patient's belly, where the largest differences between the four trained network models were also found. The results of the systematic analysis of body outline differences as a function of the gantry angle in the iso-center plane are shown in figure 4. Body outline differences were sampled in 1° angular steps and plotted as boxplots for each test case in panel (a). Median deviations of up to 2 mm were found when comparing CBCT_cycleGAN to CBCT_cor, with CBCT_cycleGAN having a larger patient contour in all cases. The 5th/95th percentiles indicate deviations of up to 4 mm for certain patients and gantry angles. In comparison, a high agreement of the body outline was found for CBCT_cor and CBCT_org, with median deviations below 0.5 mm, thus supporting the accuracy of the CBCT_cor body contour. In panel (b), the differences in body outline are plotted as a function of the gantry angle, averaged over all patients. Again, high agreement of CBCT_org and CBCT_cor was found, while CBCT_cycleGAN revealed systematic deviations of up to 4 mm depending on the gantry angle. Deviations were most pronounced at the belly (gantry angles of about 0° to 60° and 300° to 360°) and close to the couch on the left and right edge of the patient (gantry angles of about 110° and 250°).

**Figure 4.** Differences in the patient body outline for CBCT_cycleGAN and CBCT_org in comparison to CBCT_cor. Panel (a) shows the deviations for each test patient as a boxplot, where each data-point corresponds to a certain gantry angle, sampled in 1° steps. For each angle the difference in distance to the iso-center was recorded. The whiskers range from the 5th–95th percentile. Panel (b) depicts the mean deviation as a function of the gantry angle averaged over all patients. The shaded region corresponds to $\pm1\sigma$. Panel (c) schematically illustrates the position of the gantry angles with respect to the body outline in the transverse iso-center plane.

The MAE values of CBCT_cycleGAN and CBCT_org with respect to CBCT_cor are shown in the top row of figure 5. Comparable MAE values were found for CBCT_org and CBCT_cycleGAN. This might be because CBCT_cycleGAN is a smoother (pCT equivalent) image, whereas CBCT_org and CBCT_cor not only show higher noise but probably also share a correlated noise pattern, since both are reconstructed from the same measured projections. For CBCT_org, an average (over all test patients) MAE of 103 HU was found, with values ranging from 93 HU to 124 HU. CBCT_cycleGAN had a slightly lower average MAE of 87 HU, ranging from 79 HU to 106 HU. In terms of the ME, CBCT_cycleGAN yielded improved results compared to CBCT_org: the average ME decreased from 24 HU to −6 HU, and the maximum ME shrank from 45 HU to 16 HU.

**Figure 5.** MAE (top row) and ME (bottom row) of CBCT_org ((a) and (c)) and CBCT_cycleGAN ((b) and (d)) in comparison to CBCT_cor for all test patients.

3.3. Dosimetric analysis

Figure 6 shows the dose distributions optimized on CBCT_cor and recalculated on CBCT_cycleGAN for the VMAT, the OSFUD and the SFUD plan at 90° gantry angle for an exemplary patient (left and central columns). Dose differences are illustrated in the right column. For VMAT, only minor dose differences (below 1%) between CBCT_cycleGAN and CBCT_cor were found, with the largest differences close to the patient surface due to slight deviations in the patient body outline. Owing to the high sensitivity of the proton range to the CT numbers, deviations were more pronounced for the OSFUD and SFUD plans. Deviations were mostly found at the edge of the high dose region, next to the PTV, where the steepest dose gradients occurred. Still, for the depicted patient and the illustrated 90° SFUD plan, a median relative proton range difference of only 0.1 mm (smaller range for CBCT_cycleGAN) was determined.

Figure 6. Dose distributions of exemplary test patient 4 for the VMAT ((a)–(c)), OSFUD ((d)–(f)) and SFUD (90° gantry angle, (g)–(i)) treatment plans. Dose optimized on CBCT_cor ((a), (d) and (g)) and recalculated on CBCT_cycleGAN ((b), (e) and (h)) is shown together with the differences ((c), (f) and (i)). For improved visibility, deviations below 0.4% are not displayed in the difference plots.

The quantitative results of the dosimetric analysis of the VMAT and OSFUD plans are presented in table 1 for all test cases and all investigated dose difference (DD) and gamma criteria. For VMAT, the 2% DD pass-rate comparing CBCT_cycleGAN to CBCT_cor was 99% or higher for all test patients, indicating high agreement of CBCT_cycleGAN with the reference. The 1% DD pass-rate was 73% or higher for all patients. The lower pass-rates for some patients could be attributed mainly to deviations in the body outline, which led to slightly lower (by less than 1%) median dose values in the target region. As expected, pass-rates were overall lower for the OSFUD plans. The average pass-rate was 80% for the 2% DD criterion and 86% for the 3% DD criterion. When a gamma criterion was used, which is slightly less sensitive to range shifts, average pass-rates of 96% (2%, 2 mm) and 100% (3%, 3 mm) were obtained.

Table 1. Dose difference (DD) and gamma pass-rates of all test patients for the VMAT and OSFUD plans. All values in percent.

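| Patient | VMAT 1% DD | VMAT 2% DD | OSFUD 2% DD | OSFUD 3% DD | OSFUD gamma (2%, 2 mm) | OSFUD gamma (3%, 3 mm) |
|---|---|---|---|---|---|---|
| 1 | 79 | 100 | 83 | 88 | 95 | 99 |
| 2 | 99 | 100 | 84 | 90 | 99 | 100 |
| 3 | 100 | 100 | 71 | 83 | 95 | 100 |
| 4 | 73 | 100 | 74 | 80 | 93 | 100 |
| 5 | 99 | 100 | 81 | 87 | 95 | 99 |
| 6 | 88 | 100 | 77 | 83 | 95 | 100 |
| 7 | 88 | 100 | 84 | 89 | 96 | 99 |
| 8 | 84 | 99 | 86 | 91 | 99 | 100 |
| Average | 89 | 100 | 80 | 86 | 96 | 100 |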
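The DD and gamma pass-rates in table 1 can be illustrated with a minimal 1D sketch. Several details are not stated in this section and are assumptions here: global normalization to a dose `d_norm` (e.g. the prescription), a 10% low-dose cut-off, and a brute-force gamma search without dose interpolation. The function names are hypothetical. The toy profile shows why gamma criteria tolerate small range shifts that DD criteria penalize.

```python
import numpy as np

def dd_pass_rate(d_test, d_ref, dd_percent, d_norm, low_dose_cut=0.1):
    """Percentage of evaluated points with |d_test - d_ref| <= dd_percent % of d_norm.

    Global criterion; only points with d_ref > low_dose_cut * d_norm are evaluated
    (normalization and cut-off are assumptions).
    """
    sel = d_ref > low_dose_cut * d_norm
    ok = np.abs(d_test[sel] - d_ref[sel]) <= dd_percent / 100.0 * d_norm
    return 100.0 * float(np.mean(ok))


def gamma_pass_rate_1d(x_mm, d_test, d_ref, dd_percent, dta_mm, d_norm, low_dose_cut=0.1):
    """Global 1D gamma pass-rate (%) by brute-force search over all test points (no interpolation)."""
    dd_abs = dd_percent / 100.0 * d_norm
    ref_idx = np.flatnonzero(d_ref > low_dose_cut * d_norm)
    # Squared distance and dose-difference terms for every (reference, test) point pair
    dist2 = ((x_mm[None, :] - x_mm[ref_idx, None]) / dta_mm) ** 2
    dose2 = ((d_test[None, :] - d_ref[ref_idx, None]) / dd_abs) ** 2
    gamma = np.sqrt(np.min(dist2 + dose2, axis=1))
    return 100.0 * float(np.mean(gamma <= 1.0))


# Hypothetical sigmoid distal fall-off; the test profile has a 1 mm shorter range
x = np.linspace(0.0, 200.0, 401)  # 0.5 mm spacing
d_ref = 100.0 / (1.0 + np.exp((x - 150.0) / 1.5))
d_test = 100.0 / (1.0 + np.exp((x - 149.0) / 1.5))

print(dd_pass_rate(d_test, d_ref, 2, 100.0))                # below 100: fails in the fall-off
print(gamma_pass_rate_1d(x, d_test, d_ref, 2, 2.0, 100.0))  # 100.0: 1 mm shift is within DTA
```

Clinical gamma tools interpolate the evaluated dose and work in 3D. This brute-force version is only meant to show the trade-off between the dose and distance terms.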

The impact of the dosimetric differences on clinically relevant DVH parameters is illustrated in figure 7 for VMAT and in figure 8 for the OSFUD plans. For VMAT, differences in target and OAR DVH parameters were mostly below 1 Gy/1%. Only for PTV V$_{95\%}$ were slightly larger deviations observed. There was a general trend towards smaller values of the investigated DVH parameters on CBCT_cycleGAN. This can be attributed to the observed deviations in the body outline, which was generally larger for CBCT_cycleGAN. The additional tissue in the beam path attenuates the photon beams slightly more and results in a slightly lower dose level in the central region, where CTV, PTV and rectum are located.
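For readers less familiar with the DVH parameters, the following minimal sketch shows how D$_{98\%}$ and V$_{95\%}$ could be computed from the voxel doses of a structure. It assumes equal voxel volumes and uses the percentile interpolation of NumPy. Treatment planning systems typically account for partial-volume voxels, so this is not how the TPS used here computes them. The dose values and the 74 Gy prescription are hypothetical.

```python
import numpy as np

def d_at_volume(dose_in_roi, volume_percent):
    """D_x%: minimum dose received by the hottest x% of the structure (equal voxel volumes assumed)."""
    return float(np.percentile(dose_in_roi, 100.0 - volume_percent))


def v_at_dose(dose_in_roi, dose_threshold):
    """V_D: percentage of the structure volume receiving at least dose_threshold."""
    return 100.0 * float(np.mean(dose_in_roi >= dose_threshold))


# Hypothetical PTV voxel doses (Gy) and a hypothetical 74 Gy prescription
ptv_dose = np.linspace(70.0, 78.0, 101)
prescription = 74.0
print(d_at_volume(ptv_dose, 98))                 # D98% in Gy
print(v_at_dose(ptv_dose, 0.95 * prescription))  # V95% in percent
```

The differences plotted in figures 7 and 8 would then correspond to evaluating such functions on the doses calculated on CBCT_cycleGAN and CBCT_cor and subtracting. This gives Gy for D-type parameters and percentage points for V-type parameters, hence the combined "1 Gy/1%" tolerance.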

Figure 7. Differences between CBCT_cor and CBCT_cycleGAN in clinically relevant target (a) and OAR (b) DVH parameters for the VMAT plans. Each test patient is represented by a single data point. Whiskers correspond to the 5th–95th percentile. All dose values correspond to the total dose of the fractionated treatment.

Figure 8. Differences between CBCT_cor and CBCT_cycleGAN in clinically relevant target (a) and OAR (b) DVH parameters for the OSFUD plans. Each test patient is represented by a single data point. Whiskers correspond to the 5th–95th percentile. All dose values correspond to the total dose of the fractionated treatment.

For the OSFUD plans, the DVH parameter differences between CBCT_cor and CBCT_cycleGAN were overall smaller. This is likely because the dosimetric deviations were related to differences in the proton range and appeared mainly left/right of the PTV, in the region of the steep dose gradients in BEV. Thus, DVH parameters related to PTV coverage (in particular D$_{98\%}$) show the most pronounced differences. The DVH parameters of the CTV and of the rectum and bladder, both of which are located lateral to the beam, are only slightly affected. Differences were below 1 Gy/1% for all investigated parameters and patients.

3.4. Proton range analysis

Results of the proton range analysis based on the SFUD plans at 90° and 270° gantry angle are shown in figure 9. Figures 9(a) and (b) show boxplots of the differences in relative range between CBCT_cor and CBCT_cycleGAN for all analyzed BEV profiles per test patient. The median range differences were below 3 mm for all patients. In most cases, the 5th and 95th percentiles of the range difference distribution, indicated by the whiskers, were also within 3 mm. For one case, however, these percentiles extended beyond 5 mm. Figure 9(c) summarizes the results for all BEV profiles of all patients and both gantry angles. The median range difference over about 19 000 analyzed profiles was only 0.2 mm, with a slightly smaller proton range for CBCT_cycleGAN. The 5th and 95th percentiles of this cumulative range difference distribution were within $\pm$3 mm (dashed lines).

Figure 9. Differences between CBCT_cor and CBCT_cycleGAN in terms of the relative proton range for 90° (a) and 270° (b) gantry angles and all test patients. Panel (c) illustrates the cumulative distribution of range differences for both angles and all analyzed BEV profiles as a violin plot. The whiskers range from the 5th to the 95th percentile.

When analyzing the absolute position of the 80% distal dose fall-off in BEV (absolute proton range), a median range difference of about 1.1 mm was found, with a shorter absolute range for CBCT_cycleGAN. This offset of about 1 mm between the absolute and relative range differences can be attributed to body outline differences of similar size between CBCT_cor and CBCT_cycleGAN. These were about 0.5 mm at a gantry angle of 90° and about 1.5 mm at 270° (see figure 4(a)).
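The distinction between absolute and relative range can be sketched as follows. The sketch assumes that the range is the depth of the 80% distal dose fall-off (R80), found by linear interpolation distal to the dose maximum, and that the relative range is measured from the body surface along the same BEV profile. That is our reading of the text, not a confirmed implementation detail. The toy profiles and function names are hypothetical. A body outline that is 1 mm larger, with the same water-equivalent path beyond the surface, shortens the absolute range by about 1 mm but leaves the relative range unchanged.

```python
import numpy as np

def r80_position(depth_mm, dose, level=0.8):
    """Depth (mm) where the dose first falls below level * max distal to the maximum (linear interpolation)."""
    i_max = int(np.argmax(dose))
    target = level * dose[i_max]
    below = np.flatnonzero(dose[i_max:] < target)
    if below.size == 0:
        raise ValueError("profile never falls below the requested dose level")
    j = i_max + int(below[0])  # first distal sample below the level (j > i_max)
    x0, x1 = depth_mm[j - 1], depth_mm[j]
    d0, d1 = dose[j - 1], dose[j]
    return float(x0 + (d0 - target) * (x1 - x0) / (d0 - d1))


def absolute_and_relative_range(depth_mm, dose, surface_mm):
    """Absolute range = R80 position along the profile; relative range = R80 minus the body surface position."""
    r80 = r80_position(depth_mm, dose)
    return r80, r80 - surface_mm


def toy_profile(x, surface_mm, r50_mm, sigma_mm=2.0):
    """Hypothetical SOBP-like profile: zero outside the body, flat plateau, sigmoid distal fall-off."""
    return np.where(x >= surface_mm, 100.0 / (1.0 + np.exp((x - r50_mm) / sigma_mm)), 0.0)


x = np.linspace(0.0, 250.0, 2501)  # 0.1 mm sampling along the beam direction
# Test image: body surface 1 mm further upstream, whole profile shifted by the same 1 mm
abs_ref, rel_ref = absolute_and_relative_range(x, toy_profile(x, 20.0, 180.0), 20.0)
abs_tst, rel_tst = absolute_and_relative_range(x, toy_profile(x, 19.0, 179.0), 19.0)
print(abs_tst - abs_ref)  # about -1 mm: shorter absolute range
print(rel_tst - rel_ref)  # about 0 mm: relative range unchanged
```
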

4. Discussion

In this contribution we have successfully applied a cycleGAN for prostate CBCT correction using unpaired training data. The network was trained to translate a given uncorrected CBCT_org into a pCT-equivalent image (CBCT_cycleGAN). The training data, i.e. CBCT_org and pCT, were not matched beforehand by means of DIR. To assess its accuracy, CBCT_cycleGAN was compared to a previously validated CBCT correction method (CBCT_cor). Using CBCT_cor as the reference, CBCT_cycleGAN showed improved HU accuracy compared with CBCT_org, in particular in terms of the ME.

In terms of dose calculation accuracy, good results were achieved for VMAT when comparing CBCT_cycleGAN and CBCT_cor: for a 2% dose difference criterion, pass-rates of 99% or higher were determined for all test patients. In Liang et al (2019), a similar cycleGAN architecture was used for head and neck CBCT correction in photon radiotherapy, and comparable pass-rates were found for a (2%, 2 mm) gamma criterion. Because the proton range is highly sensitive to CT numbers, overall lower pass-rates were obtained for the proton OSFUD plans. However, when applying the widely used (3%, 3 mm) gamma criterion, which is less sensitive to small range shifts, an average pass-rate close to 100% was found. The proton range analysis also showed that more than 93% of all analyzed BEV dose profiles had a range agreement better than 3 mm. Compared to the total proton range of about 200 mm for prostate patients, this corresponds to a relative range inaccuracy of only 1.5% (3 mm/200 mm). In line with this, good agreement between CBCT_cor and CBCT_cycleGAN in terms of clinically relevant DVH parameters was achieved for the OSFUD plans in most cases. For OARs, differences were below 1% for all test cases, and for the target structures, deviations were also below 1 Gy/1%. For VMAT, a trend of slightly underestimated dose on CBCT_cycleGAN was found for centrally located structures, i.e. the PTV, CTV and rectum. Still, deviations were below 1 Gy/1% in most cases.

The overall lower DVH parameter values for VMAT are likely attributable, besides other HU inaccuracies, to the observed deviations in the patient body outline, which was consistently larger for CBCT_cycleGAN. Comparing the output of the four trained cycleGAN models, we also found that the region close to the body outline shows the most pronounced variability between the individual models. We consider these inaccuracies to be related to the approach of performing unpaired training with input (CBCT_org) and target data (pCT) that feature considerable anatomical differences. The cycle-consistency loss potentially improves the consistency of the mapping between pCT and CBCT space. However, it does not directly enforce geometric consistency between the input and output of the generators. Hiasa et al (2018) recently reported that, in the context of MR-to-CT synthesis for the pelvis, the geometric consistency of the input and output of the cycleGAN might be improved by an increased number of training data-sets or by introducing a gradient-consistency loss. Other works suggested an additional structure- or shape-consistency loss to improve data consistency for cycleGAN networks (Yang et al 2018, Cai et al 2019). Future studies may investigate whether these techniques also improve results for CBCT-to-CT image translation. In this context, the geometric fidelity of the images generated by the cycleGAN might be analyzed further, e.g. in terms of the accuracy of important structures such as the bladder or the prostate. This was deemed beyond the scope of this first proof-of-principle study. However, visual inspection of CBCT_cycleGAN did not show notable anatomical deviations with respect to CBCT_org beyond the previously described discrepancies in the body outline.
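A toy NumPy sketch illustrates why a cycle-consistency loss alone does not enforce geometric consistency. Hypothetical "generators" G (CBCT to CT) and F (CT to CBCT) that shift the anatomy by three voxels and back again yield a zero L1 cycle loss, although the output of G is misaligned with its input. A gradient-correlation score between input and output, the kind of measure a gradient-consistency loss builds on, does detect the shift. The phantom, the shift, and the finite-difference gradient operator are assumptions for illustration. This is neither the network nor the loss of this study, nor the exact formulation of Hiasa et al (2018).

```python
import numpy as np

def cycle_consistency_loss(x_cbct, y_ct, G, F):
    """L1 cycle loss: mean|F(G(x)) - x| + mean|G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x_cbct)) - x_cbct))
    backward = np.mean(np.abs(G(F(y_ct)) - y_ct))
    return float(forward + backward)


def gradient_correlation(a, b):
    """Mean normalized cross-correlation of finite-difference gradients along each axis."""
    scores = []
    for axis in range(a.ndim):
        ga = np.diff(a, axis=axis).ravel()
        gb = np.diff(b, axis=axis).ravel()
        ga = ga - ga.mean()
        gb = gb - gb.mean()
        denom = np.sqrt(np.sum(ga ** 2) * np.sum(gb ** 2))
        scores.append(float(np.sum(ga * gb) / denom) if denom > 0 else 0.0)
    return float(np.mean(scores))


# Toy phantom (hypothetical HU values): body disk in air with an off-center bone insert
yy, xx = np.mgrid[:64, :64]
phantom = np.full((64, 64), -1000.0)
phantom[(yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2] = 0.0
phantom[(yy - 28) ** 2 + (xx - 36) ** 2 < 5 ** 2] = 700.0


def G(img):
    """Hypothetical CBCT-to-CT generator that shifts the anatomy by 3 voxels."""
    return np.roll(img, 3, axis=1)


def F(img):
    """Hypothetical CT-to-CBCT generator that undoes the shift."""
    return np.roll(img, -3, axis=1)


print(cycle_consistency_loss(phantom, phantom, G, F))  # 0.0: the cycle is perfect
print(gradient_correlation(phantom, G(phantom)))       # well below 1: output is misaligned
print(gradient_correlation(phantom, phantom))          # 1.0 up to rounding
```
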

It should be noted that in this work, CBCT_cor was employed as the reference for validating CBCT_cycleGAN. While the accuracy of CBCT_cor was shown in several previous studies (Park et al 2015, Kurz et al 2016b), it was also pointed out that CBCT_cor can still be slightly affected by residual geometric inaccuracies in the vCT prior, originating from inaccuracies in the underlying pCT-to-CBCT DIR. Moreover, for a few patients we observed moderate reconstruction artifacts with increased HU values close to the patient skin, caused by the low-mAs CBCT protocol used. These artifacts were, however, confined to only a few voxels, so their impact, e.g. on the proton range, is estimated to be less than 0.5 mm. Generally, due to the pronounced inter-scan differences between pCT and CBCT, it is difficult to obtain a more accurate reference image of diagnostic quality with the same anatomy as exhibited by the CBCT. A potential alternative to CBCT_cor might be the vCT, which, however, can likewise be affected by DIR inaccuracies in the pelvic region (Kurz et al 2016b).

Given potential DIR uncertainties, the ability to use unpaired data for training is generally considered an advantage of CBCT_cycleGAN. It makes the methodology particularly interesting for situations where it is difficult to accurately match the input and target training data by means of DIR. For previously investigated U-Net based deep learning approaches for CBCT correction (Hansen et al 2018, Kida et al 2018, Landry et al 2019), the generation of training data explicitly relied on DIR. In Hansen et al (2018), DIR was used to generate the CBCT_cor projections used for training. In Kida et al (2018) and Landry et al (2019), DIR was used to generate a vCT, which was then used for U-Net training. DIR inaccuracies were reported to have only a minor impact on the final model and the resulting corrected CBCT images. Nevertheless, each DIR inaccuracy might introduce inconsistencies in the training data, because the loss function is based on a voxel-by-voxel comparison of network output and target data. This could negatively affect network training and the final model. It should be noted, however, that Landry et al (2019), based on a similar patient cohort, reported slightly improved results in terms of dose difference (1% dose difference pass-rate above 98% for VMAT) when training a U-Net to translate CBCT_org to CBCT_cor. This might be partially attributed to the fact that the network was directly trained to generate a CBCT_cor-equivalent image, and that CBCT_cor itself was used as the reference for evaluation. A more detailed comparison of the performance of U-Net and cycleGAN based CBCT correction might be the subject of future studies.
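The effect of DIR errors on a voxel-wise loss can be made concrete with a toy sketch. The L1 loss stands in for whatever voxel-wise loss the cited U-Net studies used, and the phantom, HU values and 2-voxel registration error are hypothetical. When the training target is misregistered, a geometrically correct prediction is penalized, while a prediction that reproduces the misregistration scores perfectly.

```python
import numpy as np

def l1_loss(pred, target):
    """Voxel-wise mean absolute difference between prediction and target."""
    return float(np.mean(np.abs(pred - target)))


# Hypothetical phantom: soft-tissue disk (40 HU) in air (-1000 HU)
yy, xx = np.mgrid[:64, :64]
body = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
true_ct = np.where(body, 40.0, -1000.0)

ideal_pred = true_ct.copy()                         # geometrically correct prediction
misregistered_target = np.roll(true_ct, 2, axis=1)  # target with a 2-voxel DIR error

print(l1_loss(ideal_pred, true_ct))                                   # 0.0
print(l1_loss(ideal_pred, misregistered_target))                      # > 0: correct output penalized
print(l1_loss(np.roll(ideal_pred, 2, axis=1), misregistered_target))  # 0.0: wrong geometry rewarded
```
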

Besides its independence from prior information derived from potentially inaccurate DIR, a major advantage of CBCT_cycleGAN over the reference method (CBCT_cor) is its speed. In the current implementation, generating CBCT_cor takes about 6–10 min, whereas generating a full 3D CBCT_cycleGAN image takes only on the order of 10 s. This reduction in correction time is particularly important for the intended application of CBCT_cycleGAN in adaptive radiotherapy. There, plan optimization on the basis of the acquired imaging data has to be performed as fast as possible while the patient is in treatment position. Besides improving patient comfort, fast adaptation is particularly important in scenarios where anatomical changes can occur on a time scale of minutes, e.g. in the prostate region.

In this study, CBCT correction was only investigated for prostate cancer patients. However, it is likely that the same trained model can be applied to other diseases in the pelvic region, as has been shown in a study on MR-to-CT translation based on a comparable GAN design (Maspero et al 2018). One potential issue in this body region is truncation of the patient due to the limited lateral CBCT FOV, which hinders the correct restoration of the patient outline. In this study, the lateral FOV was enlarged by scanning with a shifted detector panel, but a few patients still had to be excluded due to severe lateral truncation. Nevertheless, translation to treatment sites beyond the pelvis is deemed straightforward, since it only requires exchanging the training data given to the network. Moreover, from an imaging physics perspective, the pelvic region is one of the most challenging regions of the body due to the comparably large patient extension. This typically leads to more detected scattered photons than, e.g., in the head and neck region, and to more severe artifacts in the original CBCT image. Training might also benefit from the fact that for other treatment sites, such as head and neck, inter-scan anatomical variations between input (CBCT_org) and target (pCT) data are less pronounced. Even though unpaired training was used in this study, improved consistency between input and target data might lead to a more accurate representation of the body outline.

5. Conclusion

A cycle-consistent generative adversarial network was successfully trained to perform CBCT-to-CT image translation for CBCT intensity correction using unpaired training data. The resulting CBCT_cycleGAN resembles a diagnostic-quality planning CT but features the anatomy of the daily CBCT_org. In terms of dose calculation accuracy, good results were obtained for VMAT plans. Clinically relevant DVH parameters were accurately predicted for proton OSFUD plans using CBCT_cycleGAN. Still, as expected, the dosimetric analysis showed reduced accuracy compared with VMAT. Further improvements in dose calculation accuracy might be achieved by a larger training cohort or by introducing a gradient-consistency loss to improve the accuracy of the patient outline. Compared with the reference technique (CBCT_cor), CBCT_cycleGAN enables considerably faster image correction and, owing to the use of unpaired training data, does not depend on accurate DIR. Thus, CBCT_cycleGAN is considered a promising approach to realize fast image correction in a CBCT-based online adaptive radiotherapy treatment workflow.

Acknowledgments

Christopher Kurz received funding from the German Cancer Aid. The work was supported by the German Research Foundation (DFG) Cluster of Excellence Munich Center for Advanced Photonics (MAP) and by the ZonMw IMDI Programme (project number 1040030).

We thank Nadine Spahr, Christoph Brachmann and Florian Weiler for the pCT-to-CBCT DIR used in our study. We thank Erik Traneus and his colleagues from RaySearch Laboratories for support with the research version of the TPS used. We acknowledge the help of Jan Hofmaier in implementing the reference CBCT shading correction method. Moreover, the support of this project by Bas Raaymakers is gratefully acknowledged.

There are no conflicts of interest.
