Abstract
In the presence of inter-fractional anatomical changes, clinical benefits are anticipated from image-guided adaptive radiotherapy. Nowadays, cone-beam CT (CBCT) imaging is mostly used before treatment for position verification. Due to various artifacts, image quality is typically not sufficient for photon or proton dose calculation, thus demanding accurate CBCT correction, as potentially provided by deep learning techniques.
This work aimed at investigating the feasibility of utilizing a cycle-consistent generative adversarial network (cycleGAN) for prostate CBCT correction using unpaired training. Thirty-three patients were included. The network was trained to translate uncorrected, original CBCT images (CBCTorg) into planning CT equivalent images (CBCTcycleGAN). HU accuracy was determined by comparison to a previously validated CBCT correction technique (CBCTcor). Dosimetric accuracy was inferred for volumetric-modulated arc photon therapy (VMAT) and opposing single-field uniform dose (OSFUD) proton plans, optimized on CBCTcor and recalculated on CBCTcycleGAN. Single-sided SFUD proton plans were utilized to assess proton range accuracy.
The mean HU error of CBCTcycleGAN with respect to CBCTcor decreased from 24 HU for CBCTorg to −6 HU. Dose calculation accuracy was high for VMAT, with average pass-rates of 100%/89% for a 2%/1% dose difference criterion. For proton OSFUD plans, the average pass-rate for a 2% dose difference criterion was 80%. Using a (2%, 2 mm) gamma criterion, the pass-rate was 96%. 93% of all analyzed SFUD profiles had a range agreement better than 3 mm. CBCT correction time was reduced from 6–10 min for CBCTcor to 10 s for CBCTcycleGAN.
Our study demonstrated the feasibility of utilizing a cycleGAN for CBCT correction, achieving high dose calculation accuracy for VMAT. For proton therapy, further improvements may be required. Due to unpaired training, the approach does not rely on anatomically consistent training data or potentially inaccurate deformable image registration. The substantial speed-up for CBCT correction renders the method particularly interesting for adaptive radiotherapy.
1. Introduction
Modern external beam radiotherapy techniques like intensity-modulated photon radiotherapy (IMRT), volumetric-modulated arc therapy (VMAT) or intensity-modulated proton therapy (IMPT) using pencil-beam scanning (PBS) promise highly accurate dose delivery to the target volume while concurrently sparing adjacent organs-at-risk (OAR). However, in many cases the potential of these techniques cannot be fully exploited due to inter-fractional anatomical changes occurring between acquisition of the planning imaging data and the actual treatment day. Currently, such anatomical changes are managed by introducing safety margins around the actual target volume (van Herk 2004). On the one hand, safety margins ensure homogeneous delivery of the prescribed dose to the tumor in the presence of inter-fractional motion. On the other hand, they substantially increase the dose burden to neighboring OARs, which eventually limits the dose that can be applied to the tumor without risking severe side-effects.
Various approaches have been described in the literature for CBCT intensity correction, aiming at making the acquired pre-treatment imaging data, which is nowadays solely used for accurate patient alignment, suitable for accurate dose calculation and treatment adaptation. This could eventually enable accurate tailoring of the applied dose to the daily anatomy and shrinking of the applied safety margins. CBCT correction techniques range from simple look-up-table or histogram-matching based solutions (Kurz et al 2015, Kidar and Azizi 2018), over the use of planning CT (pCT)-to-CBCT deformable image registration (DIR), yielding a so-called virtual CT (vCT) (Peroni et al 2012, Landry et al 2014, 2015, Veiga et al 2014, 2015, 2016, Wang et al 2016), to the application of Monte-Carlo (MC) based methods for estimating and correcting the detected scatter contribution (Mainegra-Hing and Kawrakow 2010, Thing et al 2016, Zöllner et al 2017). Correction methods have been proposed and investigated in the context of photon (Ding et al 2007, Fotina et al 2012, Niu et al 2012, Veiga et al 2014) as well as in the context of proton radiotherapy (Landry et al 2015, Veiga et al 2015, 2016, Kurz et al 2016a) and for various treatment sites, from head and neck to the lung or the prostate. While most publications report promising results allowing for reasonably accurate CBCT based dose calculation, they also discuss miscellaneous short-comings. For example, while DIR based methods yielded promising results for the head and neck region for photons and protons (Kurz et al 2015, Landry et al 2015), accuracy was limited in the pelvis due to more pronounced anatomical changes from fraction to fraction. In such scenarios, improved results were obtained when using the vCT only as prior to estimate detected low frequency deviations, such as scatter and beam hardening, and to perform a projection based CBCT correction (Niu et al 2010, 2012, Park et al 2015, Kurz et al 2016b). 
While the latter method demonstrated reduced sensitivity to DIR inaccuracies, generating a corrected CBCT image typically takes on the order of several minutes. In an adaptive workflow, this would not be acceptable since, e.g. for the prostate, anatomical changes can occur on shorter time scales of only a few minutes (Langen et al 2008, McPartlin et al 2016). For the same reason, MC based scatter correction techniques, taking on the order of hours, are not suitable for clinical application in adaptive radiotherapy despite their high accuracy.
A considerable increase in speed for CBCT intensity correction has recently been achieved by the application of deep convolutional neural networks (CNN) (Lecun et al 1998). The favored network structure utilized for CBCT correction in the literature is a U-shaped CNN (U-Net), which features various convolutional and deconvolutional layers in an encoding and decoding branch, as initially suggested by Ronneberger et al (2015). U-Nets have been applied to a variety of image-to-image translation tasks and were successfully applied for CBCT correction in various scenarios: in Kida et al (2018), a U-Net was used for translating the CBCT into a pCT equivalent image by training with CBCT and vCT imaging data as input and target, respectively. Maier et al (2018, 2019) trained a U-Net for fast prediction of the detected scatter by utilizing MC simulated scatter distributions for training. The network prediction of the scatter can then be used for projection based CBCT correction in a second step. Projection based CBCT correction was also performed by Hansen et al (2018) using corrected projections retrieved with a previously validated algorithm based on a vCT prior. The U-Net was then trained to translate measured projections into corrected projections. In a follow-up study, the same U-Net was not only successfully trained for projection based correction, but also for translating the reconstructed uncorrected CBCT image into a vCT equivalent image or into a reference corrected CBCT (Landry et al 2019). In all scenarios, the latter study found high dose calculation accuracy for photon and proton treatment plans. Besides U-Nets, generative adversarial networks (GAN) (Goodfellow et al 2014) have shown promising results for image-to-image translation tasks. Conceptually, GANs consist of a generator and a discriminator network, which are trained jointly with an adversarial loss function. 
While the generator (typically a U-shaped CNN) aims at generating images realistic enough to fool the discriminator, the discriminator (typically a classification CNN) aims at distinguishing the generated fake images from real (training) images. In the context of medical imaging, GANs have, among others, been applied to generate synthetic CT images from MRI data (Wolterink et al 2017, Maspero et al 2018, Lei et al 2019) and, more recently, to translate CBCTs into pCT-like images of the brain and the pelvis using paired training (Harms et al 2019), as well as of the head and neck region using unpaired training (Liang et al 2019). The latter work also included dosimetric analysis of the GAN based corrected CBCT images, reporting high dose calculation accuracy for photon therapy but not investigating accuracy in the scope of proton therapy.
In this work, we aimed at investigating for the first time the feasibility of utilizing a so-called cycle-consistent GAN (cycleGAN) (Zhu et al 2017) for prostate CBCT correction in photon and proton radiotherapy. The network aimed at translating an uncorrected original CBCT (CBCTorg) into a pCT equivalent image, referred to as CBCTcycleGAN. The dedicated design of the cycleGAN, in combination with the applied loss function enabled the use of unpaired training data, in this case the daily CBCT and the initial pCT. While previous approaches utilizing U-Nets rely on a precise geometric relation between input and target data, which are compared voxel-by-voxel in the loss functions, this is not the case for the cycleGAN. Thus, it is feasible to use data which features substantial inter-scan differences or was even obtained from different patient cohorts without previous DIR for matching the data. The possibility of using unpaired training data is deemed particularly beneficial in situations, where these data are considerably affected by inter-fractional anatomical changes, e.g. in the prostate region, since any anatomical mismatch between input and target data can affect the accuracy of the trained network model. Accuracy of CBCTcycleGAN was quantified in terms of Hounsfield unit (HU) mean (absolute) error in comparison to a previously validated CBCT correction strategy (CBCTcor). Dose calculation accuracy for photon and proton radiotherapy was evaluated by means of various dose difference metrics and clinically relevant dose-volume-histogram (DVH) parameters, comparing CBCTcycleGAN to CBCTcor. The proton range on both CBCT data-sets was compared.
2. Materials and methods
2.1. Patient data
pCT and CBCT imaging data of 33 prostate cancer patients originally treated with VMAT to a total dose of 70 Gy–76 Gy in 2 Gy fractions at the Department of Radiation Oncology of the University Hospital of the LMU Munich were included in this study. All pCTs were acquired with a Toshiba Aquilion LB CT scanner (Canon Medical Systems, Japan). Tube voltage was set to 120 kV. An image grid of 1.1 mm × 1.1 mm × 3.0 mm was used in combination with a 55 cm lateral field of view (FOV). For each patient, a CBCT image was acquired in treatment position using the XVI system (version 5.0.2) of a Synergy medical linear accelerator (Elekta, Sweden). All CBCTs were acquired using a scan protocol with 120 kV tube voltage and an exposure time of 20 ms at an x-ray tube current of 20 mA per projection. These parameters were chosen in order to avoid saturation of the detector panel and enable accurate determination of the patient outline. A bow-tie filter was used in combination with a shifted detector panel (M position) to enlarge the lateral FOV. Patient data suffering from severe lateral truncation despite the enlarged FOV were not considered in this study. Between 346 and 357 projections were acquired in a 360° scan. The uncorrected CBCT (CBCTorg) was reconstructed using the FDK (Feldkamp–Davis–Kress) implementation of RTK (Reconstruction ToolKit, Rit et al (2014)) on a 1.0 mm × 1.0 mm × 1.0 mm grid with 410 × 410 × 264 voxels. For training, all CBCTorg were re-binned to a 1.0 mm × 1.0 mm × 3.0 mm grid to match the resolution of the pCT in superior–inferior direction. pCT and CBCTorg of each patient were aligned using a rigid translation.
OARs and target structures were delineated by a trained physician on the pCTs. Positioning errors and inter-fractional anatomical changes in the prostate region were accounted for by applying a 7 mm clinical target volume (CTV) to planning target volume (PTV) margin. All contours were transferred to the CBCT via DIR for data analysis.
2.2. cycleGAN network design and training
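For CBCT intensity correction, a cycleGAN was investigated in this work. The network architecture was initially proposed and implemented by Zhu et al (2017). The cycleGAN was trained to translate CBCTorg (input) into a pCT equivalent image (output) which, in contrast to CBCTorg, should exhibit accurate quantitative HU values. Training was performed in 2D, i.e. slice-by-slice in the transverse plane. The architecture of the network is illustrated in figure 1. It consists of two GANs that compose a forward and a backward cycle. In the forward cycle, the generator GpCT starts from a given slice of CBCTorg and tries to generate an image (GpCT(CBCTorg)) which looks equivalent to a pCT. The discriminator network DpCT, on the other hand, tries to distinguish the generated synthetic pCT image (GpCT(CBCTorg)) from the real pCT images. DpCT outputs the label 0 for synthetic pCT data and 1 for real pCT data. The generator and the discriminator network are trained using an adversarial loss function:

$$\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}, \mathrm{CBCT_{org}}, \mathrm{pCT}) = \mathbb{E}_{\mathrm{pCT} \sim p_{\mathrm{data}}(\mathrm{pCT})}\left[\left(1 - D_{\mathrm{pCT}}(\mathrm{pCT})\right)^2\right] + \mathbb{E}_{\mathrm{CBCT_{org}} \sim p_{\mathrm{data}}(\mathrm{CBCT_{org}})}\left[D_{\mathrm{pCT}}\left(G_{\mathrm{pCT}}(\mathrm{CBCT_{org}})\right)^2\right]$$

During training, GpCT aims at maximizing the expectation value (over all pCTs ($\mathrm{pCT} \sim p_{\mathrm{data}}(\mathrm{pCT})$) and CBCTorg ($\mathrm{CBCT_{org}} \sim p_{\mathrm{data}}(\mathrm{CBCT_{org}})$)) of $\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}, \mathrm{CBCT_{org}}, \mathrm{pCT})$ by generating as realistic as possible synthetic pCT images from CBCTorg, while DpCT tries to minimize $\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}, \mathrm{CBCT_{org}}, \mathrm{pCT})$ by becoming as good as possible in distinguishing real and synthetic pCT image slices. In the backward cycle, pCT and CBCTorg are swapped and the adversarial loss function is:

$$\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{CBCT}}, D_{\mathrm{CBCT}}, \mathrm{pCT}, \mathrm{CBCT_{org}}) = \mathbb{E}_{\mathrm{CBCT_{org}} \sim p_{\mathrm{data}}(\mathrm{CBCT_{org}})}\left[\left(1 - D_{\mathrm{CBCT}}(\mathrm{CBCT_{org}})\right)^2\right] + \mathbb{E}_{\mathrm{pCT} \sim p_{\mathrm{data}}(\mathrm{pCT})}\left[D_{\mathrm{CBCT}}\left(G_{\mathrm{CBCT}}(\mathrm{pCT})\right)^2\right]$$

Similar to the forward cycle, GCBCT tries to maximize $\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{CBCT}}, D_{\mathrm{CBCT}}, \mathrm{pCT}, \mathrm{CBCT_{org}})$, while DCBCT tries to minimize it.
Besides the adversarial loss functions, a cycle consistency loss, as suggested in Zhu et al (2017), is used during training to enforce consistency of the two mappings from pCT to CBCT space and vice-versa ($G_{\mathrm{CBCT}}: \mathrm{pCT} \rightarrow \mathrm{CBCT_{org}}$ and $G_{\mathrm{pCT}}: \mathrm{CBCT_{org}} \rightarrow \mathrm{pCT}$). For this purpose, a loss term $\mathcal{L}_{\mathrm{cycle}}$ is introduced. In the forward cycle, it compares the output of GCBCT, with a synthetic pCT generated from CBCTorg as input, to the initial CBCTorg using an L1 norm:

$$\mathcal{L}_{\mathrm{cycle}}(G_{\mathrm{pCT}}, G_{\mathrm{CBCT}}, \mathrm{CBCT_{org}}) = \mathbb{E}_{\mathrm{CBCT_{org}} \sim p_{\mathrm{data}}(\mathrm{CBCT_{org}})}\left[\left\lVert G_{\mathrm{CBCT}}\left(G_{\mathrm{pCT}}(\mathrm{CBCT_{org}})\right) - \mathrm{CBCT_{org}} \right\rVert_1\right]$$

In the backward cycle, the roles of CBCTorg and pCT are again swapped and the corresponding cycle consistency loss function is:

$$\mathcal{L}_{\mathrm{cycle}}(G_{\mathrm{CBCT}}, G_{\mathrm{pCT}}, \mathrm{pCT}) = \mathbb{E}_{\mathrm{pCT} \sim p_{\mathrm{data}}(\mathrm{pCT})}\left[\left\lVert G_{\mathrm{pCT}}\left(G_{\mathrm{CBCT}}(\mathrm{pCT})\right) - \mathrm{pCT} \right\rVert_1\right]$$

During training, both generator and discriminator networks are optimized in parallel using the combined loss function $\mathcal{L}_{\mathrm{cycleGAN}}$:

$$\mathcal{L}_{\mathrm{cycleGAN}} = \mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}, \mathrm{CBCT_{org}}, \mathrm{pCT}) + \mathcal{L}_{\mathrm{adv}}(G_{\mathrm{CBCT}}, D_{\mathrm{CBCT}}, \mathrm{pCT}, \mathrm{CBCT_{org}}) + \lambda \left[\mathcal{L}_{\mathrm{cycle}}(G_{\mathrm{pCT}}, G_{\mathrm{CBCT}}, \mathrm{CBCT_{org}}) + \mathcal{L}_{\mathrm{cycle}}(G_{\mathrm{CBCT}}, G_{\mathrm{pCT}}, \mathrm{pCT})\right]$$

where $\lambda$ is a hyperparameter that was set to 25 in this study. It should be noted that the applied loss function does not rely on a pixel-by-pixel comparison of the respective generator output and a reference training image, which enables training using unpaired data as performed in this study.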
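The loss terms described above can be illustrated with a minimal NumPy sketch. It assumes a least-squares form of the adversarial term (as used in the public cycleGAN reference implementation) and a cycle term normalized by the number of pixels; both are assumptions of this sketch, and the function names are hypothetical rather than taken from the study's code.

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Least-squares adversarial term (assumed form).

    d_real: discriminator outputs for real images (target label 1).
    d_fake: discriminator outputs for generated images (target label 0).
    The discriminator lowers this value; the generator raises it by
    pushing d_fake towards 1.
    """
    return float(np.mean((1.0 - d_real) ** 2) + np.mean(d_fake ** 2))

def cycle_loss(cycled, original):
    """Mean absolute (L1) difference between a cycled image and its input."""
    return float(np.mean(np.abs(cycled - original)))

def combined_loss(adv_forward, adv_backward, cyc_forward, cyc_backward, lam=25.0):
    """Sum of both adversarial terms plus the weighted cycle terms (lambda = 25)."""
    return adv_forward + adv_backward + lam * (cyc_forward + cyc_backward)
```

Because the cycle term compares an image only with its own reconstruction, no voxel-wise correspondence between a CBCT slice and a pCT slice is needed at any point.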
Regarding the design of the generator and discriminator networks, we followed the original implementation by Zhu et al (2017), with the only difference that the input was adapted to be a 16-bit grey-scale image. More specifically, for the generators, a nine-blocks residual network (Johnson et al 2016) was employed, consisting of downsampling from 256 × 256 to 32 × 32 with a series of three 2D convolutional layers (stride two, kernel size four) followed by linear rectifiers (scalar multiplier of 0.2) and upsampling from 32 × 32 to 256 × 256 with a series of three 2D deconvolutional layers (stride two, kernel size four) with linear rectifiers. The residual blocks were composed of a repeated series of padding (reflect mode) and 2D convolutional layers without up or downsampling. For the discriminator, again following the original implementation by Zhu et al, a 70 × 70 patchGAN was employed with a downsampling scheme from 256 × 256 to 32 × 32 by applying four series of leaky rectified linear units (with scalar multiplier of 0.2) followed by 2D convolutional layers. A scalar in the range [0;1] was obtained for each pixel in the final layer of the discriminator (having a receptive field of 70 × 70 in the input image) and was used for patch-wise classification of the input image as real or fake. Instance normalization (Ulyanov et al 2016) was applied after each 2D convolutional layer. The networks were implemented in TensorFlow (v1.3.0).
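As a rough check of the quoted 70 × 70 receptive field, the following sketch accumulates the receptive field over a stack of convolutions. The layer list (five convolutions with kernel size four and strides 2, 2, 2, 1, 1) is an assumption taken from the public patchGAN reference layout; it is not spelled out layer by layer in the text above.

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # growth scaled by the accumulated stride
        jump *= stride
    return field

# Assumed patchGAN layout: three stride-2 and two stride-1 convolutions, kernel 4.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch_size = receptive_field(PATCHGAN_LAYERS)

# Three stride-2 stages halve a 256-pixel edge three times.
downsampled_edge = 256 // 2 ** 3
```

Under this assumed layout, `patch_size` evaluates to 70 and `downsampled_edge` to 32, in line with the numbers given in the text.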
For training the network, the registered pCT image slices were resampled to 410 × 410 pixels to match the resolution of CBCTorg. The treatment table was removed from pCT and CBCTorg, and all pixels outside the body contour (retrieved from a combination of erosion, thresholding, region growing, dilation and filling operations) were set to −1000 HU. All images were resampled to 286 × 286 pixels, followed by cropping to 256 × 256 pixels from a randomly chosen starting pixel (between 0 and 29) in left–right and anterior–posterior direction as data augmentation during training. Additionally, images were randomly flipped in left–right direction. The HU range of the images was clipped to [−1000, 2071] and image intensity was linearly rescaled to 16-bit. Stochastic gradient descent was performed with the Adam optimizer (Kingma and Ba 2014) with momentum parameters $\beta_1$ and $\beta_2$. The batch size was set to 1. At each optimization step, slices of different patients were randomly selected (unpaired training). All networks were trained from scratch with a learning rate of 0.0002. This learning rate was kept constant for the first 100 epochs, then the rate was linearly decreased to zero over the next 100 epochs, following the approach of Zhu et al (2017).
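The intensity rescaling, augmentation and learning-rate schedule described above could be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the left–right direction is assumed to run along the column axis, and epochs are counted from zero.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 2071.0

def hu_to_uint16(hu):
    """Clip to [-1000, 2071] HU and rescale linearly to the 16-bit range."""
    clipped = np.clip(hu, HU_MIN, HU_MAX)
    scaled = (clipped - HU_MIN) / (HU_MAX - HU_MIN) * 65535.0
    return np.round(scaled).astype(np.uint16)

def random_crop_and_flip(image, rng, crop=256, max_offset=29):
    """Crop a 256 x 256 patch at a random start pixel (0..29) and flip it at random."""
    row0 = int(rng.integers(0, max_offset + 1))
    col0 = int(rng.integers(0, max_offset + 1))
    patch = image[row0:row0 + crop, col0:col0 + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # assumed left-right axis
    return patch

def learning_rate(epoch, base_lr=2e-4, n_constant=100, n_decay=100):
    """Constant rate for 100 epochs, then linear decay to zero over 100 more."""
    if epoch < n_constant:
        return base_lr
    return base_lr * max(0.0, 1.0 - (epoch - n_constant) / n_decay)
```

A typical call would draw one generator with `rng = np.random.default_rng()` and apply `random_crop_and_flip` to each 286 × 286 slice as it is sampled.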
Training was performed on a subset of 25 patients using four single folds, each containing 18 patients. Once training was finished, only the generator network GpCT was used for CBCT intensity correction by means of converting a given CBCTorg slice-by-slice into a pCT equivalent image. Since four different folds were used for training the cycleGAN, four different models for GpCT were obtained and applied to the eight remaining test patients, not seen by the network during training. Then the pixel-wise median of the four models' outputs was calculated for generating the final corrected CBCT image, referred to as CBCTcycleGAN in the following. All CBCTcycleGAN were upsampled to the initial CBCT grid of 410 × 410 pixels with 1 mm × 1 mm size. In SI direction, the voxel size was 3 mm, similar to the pCT.
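The pixel-wise median over the four model outputs can be written in a few lines of NumPy. The function name is hypothetical. Note that for an even number of inputs `np.median` averages the two central values at each pixel; whether the study handled the even case in the same way is not stated.

```python
import numpy as np

def ensemble_median(predictions):
    """Pixel-wise median over a list of equally shaped model outputs."""
    stack = np.stack(predictions, axis=0)  # shape: (n_models, rows, cols)
    return np.median(stack, axis=0)
```

Compared to a mean, the median limits the influence of a single model that deviates strongly at a given pixel, for example at bone edges or at the body outline.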
2.3. Reference CBCT correction
Since there were substantial anatomical differences between pCT and CBCTorg due to changes in bladder and rectum filling, as well as in patient positioning, the obtained CBCTcycleGAN was not directly compared to the pCT for inferring the accuracy of CBCTcycleGAN. Instead, a reference CBCT correction strategy which had been validated in several previous studies (Park et al 2015, Kurz et al 2016b) was utilized and served as reference for evaluating CBCTcycleGAN for the eight test cases. The reference correction strategy was described in detail in the original publications (Niu et al 2010, 2012), the follow-up studies in a proton specific setting (Park et al 2015, Kurz et al 2016b) and also in the latest publications by our group (Hansen et al 2018, Landry et al 2019). Hence, only the main ideas will be outlined here: the method relies on a vCT prior, obtained from pCT-to-CBCTorg DIR, which is then forward projected according to the geometry of the used CBCT scanner. These forward projections are free of scatter and other low frequency deviations, such as beam hardening. Thus, the contribution of these low frequency deviations in the actually measured CBCTorg projections can be estimated as the difference between the vCT forward projection and the scaled measured projections, followed by a generous 2D smoothing filter to account for their low spatial frequency. The estimated low frequency deviations can then be subtracted from the measured projections and the retrieved corrected projections can be reconstructed using the same FDK algorithm and settings as used for reconstructing CBCTorg. The result, in the following referred to as CBCTcor, is a shading corrected CBCT with HU values equivalent to the pCT, but with the same anatomy as CBCTcycleGAN. Similar to CBCTorg, CBCTcor was reconstructed on a 1.0 mm × 1.0 mm × 1.0 mm grid with 410 × 410 × 264 voxels and then re-binned to a 1.0 mm × 1.0 mm × 3.0 mm grid to match the resolution of CBCTcycleGAN in superior–inferior direction.
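The core of the projection-based reference correction can be sketched for a single projection as below. This is a simplified illustration only: the moving-average filter stands in for the unspecified "generous 2D smoothing filter", the scale factor and filter size are placeholder assumptions, and the function names are hypothetical.

```python
import numpy as np

def box_smooth(image, size):
    """Separable moving-average filter with edge padding (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    kernel = np.ones(size) / size
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def shading_correct(measured, vct_forward, scale=1.0, smooth_size=5):
    """Estimate low-frequency deviations from a vCT prior and subtract them.

    measured:    measured CBCT projection (2D array).
    vct_forward: forward projection of the vCT, assumed free of scatter.
    Returns the corrected projection and the estimated deviation.
    """
    scaled = scale * measured
    deviation = box_smooth(scaled - vct_forward, smooth_size)
    return scaled - deviation, deviation
```

Since only the smoothed difference is subtracted, high-frequency anatomical detail of the measured projection is retained even where the vCT prior is locally inaccurate, which is the reason for the reduced sensitivity to DIR errors.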
2.4. Treatment planning
The CBCTcor of all eight test patients were imported to a research version of a commercial treatment planning system (TPS, RayStation, version 4.99, RaySearch, Sweden). As mentioned before, contours were transferred from pCT to CBCTcor via DIR. The retrieved structures were not further adapted since the accuracy of the derived structures was not critical for our study. VMAT plans were generated for each patient on a 3.0 mm × 3.0 mm × 3.0 mm dose grid using one full arc and a collapsed-cone dose engine. The generic Elekta Synergy beam model implemented in the TPS was used. Moreover, PBS proton plans were optimized on the same dose grid, using the pencil beam dose engine and the implemented IBA_Dedicated beam model. Beams from the left and from the right of the patient (90° and 270° gantry angle) were applied to generate opposing single-field uniform dose (OSFUD) plans. Each field was optimized individually to exhibit a homogeneous target dose for improved robustness of the proton plans. VMAT and OSFUD plans aimed at a median CTV dose of 74 Gy in 37 treatment fractions. CTV V was 100% for all plans, PTV V was above 98% in all cases. If feasible, dose to OARs, in particular the bladder and the rectum, was below the recommendations of the QUANTEC report (Marks Lawrence et al 2010).
Additional single-field uniform dose (SFUD) PBS proton plans were generated at 90° and 270° gantry angle on a 1.0 mm × 1.0 mm × 3.0 mm dose grid to investigate proton range accuracy. The finer dose grid in left–right and anterior–posterior direction was chosen along with the resolution of CBCTcor to enable accurate range probing in transverse planes.
For all dose calculations, the same generic CT number to electron density (photons) or relative stopping power (protons) conversion tables were used for CBCTcor and CBCTcycleGAN.
2.5. Data evaluation
CT number accuracy of CBCTorg and CBCTcycleGAN was inferred for the eight test cases by comparison to CBCTcor in terms of the mean absolute error (MAE) and the mean error (ME) in HU. Only voxels within the joint body outline of CBCTcor and CBCTcycleGAN/CBCTorg were included. Moreover, the first and the last ten image slices in superior–inferior direction were excluded since they did not exhibit the full lateral FOV after reconstruction. The body outline of the patient, as detected on CBCTorg, CBCTcor and CBCTcycleGAN, was compared in the slice of the treatment plan iso-center as a function of the gantry angle to identify potential geometric inaccuracies. The gantry angle was sampled in 1° steps and for each angle the distance of the body outline to the iso-center was determined for each CBCT.
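The two HU metrics can be computed as in the following sketch, where `mask` is assumed to be a boolean array marking the joint body outline of both images (with the excluded slices already removed). The function name is hypothetical.

```python
import numpy as np

def hu_errors(test_image, reference_image, mask):
    """Return (ME, MAE) in HU over the voxels selected by a boolean mask."""
    diff = test_image[mask].astype(float) - reference_image[mask].astype(float)
    mean_error = float(np.mean(diff))               # signed: reveals systematic offsets
    mean_abs_error = float(np.mean(np.abs(diff)))   # unsigned: also sensitive to noise
    return mean_error, mean_abs_error
```

The distinction matters for interpreting the results: image noise raises the MAE but largely cancels in the ME, so two images with very different noise levels can show a sizeable MAE despite a small systematic HU offset.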
Dosimetric accuracy was inferred by recalculating the VMAT and OSFUD plans, optimized on CBCTcor, on CBCTcycleGAN. The dose distributions on both CBCTs were then compared by means of a 1% (only VMAT), 2% and 3% (only OSFUD) dose difference criterion. Only voxels with at least 50% of the prescribed dose were considered. Moreover, for the OSFUD plans, the pass-rates for (2%, 2 mm) and (3%, 3 mm) 3D global gamma-criteria were determined using the same dose cut-off. Additional evaluation with a gamma-criterion for the OSFUD plans was performed to allow for slight proton range differences between both CBCT data-sets. The VMAT and OSFUD dose distributions for CBCTcor and CBCTcycleGAN were also compared in terms of clinically relevant target and OAR DVH parameter. For this, CTV and PTV D, D and V, as well as PTV D were considered. For the rectum, V and for the bladder, V were analyzed.
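A dose difference pass-rate of the kind described above can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: the difference is expressed relative to the prescribed dose (global normalization), and the 50% dose cut-off is evaluated on the reference dose distribution. The function name is hypothetical.

```python
import numpy as np

def dd_pass_rate(dose_eval, dose_ref, prescription, tolerance_percent, cutoff=0.5):
    """Percentage of voxels whose dose difference is within the tolerance.

    Only voxels receiving at least `cutoff` times the prescribed dose
    (on the reference dose) are considered.
    """
    mask = dose_ref >= cutoff * prescription
    diff = np.abs(dose_eval[mask] - dose_ref[mask])
    passed = diff <= tolerance_percent / 100.0 * prescription
    return 100.0 * float(np.mean(passed))
```

Unlike a gamma criterion, this test has no distance-to-agreement component, so a small spatial shift of a steep dose gradient, as caused by a proton range difference, directly lowers the pass-rate.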
Relative proton range differences between CBCTcor and CBCTcycleGAN were inferred in beam's eye view (BEV) from the SFUD plans by comparing the distance between the patient outline and the 80% iso-dose line on both CBCTs. The absolute position of the 80% iso-dose line (absolute proton range) was also compared.
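For a single BEV profile, the range extraction could look like the sketch below. It assumes a 1D depth-dose profile sampled along the beam direction, a boolean profile marking voxels inside the patient, and a dose level supplied by the caller (for example 80% of the prescribed dose); linear interpolation between samples is also an assumption. The function names are hypothetical.

```python
import numpy as np

def distal_position(depth_mm, dose, level):
    """Depth at which the dose falls below `level` on the distal edge."""
    above = np.nonzero(dose >= level)[0]
    if above.size == 0:
        return None  # profile never reaches the requested level
    last = above[-1]
    if last == len(dose) - 1:
        return float(depth_mm[last])
    d0, d1 = dose[last], dose[last + 1]
    fraction = (d0 - level) / (d0 - d1)  # d0 >= level > d1, so d0 - d1 > 0
    return float(depth_mm[last] + fraction * (depth_mm[last + 1] - depth_mm[last]))

def relative_range(depth_mm, dose, inside_body, level):
    """Distance from the patient outline (beam entry) to the distal dose level."""
    distal = distal_position(depth_mm, dose, level)
    if distal is None or not np.any(inside_body):
        return None
    entry = float(depth_mm[np.argmax(inside_body)])  # first voxel inside the body
    return distal - entry
```

The absolute range corresponds to `distal_position` alone, so a difference in the detected body outline shifts the absolute range but cancels in the relative range.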
3. Results
3.1. Network training time
The cycleGAN network was trained for 200 epochs on a Tesla P100 (NVIDIA, California USA) graphics processing unit (GPU). Training took approximately 48 h in total. Once the model was trained, a full 3D CBCTorg with 88 slices could be converted into CBCTcycleGAN within about 10 s.
3.2. Image analysis
The corrected CBCT images obtained from the four trained network models are shown in figure 2 (panels (a)–(d)), together with the calculated median image (panel (e)) and the pixel-wise difference between maximum and minimum HU values (panel (f)) for exemplary patient 7. Deviations between the four different models were most pronounced at the edges of the bony anatomy, as well as at the patient body outline. In the following analysis, only the median image was considered.
Figure 3 illustrates the pCT (panel (d)) as well as the different CBCT data-sets that were used in the scope of this study for patient 7. Panels (a)–(c) show the uncorrected CBCTorg, the reference corrected CBCTcor and the (median) CBCTcycleGAN. CBCTorg is affected by the typical cupping and scatter artifacts and features inaccurate, spatially varying HU values. This is corrected for in CBCTcor, which, however, exhibits an increased noise level due to the low mAs (20 mA, 20 ms) setting used for CBCT acquisition. In comparison, CBCTcycleGAN has a smoother appearance that is much closer to the diagnostic quality pCT, as expected. The difference images in panels (d) and (e) illustrate that CBCTcycleGAN has a better agreement in terms of HU values to the reference CBCTcor than CBCTorg. In particular, no spatially varying HU deviations and cupping artifacts can be observed. Remaining differences were mostly related to the different noise properties of CBCTcor and CBCTcycleGAN. Similar observations were made for the other patients.
The difference maps also indicate slight differences in the patient body outline between CBCTcor and CBCTcycleGAN. For the depicted patient, those can be found mainly close to the treatment couch on the left and on the right side of the patient, as well as at the patient belly, where also the largest differences between the four trained network models were found. The results of the systematic analysis of body outline differences as a function of the gantry angle in the iso-center plane are shown in figure 4. Body outline differences were sampled at 1° angular steps and plotted as boxplots for each test case in panel (a). Median deviations of up to 2 mm were found when comparing CBCTcycleGAN to CBCTcor, with CBCTcycleGAN having a larger patient contour in all cases. The 5th/95th percentiles indicate deviations of up to 4 mm for certain patients and gantry angles. In comparison, a high agreement of the body outline was found for CBCTcor and CBCTorg, with median deviations below 0.5 mm, thus supporting the accuracy of the CBCTcor body contour. In panel (b), the differences in body outline are plotted as a function of the gantry angle, averaged over all patients. Again, high agreement of CBCTorg and CBCTcor was found, while CBCTcycleGAN revealed systematic deviations of up to 4 mm depending on the gantry angle. Deviations were most pronounced at the belly (about 0° to 60° and 300° to 360° gantry angle) and close to the couch on the left and right edge of the patient (gantry angles about 110° and 250°).
The MAE determined for CBCTcycleGAN and CBCTorg in comparison to CBCTcor are illustrated in the top row of figure 5. Comparable MAE values were found for CBCTorg and CBCTcycleGAN. This might be related to the fact that CBCTcycleGAN is a smoother (pCT equivalent) image in comparison to CBCTorg and CBCTcor which not only show an enhanced, but probably also correlated noise pattern. For CBCTorg, an average (over all test patients) MAE of 103 HU was found, with values ranging from 93 HU to 124 HU. CBCTcycleGAN had a slightly lower average MAE of 87 HU, ranging from 79 HU to 106 HU. In terms of the ME, improved results were obtained for CBCTcycleGAN in comparison to CBCTorg. The average ME decreased from 24 HU to −6 HU, and the maximum ME shrank from 45 HU to 16 HU.
3.3. Dosimetric analysis
Figure 6 shows the dose distribution optimized on CBCTcor and recalculated on CBCTcycleGAN for the VMAT, the OSFUD and the SFUD plan at 90° gantry angle for an exemplary patient (left and central column). Dose differences are illustrated in the right column. For VMAT, only minor (below 1%) dose differences between CBCTcycleGAN and CBCTcor were found, with the largest differences close to the patient surface due to slight deviations in the patient body outline. Due to the high sensitivity of the proton range on the CT numbers, deviations were more pronounced for the OSFUD and SFUD plans. Deviations were mostly found at the edge of the high dose region, next to the PTV, where the steepest dose gradients appeared. Still, for the depicted patient and the illustrated 90° SFUD plan, a median relative proton range difference of only 0.1 mm (smaller range for CBCTcycleGAN) was determined.
The quantitative results of the dosimetric analysis of the VMAT and OSFUD plans are presented in table 1 for all test cases and all investigated dose difference and gamma criteria. For VMAT, the 2% dose difference pass-rate comparing CBCTcycleGAN to CBCTcor was 99% or higher for all test patients, indicating a high agreement of CBCTcycleGAN with the reference. The 1% DD pass-rate was 73% or higher for all patients. The lower pass-rates for some patients could be attributed mainly to deviations in the body outline, leading to slightly lower (by less than 1%) median dose values in the target region. As expected, pass-rates were overall lower for the OSFUD plans. For the 2% dose difference criterion, an average pass-rate of 80% was determined. The average 3% DD pass-rate was 86%. In case a gamma-criterion was used, which is slightly less sensitive to range shifts, average pass-rates of 96% (2%, 2 mm) and 100% (3%, 3 mm) were obtained.
Table 1. Dose difference (DD) and gamma pass-rates of all test patients for the VMAT and OSFUD plans. All values in percent.
Patient number | VMAT 1% DD | VMAT 2% DD | OSFUD 2% DD | OSFUD 3% DD | OSFUD (2%, 2 mm) | OSFUD (3%, 3 mm) |
---|---|---|---|---|---|---|
1 | 79 | 100 | 83 | 88 | 95 | 99 |
2 | 99 | 100 | 84 | 90 | 99 | 100 |
3 | 100 | 100 | 71 | 83 | 95 | 100 |
4 | 73 | 100 | 74 | 80 | 93 | 100 |
5 | 99 | 100 | 81 | 87 | 95 | 99 |
6 | 88 | 100 | 77 | 83 | 95 | 100 |
7 | 88 | 100 | 84 | 89 | 96 | 99 |
8 | 84 | 99 | 86 | 91 | 99 | 100 |
Average | 89 | 100 | 80 | 86 | 96 | 100 |
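As a consistency check, the averages in the last row of table 1 can be recomputed from the per-patient values. The snippet below only restates the table entries; the variable names are illustrative.

```python
import numpy as np

columns = ["VMAT 1% DD", "VMAT 2% DD", "OSFUD 2% DD",
           "OSFUD 3% DD", "OSFUD (2%, 2 mm)", "OSFUD (3%, 3 mm)"]

# Per-patient pass-rates in percent, copied from table 1 (patients 1 to 8).
pass_rates = np.array([
    [79, 100, 83, 88, 95, 99],
    [99, 100, 84, 90, 99, 100],
    [100, 100, 71, 83, 95, 100],
    [73, 100, 74, 80, 93, 100],
    [99, 100, 81, 87, 95, 99],
    [88, 100, 77, 83, 95, 100],
    [88, 100, 84, 89, 96, 99],
    [84, 99, 86, 91, 99, 100],
])

averages = pass_rates.mean(axis=0)
rounded = np.rint(averages).astype(int)
for name, value in zip(columns, rounded):
    print(f"{name}: {value}%")
```

The rounded column means reproduce the reported average row (89, 100, 80, 86, 96 and 100).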
The impact of the dosimetric differences on clinically relevant DVH parameters is illustrated in figure 7 for VMAT and in figure 8 for OSFUD plans. For VMAT, differences in target and OAR DVH parameters were mostly below 1 Gy/1%. Only for PTV V, slightly larger deviations were observed. A general trend of smaller values for the investigated DVH parameter on CBCTcycleGAN was observed. This can be attributed to the observed deviations in the body outline, which was generally larger for CBCTcycleGAN and results in a slightly lower dose level in the central region, where CTV, PTV and rectum are located.
For the OSFUD plans, overall smaller DVH parameter differences between CBCTcor and CBCTcycleGAN were determined. This is likely due to the fact that dosimetric deviations were related to differences in the proton range and appeared mainly left/right of the PTV, in the region of the steep dose gradients in BEV. Thus, DVH parameters related to the coverage of the PTV (in particular D) show the most pronounced differences, while the DVH parameters for the CTV or for the rectum and the bladder, both located laterally of the beam, are only slightly affected. Differences were below 1 Gy/1% for all investigated parameters and patients.
3.4. Proton range analysis
Results of the proton range analysis on the basis of the SFUD plans at 90° and 270° gantry angle are shown in figure 9. Figures 9(a) and (b) show the obtained differences in relative range between CBCTcor and CBCTcycleGAN for all analyzed BEV profiles per test patient as boxplots. The median range differences were below 3 mm for all patients. For most cases the 5th and 95th percentile of the range difference distribution, as indicated by the whiskers, were also within 3 mm. Still, for one case these percentiles extended to more than 5 mm. Figure 9(c) summarizes the results for all BEV profiles of all patients and for both gantry angles. The median range difference of about 19 000 analyzed profiles was only 0.2 mm, with a slightly smaller proton range for CBCTcycleGAN. The 5th and 95th percentile of this cumulative range difference distribution were within 3 mm (dashed lines).
When analyzing the absolute position of the 80% distal dose fall-off in BEV (absolute proton range), a median range difference of about 1.1 mm was found, with a shorter absolute range for CBCTcycleGAN. The shift of the absolute range with respect to the relative range by about 1 mm can be attributed to similar differences in the body outline of CBCTcor and CBCTcycleGAN at gantry angles of 90° (outline differences about 0.5 mm) and 270° (outline differences about 1.5 mm, see figure 4(a)).
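The range metric used above, the position of the 80% distal dose fall-off along a BEV profile, can be sketched as follows. The linear interpolation, the synthetic profile shape and all names are assumptions for illustration, not the implementation used in this study.

```python
import numpy as np


def distal_range_mm(depth_mm, dose, level=0.8):
    """Depth at which the dose drops to `level` * maximum on the distal side.

    The position is linearly interpolated between the two samples that
    bracket the threshold.
    """
    depth_mm = np.asarray(depth_mm, dtype=float)
    dose = np.asarray(dose, dtype=float)
    threshold = level * dose.max()
    last = np.nonzero(dose >= threshold)[0][-1]  # most distal sample at/above threshold
    if last == len(dose) - 1:
        raise ValueError("profile does not fall below the threshold")
    d0, d1 = dose[last], dose[last + 1]
    x0, x1 = depth_mm[last], depth_mm[last + 1]
    return float(x0 + (d0 - threshold) * (x1 - x0) / (d0 - d1))


def make_profile(depth_mm, plateau_end_mm, falloff_mm=10.0):
    """Synthetic profile: flat plateau followed by a linear distal fall-off."""
    return np.clip((plateau_end_mm + falloff_mm - depth_mm) / falloff_mm, 0.0, 1.0)


depth = np.arange(0.0, 250.0, 1.0)
r_cor = distal_range_mm(depth, make_profile(depth, 200.0))   # stand-in for CBCTcor
r_gan = distal_range_mm(depth, make_profile(depth, 198.5))   # stand-in for CBCTcycleGAN
print(r_cor - r_gan)  # absolute range difference in mm
```

A relative range, as opposed to the absolute range, could be obtained by subtracting the depth at which the beam enters the body from each value, which is why body outline differences affect the two metrics differently.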
4. Discussion
In this contribution we have successfully applied a cycleGAN for prostate CBCT correction using unpaired training data. The network was trained to translate a given uncorrected CBCTorg into a pCT equivalent image (CBCTcycleGAN). No previous matching of the data used for training, i.e. CBCTorg and pCT, by means of DIR was performed. To infer the accuracy of CBCTcycleGAN, it was compared to a previously validated CBCT correction method (CBCTcor). Relative to the uncorrected CBCTorg, CBCTcycleGAN showed a substantial improvement in HU accuracy when evaluated against CBCTcor.
In terms of dose calculation accuracy, good results were achieved for VMAT when comparing CBCTcycleGAN and CBCTcor: for a 2% dose difference criterion, pass-rates of 99% or higher were determined for all test patients. In Liang et al (2019), a similar cycleGAN architecture was utilized for head and neck CBCT correction in the scope of photon radiotherapy and comparable pass-rates were found for a (2%, 2 mm) gamma criterion. Due to the high sensitivity of the proton range to CT numbers, overall lower pass-rates were obtained for proton OSFUD plans. However, when applying the widely used (3%, 3 mm) gamma criterion, which is less sensitive to small range shifts, an average pass-rate close to 100% was found. Proton range analysis also showed that more than 93% of all analyzed BEV dose profiles had a range agreement better than 3 mm. Compared to the total proton range of about 200 mm for prostate patients, this corresponds to a relative range inaccuracy of only 1.5%. In line with this, for most cases a good agreement of CBCTcor and CBCTcycleGAN in terms of clinically relevant DVH parameters was achieved. For OARs, differences were below 1% for all test cases, and also for the target structures deviations were below 1 Gy/1%. For VMAT, a trend of slightly underestimated dose on CBCTcycleGAN was found for centrally located structures, i.e. the PTV, CTV and the rectum. Still, deviations were below 1 Gy/1% in most cases.
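Why a gamma criterion is less sensitive to small range shifts than a pure dose difference criterion can be illustrated with a simplified one-dimensional sketch. It assumes a brute-force search on the sampling grid, global normalisation and a hypothetical low-dose cut-off; the gamma implementation used in this work is not specified here and may differ.

```python
import numpy as np


def gamma_pass_rate_1d(x_mm, dose_ref, dose_eval, dd_pct, dta_mm,
                       norm_dose, min_dose_frac=0.1):
    """Global 1D gamma pass-rate in percent (brute force, no interpolation)."""
    x = np.asarray(x_mm, dtype=float)
    ref = np.asarray(dose_ref, dtype=float)
    ev = np.asarray(dose_eval, dtype=float)
    # Rows: reference points, columns: candidate evaluated points.
    dist = (x[None, :] - x[:, None]) / dta_mm
    ddiff = (ev[None, :] - ref[:, None]) / (dd_pct / 100.0 * norm_dose)
    gamma = np.sqrt(dist ** 2 + ddiff ** 2).min(axis=1)
    mask = ref >= min_dose_frac * norm_dose
    return 100.0 * float(np.mean(gamma[mask] <= 1.0))


def dd_pass_rate_1d(dose_ref, dose_eval, dd_pct, norm_dose, min_dose_frac=0.1):
    """Pure dose difference pass-rate in percent on the same voxel selection."""
    ref = np.asarray(dose_ref, dtype=float)
    ev = np.asarray(dose_eval, dtype=float)
    mask = ref >= min_dose_frac * norm_dose
    return 100.0 * float(np.mean(np.abs(ev - ref)[mask] <= dd_pct / 100.0 * norm_dose))


# Synthetic depth-dose profiles: plateau plus a 5 mm linear distal fall-off,
# with the evaluated profile shifted by 1 mm (hypothetical numbers).
depth = np.arange(0.0, 250.0, 0.5)
ref = np.clip((205.0 - depth) / 5.0, 0.0, 1.0)
ev = np.clip((206.0 - depth) / 5.0, 0.0, 1.0)

dd_rate = dd_pass_rate_1d(ref, ev, dd_pct=2.0, norm_dose=1.0)
gamma_rate = gamma_pass_rate_1d(depth, ref, ev, dd_pct=2.0, dta_mm=2.0, norm_dose=1.0)
print(dd_rate, gamma_rate)
```

In this toy case the 1 mm shift causes dose difference failures throughout the steep distal gradient, while every point passes the (2%, 2 mm) gamma test because an equal dose value is found within the distance-to-agreement.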
The overall lower DVH parameter values for VMAT are likely, besides other imperfections in terms of HU values, attributed to the observed deviations in the patient body outline, which was consistently larger for CBCTcycleGAN. Comparing the output of the four different trained cycleGAN models, it was also found that the region close to the body outline shows the most pronounced variability between the individual models. We deem these inaccuracies to be related to the approach of performing unpaired training with input (CBCTorg) and target data (pCT) that feature considerable anatomical differences. Although the introduced cycle-consistency loss potentially improves consistency in the mapping between pCT and CBCT space, it does not directly enforce geometric consistency of input and output of the generators. In Hiasa et al (2018) it was recently reported that, in the context of MR-to-CT synthesis for the pelvis, improved geometric consistency of the input and output of the cycleGAN might be achieved by an increased number of training data-sets or the introduction of a gradient-consistency loss. In other works, an additional structure- or shape-consistency loss was suggested to improve data consistency for cycleGAN networks (Yang et al 2018, Cai et al 2019). Future studies may investigate whether these techniques can also lead to improved results for CBCT-to-CT image translation. In this context, the geometric fidelity of the images generated by the cycleGAN might be further analyzed, e.g. in terms of the accuracy of important structures, such as the bladder or the prostate. This was deemed beyond the scope of this first proof-of-principle study. However, visual inspection of CBCTcycleGAN did not show remarkable anatomical deviations with respect to CBCTorg beyond the previously described discrepancies in the body outline.
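The statement that a cycle-consistency loss does not directly enforce geometric consistency can be made concrete with a deliberately extreme toy example. The two "generators" below are hypothetical stand-ins (simple left-right mirroring) rather than trained networks, and the L1 form of the loss follows the standard cycleGAN formulation.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two images."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(x, y, gen_xy, gen_yx):
    """L1 cycle loss: |gen_yx(gen_xy(x)) - x| + |gen_xy(gen_yx(y)) - y|."""
    return l1(gen_yx(gen_xy(x)), x) + l1(gen_xy(gen_yx(y)), y)


def mirror(img):
    """Hypothetical generator that mirrors the image left-right."""
    return img[:, ::-1]


rng = np.random.default_rng(0)
cbct = rng.random((8, 8))   # stand-in for a CBCT slice
pct = rng.random((8, 8))    # stand-in for an unpaired planning CT slice

print(cycle_consistency_loss(cbct, pct, mirror, mirror))  # 0.0: the cycle closes exactly
print(l1(mirror(cbct), cbct))                             # > 0: the geometry was changed
```

Both mappings invert each other, so the cycle loss vanishes although each individual output is geometrically inconsistent with its input. Additional terms such as the gradient-, structure- or shape-consistency losses cited above aim to penalise exactly this kind of discrepancy between a generator's input and output.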
It should be noted that in this work, CBCTcor was employed as reference for validating CBCTcycleGAN. While the accuracy of CBCTcor was shown in several previous studies (Park et al 2015, Kurz et al 2016b), it was also pointed out that CBCTcor can still be slightly affected by remaining geometric inaccuracies in the vCT prior, originating from inaccuracies in the underlying pCT-to-CBCT DIR. Moreover, for a few patients we have observed moderate reconstruction artifacts for the used low mAs CBCT protocol close to the patient skin, where increased HU values were found. These artifacts were, however, confined to an extension of only a few voxels, such that the impact, e.g. on the proton range, is estimated to be less than 0.5 mm. Generally, due to the pronounced inter-scan differences between pCT and CBCT, it is deemed difficult to have a more accurate reference image of diagnostic quality with the same anatomy as exhibited by the CBCT. A potential alternative to CBCTcor might be the vCT, which, on the other hand, can similarly be affected by DIR inaccuracies in the pelvic region (Kurz et al 2016b).
Due to potential DIR uncertainties, the feasibility of utilizing unpaired data for training is generally considered an advantage of CBCTcycleGAN. It makes the methodology particularly interesting for situations where it is difficult to achieve accurate matching of the input and target data used for training by means of DIR. For previously investigated U-Net based deep learning approaches for CBCT correction (Hansen et al 2018, Kida et al 2018, Landry et al 2019), generation of the training data explicitly utilized DIR. In Hansen et al (2018), it was used for generating CBCTcor projections which were used for training. In Kida et al (2018) and Landry et al (2019), DIR was used to generate a vCT which was then used for U-Net training. Although it was reported that DIR inaccuracies only have a minor impact on the final model and the obtained corrected CBCT images, each DIR inaccuracy might lead to inconsistencies in the training data, and thus negatively impact training of the network and the retrieved final model, since the loss function is based on a voxel-by-voxel comparison of input and target data. Nevertheless, it should be noted that in Landry et al (2019), based on a similar patient cohort, slightly improved results were reported in terms of dose difference (1% dose difference pass-rate above 98% for VMAT) when training a U-Net to translate CBCTorg to CBCTcor. This might, however, be partially attributed to the fact that the network was directly trained to generate a CBCTcor equivalent image, and that CBCTcor itself was used as reference for evaluation. A more detailed comparison of the performance of U-Net and cycleGAN based CBCT correction might be subject to future studies.
Besides independence from prior information derived from potentially inaccurate DIR, a major advantage of CBCTcycleGAN with respect to the reference method (CBCTcor) is the enhancement in speed. While, in the current implementation, generation of CBCTcor takes about 6–10 min, generation of a full 3D CBCTcycleGAN image takes only on the order of 10 s. This reduction in correction time is particularly important considering the intended application of CBCTcycleGAN in the scope of adaptive radiotherapy. Here, it is of utmost importance to perform plan optimization as fast as possible on the basis of the acquired imaging data with the patient in treatment position. Besides improved patient comfort, fast adaptation is particularly important in scenarios where anatomical changes can happen on a time scale of minutes, e.g. in the prostate region.
In this study, CBCT correction was only investigated utilizing prostate cancer patients. However, it is likely that the same trained model can be applied to further diseases in the pelvic region, as has been shown in a study for MR-to-CT translation, which was based on a comparable GAN design (Maspero et al 2018). One potential issue that might arise in this region of the body is truncation of the patient due to the limited lateral CBCT FOV, which hinders the correct restoration of the patient outline. In this study, an enlarged lateral FOV obtained by scanning with a shifted detector panel has been used, but still a few patients had to be excluded due to severe lateral truncation. Nevertheless, a translation to further treatment sites beyond the pelvis is deemed straightforward since it can be realized by interchanging the training data given to the network. Moreover, from an imaging physics perspective, the pelvic region is one of the most challenging regions in the body due to the comparably large patient extension. This typically leads to a higher amount of detected scattered photons, e.g. in comparison to the head and neck region, and typically to more severe artifacts in the original CBCT image. It might also be beneficial for training the network that for other treatment sites, such as head and neck, there are less pronounced inter-scan anatomical variations between input (CBCTorg) and target (pCT) data. Although unpaired training was utilized in this study, improved consistency in the input data might lead to a more accurate representation of the body outline.
5. Conclusion
A cycle-consistent generative adversarial network was successfully trained to perform CBCT-to-CT image translation for CBCT intensity correction using unpaired training data. The obtained CBCTcycleGAN resembles a diagnostic quality planning CT, but features the anatomy of the daily CBCTorg. In terms of dose calculation accuracy, good results were obtained for VMAT plans. Although clinically relevant DVH parameters were accurately predicted for proton OSFUD plans using CBCTcycleGAN, dosimetric analysis showed reduced accuracy with respect to VMAT, as expected. Further improvements in dose calculation accuracy might be achieved by a larger training cohort, or the introduction of a gradient consistency loss to improve accuracy of the patient outline. With respect to the reference technique (CBCTcor), CBCTcycleGAN enables considerably faster image correction and, due to the use of unpaired training data, is not dependent on accurate DIR. Thus, CBCTcycleGAN is considered a promising approach to realize fast image correction in a CBCT based online adaptive radiotherapy treatment workflow.
Acknowledgments
Christopher Kurz received funding from the German Cancer Aid. The work was supported by the German Research Foundation (DFG) Cluster of Excellence Munich Center for Advanced Photonics (MAP) and by the ZonMw IMDI Programme (project number 1040030).
We thank Nadine Spahr, Christoph Brachmann and Florian Weiler for the pCT-to-CBCT DIR used in our study. We thank Erik Traneus and his colleagues from RaySearch Laboratories for support on the research version of the used TPS. We acknowledge the help of Jan Hofmaier on implementing the reference CBCT shading correction method. Moreover, the support of this project by Bas Raaymakers is thankfully acknowledged.
There are no conflicts of interest.