
1 Introduction

A typical radiation therapy treatment starts with a diagnosis based on Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Single-Photon Emission Computed Tomography (SPECT), Positron Emission Tomography - CT (PET/CT), or a combination of medical imaging modalities [48]. A first step in radiation therapy is typically the acquisition of a so-called simulation CT scan, where the patient is positioned in the treatment position. The simulation CT serves as a quantitative patient model which is used in the treatment planning process. Typical tasks are anatomy delineation, optimization of the radiation beam entrance angles and dose calculation. This acquired CT scan also serves as the baseline anatomy of the patient. During the different treatment sessions, the patient will be positioned as closely to this simulated position as possible to minimize positioning uncertainties. For imaging during treatment, the vast majority of radiation therapy systems nowadays rely on x-ray imaging using flat panel detectors. Figure 1 demonstrates this kV image acquisition with a patient in treatment position.

Fig. 1.

Varian TrueBeam radiation therapy delivery platform, acquiring kV images of a patient in treatment position.

Reasons for this are the relatively low complexity, versatile applications, and affordability of these x-ray imaging systems. The x-ray system is used mainly for position verification relative to the treatment planning position, but also for motion monitoring during treatment and, more recently, for plan adaptation. Depending on the needs of the different tasks in radiation therapy, the x-ray based imaging system mounted on the treatment delivery device (on-board imaging) can be used to acquire different types of images such as 2D x-ray projections, fluoroscopic projection sequences, 3D Cone Beam CT (CBCT) images, and motion-resolved 4D CBCT images. A clear future trend is to use the on-board imaging systems for additional tasks throughout the therapy, such as direct organ segmentation and direct dose calculation based on the acquired 3D or 4D CBCT data, and for soft-tissue motion monitoring during treatment. Finally, the patient undergoing fractionated treatment (up to 40 fractions) is sometimes monitored in parallel using CT, MRI, PET/CT, or SPECT/CT to assess the treatment response. This should preferably be (at least partially) replaced by on-board imaging-based procedures available at the treatment device. Technically, the imaging pipeline of the on-board imaging system starts with the x-ray imaging hardware consisting of the x-ray tube and the flat panel detector. The detector acquires 2D projections, for example as single frames (triggered images), to perform an image match against the planning CT prior to each fraction or to verify the internal anatomy at certain control points during radiation beam delivery within a fraction. The projections can also be acquired as a sequence, either from a single viewing direction, for example to verify certain internal motion trajectories, or during rotation of the treatment delivery device (typically called gantry) around the patient to perform motion management.
An elementary part of motion management is tracking internal structures of the patient on the acquired projections. For volumetric imaging the system rotates around the patient and acquires a sequence of projections from which a 3D CBCT image is reconstructed. This includes reconstructions that are resolved with respect to phases of respiratory motion (4D CBCT)  [75] and even respiratory as well as cardiac motion (5D CBCT)  [68]. Having the images at hand, deformable image registration and automatic segmentation  [26] have recently become topics of growing interest in the context of adaptive radiotherapy. In the following, we discuss where learning-based methods are already state of the art and where we see the potential (or already evidence) to apply them to improve x-ray guided radiation therapy and thus clinical outcome.

2 Motion Monitoring During Treatment

Unlike conventional radiation therapy delivery schemes that typically deliver 2 Gy per daily fraction over several weeks, Stereotactic Body Radiation Therapy (SBRT) delivers high doses in a few or even a single fraction (6–24 Gy, 1–8 fractions) [19, 63]. This allows for increased tumor control and reduces toxicity in healthy tissues. To ensure sub-millimeter accuracy for the high dose deposition in SBRT, during-treatment motion management is indispensable. Correct patient setup on the day of the treatment is accomplished by image registration on a 3D CBCT, followed by a couch adaptation. A position verification during delivery of the fraction could be a 3D-3D match at mid- and post-delivery time with a re-acquired CBCT. Alternatively, a 2D-2D match or a 2D-3D match at certain angles can be applied. In the above cases the treatment beam is interrupted and resumed after the position has been confirmed. The drawback is that during actual treatment delivery the therapist is blind to any motion that occurs. An excellent overview of classical state of the art motion monitoring methods with the beam on is given in [6]. Of interest are the tracking methods that use kV/kV imaging with triangulation for 3D position information (CyberKnife®), Sequential Stereo (Varian Medical Systems), which sequentially acquires 2D images with a single imager, and approaches based on a Probability Density Function (PDF). The proclaimed accuracy of the Euclidean distance in 3D space is 0.33 mm on phantom data, and the time delay needed to acquire sufficient images for the 3D reconstruction has to be considered. Alternatively, updating the raw image stack with the latest 2D acquisition smears out the position change that occurred on the last 2D image, since the 3D volume also contains all previous images that did not undergo this motion. Methods based on a PDF map derived from 4DCT data acquired before the treatment day [36] proclaim an accuracy of the Euclidean distance in 3D space of 1.64 ± 0.73  mm.
Deep learning methods are being developed to further improve these results and are being discussed below for both bony structures and soft tissue.

2.1 Tracking of Bony Structures

For spine tumors, neighboring vertebrae can serve as anatomical landmarks and periodically acquired 2D kilovoltage (kV) images during treatment allow for a fast detection model to compare the vertebrae positions to the reference positions for a specific gantry angle. A CTV to PTV margin, representing all expected uncertainties including motion during delivery, is recommended to be \({<}\)3  mm in  [49]. Recommendations on patient setup accuracy (positioning of the patient on the couch before delivery of the treatment) are \({<}\)2  mm for translations, \({<}\)3\(^{\circ }\)–4\(^{\circ }\) for roll and pitch and \({<} 10^{\circ }\) for yaw, according to  [5]. A dosimetric study for spine stereotactic treatments recommends a patient setup translational error \({\le }\)1  mm and a rotational error \(\le \)2\(^{\circ }\)  [79] while the rotational setup error recommendation is reduced to \({\le }\)1\(^{\circ }\) in  [15]. Note that in all above situations a 3D setup CBCT or other volumetric verification is available. However, as the patient is not supposed to move anymore after correct setup, the above recommendations can also be projected to in-treatment position monitoring. An intrafraction study of spine SBRT treatments that acquired CBCTs during treatment delivery reports position standard deviations of up to 1.3  mm, and this for each of the three main axes: chest-back, left-right and head-feet  [49].

In  [66] a Deep Learning (DL) model based on Mask R-CNN (Regional Convolutional Neural Network) is described for vertebra detection of the thoracic spine (T9–T12) and the lumbar spine (L1–L4). It differs from the above methods in two key aspects: First, the model does not rely on temporal imaging information acquired prior to the delivery time instance where the position is verified. Second, the model generalizes to vertebrae in the human body, which means that no patient-specific information is needed and no models need to be trained prior to treatment delivery. The model allows for a fast structure localization (\({<}\)2 Hz) on 2D kV projection images that are acquired during the VMAT treatment delivery. It allows for instant 2D position verification, using segmentation, along the delivery of the VMAT arc, as well as sequential (delayed) 3D position verification when the subsequent projection images are included in a Digital TomoSynthesis (DTS) or CBCT. Alternatively, making use of a stereoscopic dual imager setup, the 2D position pairs can be triangulated to obtain an instant 3D position. Intensity-Modulated Radiation Therapy (IMRT) and 3D Conformal Radiotherapy (CRT) can benefit from the DL model for fast structure localization as well, as long as 2D kV projection images are acquired. Typical model training times vary from 1–3 days on an Intel Xeon W-2102 2.90 GHz CPU with 32 GB RAM and an NVIDIA GeForce GTX 1080 Ti with 12 GB RAM. The model’s accuracy in detecting and estimating motion is assessed offline using the well-known Mean Average Precision (mAP) metric  [57]. Although the mAP metric makes sense in the computer vision domain, from a clinical perspective there are other, more important metrics to consider: In this study the motion of the 2D Centre of Mass (CoM) of the vertebrae is assessed for the best model as identified by the mAP. The test data in the first assessment contains actual patient data.
In addition, a patient-like full-body phantom with vertebrae (PIXY TPO-1067  [38]) in treatment position is moved in a controlled setup and the motion detection is assessed by the DL landmark detection model for vertebra and compared to its ground truth.
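The stereoscopic triangulation step mentioned above can be sketched as a least-squares intersection of two imaging rays. The following is an illustrative reconstruction of the geometry, not the vendor implementation; source and detector point positions (in a common 3D frame) are assumed to be known from the calibrated imager geometry.

```python
import numpy as np

def triangulate(src_a, det_a, src_b, det_b):
    """Least-squares intersection of two imaging rays.

    Each ray runs from an x-ray source position through the detected 2D
    structure position lifted into 3D detector-plane coordinates. The
    returned point minimizes the summed squared distance to both rays.
    """
    sources, directions = [], []
    for src, det in ((src_a, det_a), (src_b, det_b)):
        src = np.asarray(src, float)
        d = np.asarray(det, float) - src
        sources.append(src)
        directions.append(d / np.linalg.norm(d))
    # Normal equations: sum_i P_i x = sum_i P_i s_i, with P_i the projector
    # orthogonal to ray direction u_i.
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for src, u in zip(sources, directions):
        proj = np.eye(3) - np.outer(u, u)
        A += proj
        b += proj @ src
    return np.linalg.solve(A, b)
```

For non-parallel rays the 3x3 system is well-posed; with noisy detections the result is the point closest to both rays rather than an exact intersection.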

Fig. 2.

Left: 2D projection image of a patient not previously seen by the model. Right: Vertebrae detected by the model and their derived CoMs (blue dots). (Color figure online)

An ordinary 2D kV projection image (Fig. 2, left) needs to be provided to the model, which returns a segmentation mask, a bounding box and a classification label (not shown) for each vertebra that is detected (Fig. 2, right). Additionally, the 2D CoM is calculated from the segmentation. Figure 3 summarizes the model performance on CoM motion detection (for isocentric shifts/rotations), based on 50 structures, detected on different projection angles uniformly distributed over the acquired arc. The motion was introduced in the head-feet (vertical) direction and the horizontal direction. Depending on the gantry angle this would correspond to a combination of a chest-back and a lateral motion. To detect a rotational change based on the CoM (a single point), at least 2 vertebrae CoMs are required. This study considers all vertebrae in the field of view. Figure 4 shows the probability of a tracking error in the range of 0–2 mm. The different curves show the probability at different shift amplitudes that were carried out as well as the probability when all shift amplitudes are evaluated together.
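The derivation of the CoM from the returned segmentation mask, and the resulting 2D motion estimate, reduce to a few lines. The helper names below are our own, and the pixel spacing is an assumed calibration parameter of the imager.

```python
import numpy as np

def center_of_mass(mask):
    """2D centre of mass (row, col) of a binary segmentation mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def com_shift(mask_ref, mask_cur, pixel_spacing_mm=1.0):
    """2D motion estimate: CoM offset between reference and current mask,
    scaled by the detector pixel spacing to obtain millimetres."""
    ref = np.array(center_of_mass(mask_ref))
    cur = np.array(center_of_mass(mask_cur))
    return (cur - ref) * pixel_spacing_mm
```

With a dual imager setup, two such 2D offsets (one per view) feed the triangulation; with a single imager, the offset is a projection of the true 3D motion onto the detector plane.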

Fig. 3.

Detection accuracy of CoM motion for four shifts in chest-back or lateral (equivalent to the horizontal direction on the image) and head-feet (vertical direction on the detector) direction (left graph). A combined shift-rotation was introduced as well, where its equivalent shift of the CoM in 2D was assessed. The right graph shows three rotation offsets for angular directions \(\alpha \). Detections were performed for 5 vertebrae fully visible on all 10 projection images of a patient previously not seen by the model (blind data set). \(\varDelta \textit{\textbf{s}}\) is the 2D vector offset between both positions. \(\varDelta \alpha \) is the 2D angular offset between both positions. The orange line represents the median value, and the box comprises all values between quantiles 1 and 3. The whiskers are set to contain all values within 1.5 x IQR (Inter-Quartile Range: Q3–Q1).

Fig. 4.

Probability of a tracking error \(\varDelta \textit{\textbf{s}}\) in the range of 0–2 mm. The different curves show the probability at the different shift amplitudes of Fig. 3, as well as the probability when all shift amplitudes are evaluated together.

The second assessment involves the PIXY patient-like full-body phantom with vertebrae. The results for a position change detection based on a single vertebra are shown in Figure 5. The data for one shift \(\textit{\textbf{s}}\) contains a total of 40 structure shifts, detected on projection images that are orthogonal to the chest-back or lateral axes (Gantry angles: 0\(^{\circ }\), 90\(^{\circ }\), 180\(^{\circ }\) and 270\(^{\circ }\)). In total three such shifts \(\textit{\textbf{s}}\) were analyzed: 25.46  mm, 11.31  mm and 4.24  mm. Figure 6 shows the probability of a tracking error in the range of 0–2 mm. The different curves show the probability at the different shift amplitudes that were carried out as well as the probability when all shift amplitudes are evaluated together.

Fig. 5.

Detection accuracy of CoM motion of a phantom on the treatment couch for shifts in the chest-back (x) and head-feet (y) direction, where \(\textit{\textbf{s}}\) is the resulting shift vector. A total of 40 vertebrae positions were analyzed for each of the three shifts. \(\varDelta \textit{\textbf{s}}\) is the 2D vector offset between both positions. The orange line represents the median value, and the box comprises all values between quantiles 1 and 3. The whiskers are set to contain all values within 1.5 x IQR (Inter-Quartile Range: Q3–Q1).

Fig. 6.

Probability of a tracking error \(\varDelta \textit{\textbf{s}}\) in the range of 0–2 mm. The different curves show the probability at the different shift amplitudes of Fig. 5, as well as the probability when all shift amplitudes are evaluated together.

The above results show sensitivity to positional changes in the range of 1.5  mm, with a median error below 0.5  mm. Combining the positional information of all vertebrae visible on a single projection image yields sub-millimeter motion detection down to the smallest shift of 1 pixel-equivalent on the detector. The experiments with the phantom confirm these results. Spine rotations above 1\(^{\circ }\) can be identified; at 0.5\(^{\circ }\) the detection becomes unstable.

2.2 Soft Tissue Tracking

Soft tissue position tracking is crucial in motion monitoring during radiation therapy to ensure that high dose delivery is confined to the tumor (Fig. 7) and not to the surrounding healthy tissue. Significant efforts have been made to improve the accuracy and robustness of soft tissue tracking in the past [39], where kV x-ray imaging is probably the most commonly used imaging modality for a number of practical reasons. One big challenge is the lack of sufficient contrast between soft tissue and background, which makes it different from regular visual object tracking in computer vision [47]. To tackle this issue, different approaches have been proposed in the literature by either utilizing treatment planning information or exploiting medical physics knowledge in imaging. Machine learning, especially deep learning, algorithms have become increasingly popular in this domain.

Fig. 7.

Example image for pancreas tracking  [41] on simulated CBCT (left) and short-arc CBCT (\(20^{\circ }\), right) with overlaid pancreas contour (yellow) and tracked contour (blue) (Color figure online)

Treatment planning imaging contains rich information about target characteristics and motion that can be represented by mathematical models or encoded in deep neural networks (DNNs). In [35], 3D diaphragm motion models were generated from segmented 4D CT images and then forward projected to the 2D X-ray panel geometry for diaphragm tracking. In [86], the simulation CT was deformed and transformed to generate enough synthetic digitally reconstructed radiographs (DRRs) along with known tumor locations. Then, a DNN was used to model the relation between DRRs and their corresponding bounding boxes of tumors. This model was applied to predict tumor locations in real projections acquired during the actual treatment.

Physics-based approaches aim at exploiting hardware advances to improve soft tissue contrast. Dual energy (DE) imaging and multi-layer detectors are among these promising technologies. For example, a fast-kV switching DE fluoroscopy was implemented on a bench top system by alternating between high and low x-ray energies. Bony anatomy was suppressed using the classical weighted logarithm subtraction (WLS) method [34]. A deep learning model was used to improve the accuracy of WLS [33]. In x-ray imaging, a stacked flat panel detector design allows acquiring multiple images with low and high signal-to-noise ratio (SNR) and high and low spatial resolution, respectively. Image fusion schemes are available to take advantage of such “low - high” and “signal - resolution” information to combine images with the aim of maximizing the SNR of the fused image while preventing loss of spatial resolution [88].
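The classical WLS method referenced above can be illustrated with a monoenergetic toy model. The attenuation coefficients below are made-up illustration values, not measured spectra; choosing the weight as the ratio of the bone attenuation coefficients cancels the bone signal exactly in this idealized setting.

```python
import numpy as np

# Hypothetical monoenergetic attenuation coefficients (1/cm), illustration only.
MU_BONE_LOW, MU_BONE_HIGH = 0.9, 0.4
MU_SOFT_LOW, MU_SOFT_HIGH = 0.25, 0.18

def wls_subtract(i_low, i_high, w):
    """Weighted logarithm subtraction of a low/high-kV image pair."""
    return np.log(i_high) - w * np.log(i_low)

# Simulate path-length maps (cm): a bone step edge on uniform soft tissue.
t_soft = np.full((4, 4), 10.0)
t_bone = np.zeros((4, 4)); t_bone[:, 2:] = 2.0
i_low  = np.exp(-(MU_BONE_LOW  * t_bone + MU_SOFT_LOW  * t_soft))
i_high = np.exp(-(MU_BONE_HIGH * t_bone + MU_SOFT_HIGH * t_soft))

# The weight mu_bone_high / mu_bone_low cancels the bone term in this model.
w = MU_BONE_HIGH / MU_BONE_LOW
soft_only = wls_subtract(i_low, i_high, w)
```

In practice, beam hardening, scatter and noise break the exact cancellation; this is where the cited deep learning refinement of WLS [33] comes in.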

3 CBCT Image Reconstruction

Nowadays, volumetric imaging is arguably an integral part of the workflow in radiation therapy. While initially it was mainly intended for bone-based 3D positioning of the patient, it has progressively become an important tool for soft tissue matching thanks to improvements in image quality of the reconstructed volume. Recently, these improvements have enabled adaptive radiation therapy, where the treatment plan is adapted to anatomical changes directly during a fraction, prior to the actual treatment. Clearly, improving image quality is the key aspect of the successful deployment of volume reconstruction methods, and the current progress in machine learning brings new opportunities in this area.

A typical image reconstruction pipeline consists of pre-processing performed in the projection space, analytical or iterative reconstruction, and volume post-processing.

3.1 X-ray Projection Pre-processing

The pre-processing phase already provides several opportunities for a successful application of deep learning (DL) methods. A good example is the correction of signal degradation caused by x-ray photons that are scattered within the patient body  [22, 54]. The approach is to use Monte Carlo methods as forward simulation of the primary and scatter signal and to train a U-net type regression model to predict the scatter component from the combined signal acquired by the flat panel detector.
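As a rough stand-in for such a learned scatter predictor, the low-frequency nature of scatter can be illustrated with a simple blur-and-subtract baseline. A U-net trained on Monte Carlo (total, scatter) pairs would replace the crude estimate below while keeping the same interface; the scatter fraction and kernel size are arbitrary illustration values.

```python
import numpy as np

def box_blur(img, k):
    """Simple separable box blur (scatter is dominated by low frequencies)."""
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)

def correct_scatter(total, scatter_fraction=0.3, k=15):
    """Subtract a crude low-frequency scatter estimate from the measured signal.

    A trained U-net predicting the scatter component from `total` would slot
    in exactly where `scatter_fraction * box_blur(...)` stands here.
    """
    scatter_est = scatter_fraction * box_blur(total, k)
    return np.clip(total - scatter_est, 0.0, None)
```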

Metal artifacts in CBCT are another prominent example where projection-based corrections with the help of DL show advances compared to classical approaches  [50, 52, 53, 60, 83]. The special challenge here is that x-ray beams penetrating metal objects are affected by various physical effects, namely beam hardening, increased scatter, and high noise because of the strong attenuation. This makes it nearly impossible to use the affected information directly and thus requires including prior knowledge in the reconstruction process.

3.2 CBCT Volume Post-processing

An obvious application of DL methods is post-processing of CBCT volumes performed to correct for inaccuracies and artifacts. A beneficial aspect here is that the generation of training data for supervised learning is often quite straightforward: A prominent scenario is to generate training pairs with the complete projection set as ground truth and a projection subset (sparse-view) as a simulation of low-dose scans  [31, 43, 85]. All these methods apply classical filtered back-projection (FBP) to perform a first reconstruction affected by typical sparsity streaks and use neural networks, such as U-Nets, to improve the quality of the final image. The reconstruction and subsequent correction of limited-angle acquisitions have been addressed using a similar approach  [70, 80]. However, it has been pointed out that these approaches cannot guarantee that the output image faithfully represents the anatomy of the patient and does not fabricate fictitious structures due to the prior knowledge trained on a patient population. A possible mitigation is to compare the reconstructed volume against the acquired projections by applying forward projections and enforcing minimal differences  [46]. A hybrid approach has been proposed in  [37] to reduce artifacts related to the incompleteness of input projections due to limited-angle, sparse and truncated acquisitions: In the first phase, a U-net is employed to complete the insufficient input data. The completed set, combining both the measured and computed projections, is then reconstructed by a conventional iterative reconstruction technique. A rather rarely addressed topic is motion artifact reduction, presumably because of the general lack of motion-free ground truth training data. In  [61] we proposed a framework for CBCT motion artifact simulation and applied it in a proof-of-principle study [62] to train a U-net based artifact reduction method in the image domain (see Fig. 8).

Fig. 8.

Examples of training data used for DL-based motion artifact reduction  [62] generated by applying 4DCT based motion simulation using recorded breathing curves  [61]. The columns show from left to right the simulation with motion artifacts X, the desired output representing the average motion during scan Y, the artifact image \(X-Y\), the image after applying the predicted artifact \(Y-P\), and the predicted artifact P.

3.3 Iterative CBCT Reconstruction Methods

Apart from FBP, iterative methods represent a common technique for CT image reconstruction. Here the reconstruction corresponds to a step-by-step minimization of the objective loss function \(\ell (f,p,A)\) defined in terms of the reconstructed volume f, acquired projections p and the system matrix A relating the volume voxels to pixels in the projection space. The loss function \(\ell (f,p,A) = \psi (f,p,A)\,+\,r(f)\) consists of the data fidelity term \(\psi (f,p,A)\) enforcing consistency of the reconstructed volume with acquired projections and the regularization term r(f) encouraging reconstructions to satisfy a priori assumed properties, e.g. piece-wise smoothness. The common choices for the former include the \(L_2\)-norm projection error \(||A f - p||_2^2\) [28] or the statistical loss function [20, 65] taking into account the stochastic nature of signal detected in the projections. The regularization is often represented by variants of the total-variation [23]. During each iteration step, the update of f is calculated by comparing the forward-projected volume Af to the acquired projections; the mathematical formulation depends on the precise form of the objective loss function, the chosen regularizer and the iteration scheme [2, 28]. Machine learning techniques can then alter this general scheme in a number of ways.
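The iterative update described above can be sketched for the plain \(L_2\) data fidelity term, with the regularizer r(f) omitted for brevity and a small random matrix standing in for the real cone-beam system matrix A:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy system matrix A (projection pixels x volume voxels) and ground-truth volume.
A = rng.standard_normal((40, 16))
f_true = rng.random(16)
p = A @ f_true                              # noiseless "acquired" projections

# Gradient descent on the data fidelity ||A f - p||_2^2, no regularizer.
f = np.zeros(16)
step = 1.0 / np.linalg.norm(A.T @ A, 2)     # stable step size from the spectral norm
for _ in range(2000):
    residual = A @ f - p                    # compare forward projection to acquisitions
    f -= step * (A.T @ residual)            # back-project the residual to update f
```

Adding r(f), e.g. a total-variation term, contributes an extra gradient term per iteration; the learned variants discussed next replace parts of exactly this update loop.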

The first set of methods includes learning prior information from a training dataset containing high-quality reconstructions or, alternatively, general images. The learned information is then used at each iteration step to enhance the quality of limited-angle or sparse-view reconstructions. The learned information can be as simple as texture content in similar patches [42], while deploying deep neural networks allows for extracting higher-level features and for greater expressivity. In [10], a deep residual convolutional neural network (CNN) was trained for image denoising on the COCO dataset [51] and then used as a filter at each iteration step. In [13], a CNN is trained on a dataset containing high-quality reconstructions to yield ground truth images by refining unfinished iterations. The trained CNN then defines a regularization term enforcing the volume to lie close to the ground truth.

In another set of methods, each iteration step is partially replaced by a deep neural network and the whole unrolled system is trained at once, having the subsampled set of projections as an input and high-quality reconstructions as target. Examples include [77] or [11, 17]; in the latter, a DenseNet-inspired network is used in each iteration step to propose an optimal volume update based on the current as well as the previous gradients of the loss function; this is in fact a generalization of the Nesterov momentum [78] used for the speedup of iterative reconstruction.

3.4 End-to-End CBCT Image Reconstruction Learning

The last class of reconstruction algorithms that we want to mention here applies deep learning in an end-to-end approach where pre- and post-processing (in the projection and volume domains) are jointly trained. One of the foundational papers here, by Würfl et al. [81], uses neural networks to learn filtering and weighting in projection space while evaluating loss functions in the image domain. Other prominent examples are the previously mentioned methods for metal artifact reduction by Lin and Lyu et al. [52, 53], but also for limited-angle scans [30]. Zhu et al. [87] proposed to learn the complete reconstruction process including the domain transformation between the projection and volume space. That implies learning the system matrix, which is normally well known.

An alternative approach to DL-based end-to-end reconstruction is presented in  [24]: a continuum of intermediate representations is employed to break down the original problem, where line integrals are gradually restricted via partial line integrals until the level of image voxels is attained. The resulting hierarchy is mapped onto the network architecture, allowing for significant reduction of the computational complexity.

3.5 4D CBCT Reconstruction

In 4D reconstruction, typically 10 volumes resolved by respiratory phase are reconstructed from acquisitions with approximately the same number of projections as needed for a 3D scan, and thus approximately the same dose. This leads to strong under-sampling of the individual phases and makes it even harder to obtain adequate image quality. Classical approaches try to join information from all phases by using an initial combined reconstruction (MKB)  [76], temporal regularization (4DTV)  [64], or by applying deformable registration between the phases (MoCo)  [7].
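The phase sorting underlying 4D CBCT can be sketched as follows, assuming an idealized, perfectly periodic breathing signal; in practice the respiratory phase is derived from an external surrogate or extracted from the projections themselves.

```python
import numpy as np

def phase_bin(times, breathing_period_s=4.0, n_phases=10):
    """Assign each projection timestamp to one of n_phases respiratory bins.

    Assumes a perfectly periodic breathing signal with known period; each
    bin later receives its own (heavily under-sampled) reconstruction.
    """
    phase = (times % breathing_period_s) / breathing_period_s  # in [0, 1)
    return (phase * n_phases).astype(int)

# 600 projections over a 60 s gantry rotation -> roughly 60 projections per
# phase bin, i.e. a tenth of the angular sampling of the 3D scan.
times = np.linspace(0.0, 60.0, 600, endpoint=False)
bins = phase_bin(times)
```

The roughly tenfold reduction in projections per bin is exactly the under-sampling that the classical (MKB, 4DTV, MoCo) and learned approaches try to compensate.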

In [12] an iterative deep learning approach derived from the 3D AirNet method [11] has been applied to reduce sparseness streaks in 4D CBCT reconstructions. Zhang et al.  [84] propose a motion-compensated reconstruction algorithm applying deep learning for patient-population-based deformation field refinements. A method for the suppression of sparseness artifacts in cardiac CT imaging, based on learned data exchange between phases with a cyclic loss, has been presented by Kang et al.  [44].

In motion-resolved reconstructions the challenge is to overcome the sparse sampling of the individual motion states by reusing information from other motion states. This implies that the ongoing motion needs to be resolved to a certain extent, which makes the problem even more ill-posed; therefore prior information about anatomy and physiological motion needs to be taken into account. Further challenges are to move beyond purely phase-correlated reconstruction and to address motion amplitudes [69].

4 Deep Learning for Organ Segmentation

For the generation of a radiotherapy treatment plan, the position of the tumor as well as surrounding organs need to be known. In a typical workflow, a clinician contours these structures on either a CT or an MRI image.

Previously, the automatic segmentation of anatomical structures was performed by heuristic algorithms, such as thresholding  [3] or watershed  [74], combined to cover a complete anatomical site [29]. However, these algorithms need to be specifically designed for each organ. The advancements in deep learning now make it feasible to generate segmentation solutions for a multitude of different anatomical structures using the same or similar underlying neural networks. These are then trained on example segmentations to adapt to the particular structure. The underlying neural network is most often a convolutional neural network, as its architecture is especially suited for image-related tasks. More specifically, the U-Net  [67] and derivations of it, such as the Tiramisu  [40] and BibNet  [71], are commonly employed for anatomical segmentation.

One limitation of the above-mentioned networks is their inability to learn strong shape priors. This leads to inadequate performance in the case of weak image quality. Methods such as anatomically constrained neural networks  [59] try to circumvent this by forcing the network to learn a shape representation. With the methods described above, organ segmentation algorithms have been developed for many different anatomical sites such as abdomen  [9], female breast  [56, 71], head and neck  [58], female pelvis  [32], male pelvis  [73] and thorax  [18]. For head and neck and male pelvis, performance on par with clinicians has been reported.
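Agreement with clinician contours is commonly quantified with overlap metrics such as the Dice similarity coefficient (the cited studies may report additional metrics such as surface distances); a minimal implementation:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks.

    1.0 means perfect overlap, 0.0 means no overlap at all.
    """
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: defined here as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```

Being a pure overlap measure, Dice is insensitive to where along the boundary a disagreement occurs, which is why surface-distance metrics are often reported alongside it.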

A special challenge is performing automatic segmentation directly on CBCT reconstructions [1] due to their, in some aspects, inferior image quality compared to CT (see Fig. 9). Residual motion is, apart from scatter, one of the most prominent challenges, which especially makes it hard to define the ground truth.

Fig. 9.

Exemplary pancreas automatic segmentation result (red) on a CBCT volume  [1]. (Color figure online)

The integration of these algorithms in clinical practice faces one additional hurdle: The data used for training the deep neural network may originate from a different hospital or even a different geography. For CT segmentation, there is evidence that for most structures the quality remains unimpaired by training on data from a different hospital, as long as the segmentation guidelines are identical [72]. For MRI segmentation, the quality improves if data from the deploying hospital is included in the training or as part of an on-boarding process [27]. An alternative way to overcome this challenge is the implementation of a distributed learning method that is able to leverage the data from multiple hospitals in a privacy-preserving manner [14].

5 Deformable Image Registration

It is of utmost importance in radiotherapy that the prescribed dose is delivered to the target as conformally as possible while sparing neighboring organs at risk.

A patient's anatomy changes from day to day. This can lead to misadministering the dose and thus failing to fulfill the clinical goals. To avoid this, adaptive radiotherapy (ART) was introduced [45]. Here, a patient image of the day is used to update the deprecated treatment plan. Furthermore, one needs to track the absorbed dose in each organ over the whole treatment, consisting of multiple fractions, to ensure correct dose coverage of the tumor without overdosing the organs at risk. This process is called dose accumulation.

In both steps, deformable image registration (DIR) is used. DIR morphs the original image to the updated image set of the day. The “path” of each voxel is saved as a 3D vector in a deformation vector field (see Fig. 10), which can later be used to deform the dose as well for accumulation.
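The role of the deformation vector field can be sketched as a pull-back warp, shown here in 2D for brevity; nearest-neighbour sampling keeps the example short, whereas clinical DIR implementations use higher-order interpolation.

```python
import numpy as np

def warp(image, dvf):
    """Pull-back warp: output pixel (r, c) samples image at (r, c) + dvf[r, c].

    dvf has shape (H, W, 2) holding per-pixel displacement vectors in pixel
    units. Nearest-neighbour sampling with edge clamping for simplicity.
    """
    h, w = image.shape
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(np.rint(rr + dvf[..., 0]).astype(int), 0, h - 1)
    src_c = np.clip(np.rint(cc + dvf[..., 1]).astype(int), 0, w - 1)
    return image[src_r, src_c]

# For dose accumulation the same field deforms each fraction dose, e.g.:
#   accumulated += warp(fraction_dose, dvf)
```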

Fig. 10.

A deformation vector field (arrows) to morph one CT (grey in the foreground) to the other CT (black in the background) of the same patient in different breathing phases.

However, due to the vast number of voxels that need to be compared and moved around by optimization algorithms, a conventional DIR [8] can take up to minutes to finish. That is the first opportunity for DL to help: as soon as the patient is positioned and the images of the day are recorded, everything should happen fast, so that dose delivery can start before anatomy changes [21] or patient position changes occur. DL-based DIR can be done within a fraction of a second [4, 16] for 3D image volumes, which is a considerable improvement over, e.g., 1 min.

Another point where DL might help to improve the results is later, in the process of dose accumulation. Classical DIR algorithms follow a fixed set of parameters and therefore perform better or worse depending on the image that needs to be deformed. Furthermore, there are infinitely many ways to deform one image into another and thus no ground truth actually exists. That is why unsupervised learning is normally used for this kind of problem, since supervised models would only mimic the behavior of the classical algorithm, i.e. copy its problems and uncertainties.

Unsupervised learning detects patterns on its own, and thus might be able to outperform existing solutions [4, 55].

Several architectures are being used and tested [25]. First approaches relied on typical U-Nets [4], and newer publications are looking into the potential of GANs  [82] to generate either the morphed image directly or “just” the deformation vector field.

6 Conclusion

As shown by several examples from the image guided radiation therapy field, we see enormous potential of data-driven methods to enhance or replace state-of-the-art algorithms. This can be observed in various stages of the imaging pipeline. Notably, the biggest improvements can be observed where learning-based methods are used with consideration of domain knowledge (e.g. x-ray imaging physics), rather than as pure black-box applications. We take this as motivation to further explore problem-specific network architectures and loss functions to obtain solutions that leverage physical or physiological constraints to reduce the solution space for the training process. Beyond this, we see many synergies between the described domain solutions, where integrated solutions, e.g. deformable registration with implicit segmentation or image reconstruction with implicit deformable registration against prior acquisitions, could be future developments. In conclusion, we sense wide agreement in the scientific community that deep learning will be the next evolutionary step in the field.