DeepTarget: Gross tumor and clinical target volume segmentation in esophageal cancer radiotherapy
Introduction
Esophageal cancer ranks sixth in mortality amongst all cancers worldwide, accounting for 1 in 20 cancer deaths (Bray et al., 2018). As it is usually diagnosed at late stages, radiotherapy (RT) is often one of the primary treatments (Pennathur et al., 2013). The most critical and challenging tasks in RT planning are the gross tumor volume (GTV) and clinical target volume (CTV) delineations, where high radiation doses are applied to those regions to kill cancer cells (Burnet et al., 2004). As shown in Fig. 1, GTV and CTV are correlated yet different regions. While the GTV represents the visible gross tumor region, the CTV outlines the area that covers the microscopic tumorous region, i.e., sub-clinical disease. Spatially, CTV boundaries must contain the GTV and also any involved lymph nodes (LNs). Physicians’ delineation principles for the GTV and CTV are quite different. The determination of the GTV relies mainly on image appearance cues. Yet, estimating the CTV requires first having the GTV, followed by measuring the sub-clinical disease margins through the judgment of both image appearance and spatial distances from the GTV and other involved targets, i.e., LNs and organs at risk (OARs). Current clinical protocols rely on manual GTV and CTV delineation, which is time- and labor-consuming and subject to high inter- and intra-observer variability (Tai, Van Dyk, Yu, et al., 1998, Eminowicz, McCormack, 2015, Nowee, Voncken, Kotte, et al., 2019). This motivates automated approaches for GTV and CTV segmentation, which could potentially increase the target contouring accuracy and consistency, as well as significantly shorten the planning time, allowing for timely treatment.
Both GTV and CTV delineation offer their own distinct challenges. The assessment of esophageal GTV by radiotherapy computed tomography (RTCT) alone has been shown to be error prone, due to the poor contrast between the GTV and surrounding tissues (Muijs et al., 2010). Within the clinic, these shortfalls are often addressed by correlating with the patient’s positron emission tomography/computed tomography (PET/CT) scan, when available. These PET/CTs are taken on an earlier occasion to help stage the cancer and decide treatment protocols. Despite misalignments between the PET/CT and RTCT, positron emission tomography (PET) still provides highly useful information to help manually delineate the GTV on the RTCT, thanks to its high contrast in highlighting malignant regions (Leong et al., 2006). As shown in Fig. 2, RTCT and PET can each be crucial for accurate GTV delineation, due to their complementary strengths and weaknesses. Yet, leveraging both diagnostic PET and RTCT requires contending with the unavoidable misalignments between the two scans acquired at different times.
Turning to CTV delineation, its quality depends highly on the physician’s experience due to its judgment-based nature. For esophageal cancer, this is even more challenging because tumors may potentially spread along the entire esophagus and metastasize up to the neck or down to the upper abdominal LNs, and present adjacent to several OARs, such as the lung (Jin et al., 2018) and airway (Jin et al., 2017). Recent works on automated CTV segmentation mostly operate based on the RTCT appearance alone (Men, Dai, Li, 2017, Men, Zhang, et al., 2018, Wong, Fong, McVicar, et al., 2020). However, as shown in Fig. 3, CTV delineation depends on the radiation oncologist’s visual judgment of both the appearance and the spatial configuration of the GTV, LNs, and OARs, suggesting that considering only the RTCT makes the problem ill-posed (Men, Dai, Li, 2017, Men, Zhang, et al., 2018, Wong, Fong, McVicar, et al., 2020).
By considering their different characteristics and challenges, we propose tailored methods for GTV and CTV delineation to solve each task. Together, these result in a combined workflow that provides a comprehensive solution (named DeepTarget) for target contouring in esophageal cancer radiotherapy. Specifically, there are four major contributions in our work.
- 1.
For the GTV segmentation, we introduce a new two-stream chained deep network fusion method to incorporate the joint RTCT and PET information for accurate esophageal GTV segmentation (see Fig. 6). One of the streams is trained using the RTCT alone, while the other stream uses both RTCT and registered PET. The former exploits the anatomical appearance features in computed tomography (CT), while the latter takes advantage of PET’s sensitive, but sometimes spurious and overpoweringly strong contrast. The two streams explore tumor characteristics from different perspectives, hence, their predictions can be further deeply fused with the original RTCT to generate a final robust GTV prediction. The misalignment between RTCT and PET/CT is alleviated by a deformable registration with a robust anatomy-guided initialization.
- 2.
For the GTV segmentation, we also introduce a simple yet surprisingly powerful progressive semantically-nested network (PSNN) segmentation model, which incorporates the strengths of both UNet (Ronneberger et al., 2015) and PHNN (Harrison et al., 2017) by using deep supervision to progressively propagate high-level semantic features to lower-level, but higher resolution features. The PSNN achieves superior performance in the tumor segmentation task as compared to prior art, e.g., DenseUNet (Yousefi et al., 2018) and PHNN (Harrison et al., 2017).
- 3.
For the CTV segmentation, we introduce a novel spatial context encoded deep CTV delineation framework. Instead of expecting the CNN to learn distance-based margins from the GTV, LN, and OAR binary masks, we provide the CTV delineation network with the 3D signed distance transform maps (SDMs) of these structures. Specifically, we include the SDMs of the GTV, LNs, lung, heart and spinal canal with the original RTCT volume as inputs to the network. From a clinical perspective, this allows the CNN to emulate the oncologist’s manual delineation, which uses the distances of the GTV and LNs relative to the OARs as a key constraint in determining the CTV boundaries.
- 4.
We demonstrate through extensive experiments that both our GTV and CTV segmentation methods can significantly improve the performance over prior state-of-the-art (SOTA): 8.7% increase in absolute Dice score (DSC) (from 70.3% to 79.0%) and 8.5 mm reduction in average surface distance (ASD) (from 14.2 mm to 5.7 mm) as compared to Yousefi et al. (2018) for GTV segmentation, and 3.4% increase in DSC (from 79.2% to 82.6%) and 3.3 mm reduction in ASD (from 7.7 mm to 4.4 mm) as compared to Cardenas et al. (2018a) for CTV segmentation.
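The spatial-context encoding of contribution 3 can be illustrated with a minimal sketch: given a binary structure mask (GTV, LN, or OAR), a 3D signed Euclidean distance map is negative inside the structure and positive outside. This is an illustration using `scipy`; the sign convention and any normalization used in the actual network inputs are assumptions here, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask, spacing=(1.0, 1.0, 1.0)):
    """Signed Euclidean distance map of a 3D binary mask.

    Positive outside the structure, negative inside; `spacing` expresses
    the distances in physical units (e.g., mm) rather than voxels.
    """
    mask = mask.astype(bool)
    # Distance from each outside voxel to the nearest structure voxel.
    outside = distance_transform_edt(~mask, sampling=spacing)
    # Distance from each inside voxel to the nearest background voxel.
    inside = distance_transform_edt(mask, sampling=spacing)
    return outside - inside

# Toy example: a single-voxel "GTV" in the centre of a small volume.
vol = np.zeros((5, 5, 5), dtype=np.uint8)
vol[2, 2, 2] = 1
sdm = signed_distance_map(vol)
# sdm is negative at the centre voxel and grows positive away from it.
```

In the paper's framework, one such SDM per structure (GTV, LNs, lung, heart, spinal canal) would be stacked with the RTCT volume as extra input channels.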
The initial results of this work were presented in two conference papers, focusing separately on GTV (Jin et al., 2019a) and CTV (Jin et al., 2019b) segmentation. The current manuscript extends the two previous works in three aspects.
- 1.
We provide a more comprehensive literature review for the GTV and CTV segmentation works, as well as for PET/CT co-segmentations, which could not be included in our conference papers because of page limits.
- 2.
We expand our esophageal RT dataset from the original 110 to 148 patients with paired PET/CT and RTCT images. We conduct extensive 4-fold cross-validation for both GTV and CTV segmentation using the same splits at the patient level. For the GTV experiment, we additionally compare against three recent SOTA PET/CT co-segmentation methods (Zhong, Kim, et al., 2019, Zhao, Li, et al., 2019, Kumar, Fulham, Feng, Kim, 2020).
- 3.
We integrate the GTV and CTV segmentation together, reporting, for the first time, a combined and more complete esophageal target contouring workflow. In doing so, we study the impact of using our GTV predictions as input into the CTV task, characterizing the performance of this integrated workflow and demonstrating its potential clinical value.
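The patient-level 4-fold cross-validation mentioned above can be sketched in a few lines: patients, not individual scans, are shuffled and partitioned so that all data from one patient stays in a single fold. The seed and patient-ID format are hypothetical; this is not the paper's actual split code.

```python
import random

def patient_level_folds(patient_ids, n_folds=4, seed=0):
    """Partition unique patient IDs into n_folds disjoint groups.

    Splitting at the patient level prevents scans from the same patient
    from leaking between training and validation folds.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)   # hypothetical fixed seed for reproducibility
    rng.shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]

# 148 patients, as in the dataset described in this work.
folds = patient_level_folds([f"pat{i:03d}" for i in range(148)], n_folds=4)
# Each fold then serves once as the held-out test set.
```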
GTV segmentation
A handful of studies have addressed automated esophageal GTV segmentation (Hao, Liu, Liu, 2017, Tan, Li, Choi, et al., 2017, Yousefi et al., 2018). Tan et al. (2017) developed an adaptive region-growing algorithm with a maximum curvature strategy to segment the esophageal tumor in PET alone and evaluated using phantom images. However, due to the misalignments between PET and RTCT and their different imaging principles, even a dedicated cross-modality registration algorithm may not achieve
Methods
The overall workflow of our DeepTarget system is depicted in Fig. 4, which consists of three major components: (1) image preprocessing to register PET to RTCT and to perform prerequisite anatomy segmentation in RTCT, i.e., the involved LN and OAR segmentation; (2) GTV segmentation using a two-stream chained 3D deep fusion method and the newly proposed progressive semantically-nested network (PSNN); (3) CTV segmentation using a deep contextual- and appearance-based method, involving the RTCT and
Dataset and evaluation
Dataset: To evaluate performance, we collected a dataset containing 148 esophageal cancer patients from Chang Gung Memorial Hospital, whose demographic, clinical and tumor characteristics are shown in Table 1. Each patient has a diagnostic PET/CT pair and a treatment RTCT scan and underwent concurrent chemoradiation therapy (CCRT). To the best of our knowledge, this is the largest dataset for esophageal GTV and CTV segmentation to date. All 3D GTV, CTV, and the involved LN ground truth
Esophageal GTV segmentation
Effectiveness of PSNN: The quantitative results and comparisons are tabulated in Table 2 and Fig. 9. When all network models are trained and evaluated using only the RTCT stream, our proposed PSNN evidently outperforms the previous best esophageal GTV segmentation method, i.e., DenseUNet (Yousefi et al., 2018). As can be seen, PSNN consistently improves upon it in all metrics: with an absolute increase of 4.8% in DSC (from 0.703 to 0.751) and a significant drop in the distance metric of HD (from
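The DSC and surface-distance metrics used throughout these comparisons can be computed as in the following sketch. It uses `scipy` and a symmetric average-surface-distance definition; the exact boundary extraction and metric conventions in the paper's evaluation may differ slightly.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def surface(mask):
    """Boundary voxels of a binary mask (mask minus its erosion)."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)

def asd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average surface distance between two binary masks."""
    sa, sb = surface(a), surface(b)
    da = distance_transform_edt(~sb, sampling=spacing)[sa]  # a-surface -> b
    db = distance_transform_edt(~sa, sampling=spacing)[sb]  # b-surface -> a
    return (da.sum() + db.sum()) / (len(da) + len(db))

# Toy example: two overlapping 3x3x3 cubes shifted by one voxel.
a = np.zeros((6, 6, 6), dtype=bool); a[1:4, 1:4, 1:4] = True
b = np.zeros((6, 6, 6), dtype=bool); b[2:5, 2:5, 2:5] = True
# dice(a, b) = 2*8 / (27+27) ≈ 0.296; asd(a, a) = 0.
```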
Conclusions
This work presented a complete workflow for esophageal GTV and CTV segmentation. First, we proposed a two-stream chained 3D deep network fusion method to segment esophageal GTVs using PET and RTCT imaging modalities. This two-stream fusion outperforms prior art, including leading co-segmentation alternatives. We also introduced the PSNN model as a new 3D segmentation architecture that uses a simple, parameter-less, and deeply-supervised CNN decoding path, and demonstrated its superiority in the
CRediT authorship contribution statement
Dakai Jin: Conceptualization, Data curation, Formal analysis, Methodology, Software, Writing - original draft. Dazhou Guo: Conceptualization, Formal analysis, Methodology, Software, Writing - original draft. Tsung-Ying Ho: Conceptualization, Data curation, Formal analysis, Writing - review & editing, Supervision. Adam P. Harrison: Methodology, Formal analysis, Writing - original draft. Jing Xiao: Formal analysis, Writing - review & editing. Chen-kan Tseng: Conceptualization, Writing - review &
Declaration of Competing Interest
We have no conflicts of interest to disclose.
Acknowledgements
This work was partially supported by the Maintenance Project of the Center for Artificial Intelligence in Medicine (Grant CLRPG3H0012, SMRPG3I0011) at Chang Gung Memorial Hospital.
References (56)
- et al.
Deep learning algorithm for auto-delineation of high-risk oropharyngeal clinical target volumes with built-in dice similarity coefficient parameter optimization function
Int. J. Radiat. Oncol. Biol. Phys.
(2018) - et al.
Variability of clinical target volume delineation for definitive radiotherapy in cervix cancer
Radiother. Oncol.
(2015) - et al.
A prospective study to evaluate the impact of FDG-PET on CT-based radiotherapy treatment planning for oesophageal cancer
Radiother. Oncol.
(2006) - et al.
Fully automatic and robust segmentation of the clinical target volume for radiotherapy of breast cancer using big data and deep learning
Phys. Med.
(2018) - et al.
A systematic review on the role of FDG-PET/CT in tumour delineation and radiotherapy planning in patients with esophageal cancer
Radiother. Oncol.
(2010) - et al.
Gross tumour delineation on computed tomography and positron emission tomography-computed tomography in oesophageal cancer: a nationwide study
Clin. Transl. Radiat. Oncol.
(2019) - et al.
Oesophageal carcinoma
Lancet
(2013) - et al.
Variability of target volume delineation in cervical esophageal cancer
Int. J. Radiat. Oncol. Biol. Phys.
(1998) - et al.
Improving observer variability in target delineation for gastro-oesophageal cancer: the role of 18F-fluoro-2-deoxy-d-glucose positron emission tomography/computed tomography
Clin. Oncol.
(2008) - et al.
Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning
Radiother. Oncol.
(2020)
Auto-segmentation of low-risk clinical target volume for head and neck radiation therapy
Pract. Radiat. Oncol.
Automatic detection and segmentation of lymph nodes from CT data
IEEE Trans. Med. Imaging
Fast approximate energy minimization via graph cuts
IEEE Trans. Pattern Anal. Mach. Intell.
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
CA: A Cancer Journal for Clinicians
Defining the tumour and target volumes for radiotherapy
Cancer Imaging
Auto-delineation of oropharyngeal clinical target volumes using 3D convolutional neural networks
Phys. Med. Biol.
Lymph node gross tumor volume detection in oncology imaging via relationship learning using graph neural network
MICCAI
3D U-Net: learning dense volumetric segmentation from sparse annotation
MICCAI
Pathological pulmonary lobe segmentation from ct images using progressive holistically nested neural networks and random walker
Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support
Organ at risk segmentation for head and neck cancer using stratified learning and neural architecture search
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Gross tumor volume segmentation for head and neck cancer radiotherapy using deep dense multi-modality network
Phys. Med. Biol.
Esophagus tumor segmentation using fully convolutional neural network and graph cut
Intelligent Systems
Progressive and multi-path holistically nested neural networks for pathological lung segmentation from CT images
MICCAI
Deep residual learning for image recognition
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Fully automated delineation of gross tumor volume for head and neck cancer on PET-CT using deep learning: a dual-center study
Contrast Media Mol. Imaging
Densely connected convolutional networks
IEEE CVPR
Accurate esophageal gross tumor volume segmentation in PET/CT using two-stream chained 3D deep network fusion
MICCAI
Deep esophageal clinical target volume delineation using encoded 3D spatial context of tumor, lymph nodes, and organs at risk
MICCAI
1. Co-first author.