1 Introduction

It is estimated that more than 50,000 new cases of pancreatic cancer occur in the United States each year, accounting for about 3% of all cancers and about 7% of all cancer deaths [6]. Pancreatic ductal adenocarcinoma (PDAC) is the most common type of pancreatic cancer, making up more than 80% of cases. It originates in the cells lining small tubes in the pancreas called ducts, which transport juices containing important digestive enzymes into the small intestine [1]. Despite substantial research efforts and gradual diagnostic and therapeutic improvements, PDAC remains highly lethal, with a five-year survival rate of only around 5–7% and one-year survival achieved in less than 20% of cases, reflecting its aggressive metastatic spread and the lack of effective and reliable biomarkers for early diagnosis [2]. There is therefore a pressing need to develop novel and effective strategies for early detection and therapy of PDAC.

Both molecular and imaging techniques have been developed for the diagnosis and prognosis of PDAC. Recent studies have applied deep and targeted genomic analysis to pancreatic cancer [7]. These molecular techniques have been successful in characterizing gene-expression signatures in cancer cells and the molecular landscape of PDAC, but they may not sufficiently capture the heterogeneous nature of PDAC in vivo. Imaging techniques, especially computed tomography (CT) accompanied by three-dimensional (3D) reconstruction, are widely used for preoperative staging of PDAC [8, 9] and can provide quantitative phenotypic features that reveal the characteristics of tumors. Other clinical variables can also serve as reliable prognostic biomarkers; for example, evaluation of CA 19-9 levels is recommended to correctly stage PDAC and to assess the response to therapy [3].

Prognostic models for predicting the overall survival of PDAC patients are of vital importance because they capture the biological aggressiveness of the cancer and help evaluate patient outcome. Survival analysis, which originated in clinical research, establishes a connection between covariates and the time until an event occurs. It differs from traditional regression in that part of the training data can only be partially observed, owing to patient dropout and the limited duration of studies. By fitting the survival and hazard functions to covariates in a regression model, we can predict the prognosis of patients.

In this paper, we developed a prognostic model to predict overall survival based on predictors derived from contrast-enhanced pancreas CT scans and patients' clinical variables.

2 Method

2.1 Image Acquisition

The image dataset used in this paper comes from the MICCAI 2018 Computational Precision Medicine challenge, which focuses on the quantitative assessment of pancreatic cancer. The dataset consists of a consecutive series of 212 patients who underwent pancreas resection at Memorial Sloan Kettering Cancer Center, with clinical variables and high-quality annotated CT imaging. Of these, 159 portal venous phase CT images of the pancreas parenchyma and tumor in MetaImage (MHD) format are provided as a training set. The images are fully anonymized, with tumors and liver parenchyma segmented by expert radiologists to eliminate segmentation-related uncertainty. In addition to the image data, the training set includes a data dictionary describing the clinical variables and a list of the variables for each patient, including preoperative CA 19-9 and neoadjuvant chemotherapy.

2.2 Data Processing

Since the gray values of CT scans represent absolute physical quantities in Hounsfield units (HU) and should therefore be comparable between scanners, we do not perform intensity normalization on the CT images. Missing entries in preoperative CA 19-9 are imputed with the median value.
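A minimal sketch of this preprocessing step is given below, assuming the clinical variables are loaded into a pandas DataFrame; the file and column names are illustrative rather than taken from the challenge data dictionary.

```python
import pandas as pd

# Load the clinical variables (hypothetical file and column names).
clinical = pd.read_csv("clinical_variables.csv")

# Impute missing preoperative CA 19-9 values with the median of the observed values.
median_ca199 = clinical["CA19-9"].median(skipna=True)
clinical["CA19-9"] = clinical["CA19-9"].fillna(median_ca199)
```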

2.3 Image Analysis

We extract radiomics features from the medical images to serve as covariates for survival analysis. For each CT scan, we extract features in three categories: first order statistics, shape-based features and texture features. In addition, we apply Laplacian of Gaussian (LoG) and wavelet filters to the images and extract features from the filtered images as well. Each feature is extracted from two kinds of region of interest (ROI): the tumor region segmented by experts, and a region drawn to encompass the whole pancreas based on intensity values.

As noted above, we apply feature extraction to the wavelet-filtered and LoG-filtered images in addition to the original images. Wavelet filtering yields eight decompositions per level, covering all possible combinations of applying either a high-pass or a low-pass filter in each of the three dimensions. The LoG filter enhances edges by emphasizing areas of gray level change; its sigma value defines how coarse the emphasized texture is. We extracted features from LoG-filtered images with sigma values of 1.0, 2.0, 3.0, 4.0 and 5.0.

In total, we extracted 2436 features from each CT image. The features were extracted using custom code and the Pyradiomics toolbox [4].
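The sketch below shows how the Pyradiomics part of this pipeline could be configured with the image types and feature classes described in this section; the file paths are placeholders, and the custom-code portion of the extraction is not shown.

```python
from radiomics import featureextractor

# Configure the extractor with all feature classes (first order, shape, GLCM,
# GLRLM, GLSZM, NGTDM, GLDM) and the filtered image types described above.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()
extractor.enableImageTypes(Original={},
                           LoG={"sigma": [1.0, 2.0, 3.0, 4.0, 5.0]},
                           Wavelet={})

# One call per (image, ROI) pair; the expert tumor mask is shown here, and the
# pancreas-region mask would be passed in the same way for the second ROI.
features = extractor.execute("case_001.mhd", "case_001_tumor_mask.mhd")
```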

First Order Statistics. We extracted 19 features for first order statistics, which are described as follows.

Basic Features. For basic first order statistics, we extract the maximum intensity, minimum intensity, mean, median, 10th percentile, 90th percentile, standard deviation and variance of the intensity values in the ROI.

Energy. Energy is a measure of the magnitude of voxel values in an image. A larger value implies a greater sum of the squares of these values.

$$\begin{aligned} \textit{energy} = \displaystyle \sum ^{N_p}_{i=1}{(\mathbf X (i) )^2} \end{aligned}$$
(1)

Total Energy. Total Energy is the value of the Energy feature scaled by the volume of a voxel, where c is an optional offset applied to the intensity values.

$$\begin{aligned} \textit{total energy} = V_{voxel}\displaystyle \sum ^{N_p}_{i=1}{(\mathbf X (i) + c)^2} \end{aligned}$$
(2)

Entropy. Entropy specifies the uncertainty/randomness in the image values. It measures the average amount of information required to encode the image values.

$$\begin{aligned} \textit{entropy} = -\displaystyle \sum ^{N_g}_{i=1}{p(i)\log _2\big (p(i)+\epsilon \big )} \end{aligned}$$
(3)

where \(\epsilon \) is an arbitrarily small positive number.

Interquartile Range

$$\begin{aligned} \textit{interquartile range} = \mathbf P _{75} - \mathbf P _{25} \end{aligned}$$
(4)

where \(\mathbf P _{25}\) and \(\mathbf P _{75}\) are the \(25^{th}\) and \(75^{th}\) percentile of the gray level array, respectively.

Range

$$\begin{aligned} \textit{range} = \max (\mathbf X ) - \min (\mathbf X ) \end{aligned}$$
(5)

Mean Absolute Deviation (MAD)

$$\begin{aligned} \textit{MAD} = \frac{1}{N_p}\displaystyle \sum ^{N_p}_{i=1}{|\mathbf X (i)-\bar{X}|} \end{aligned}$$
(6)

Mean Absolute Deviation is the mean distance of all intensity values from the Mean Value of the image array.

Robust Mean Absolute Deviation (rMAD)

$$\begin{aligned} \textit{rMAD} = \frac{1}{N_{10-90}}\displaystyle \sum ^{N_{10-90}}_{i=1} {|\mathbf X _{10-90}(i)-\bar{X}_{10-90}|} \end{aligned}$$
(7)

Robust Mean Absolute Deviation is the mean distance of all intensity values from the mean value calculated on the subset of the image array with gray levels between, or equal to, the \(10^{th}\) and \(90^{th}\) percentiles.

Root Mean Squared (RMS)

$$\begin{aligned} \textit{RMS} = \sqrt{\frac{1}{N_p}\sum ^{N_p}_{i=1}{(\mathbf X (i) + c)^2}} \end{aligned}$$
(8)

RMS is the square root of the mean of all squared intensity values. It is another measure of the magnitude of the image values. This feature is volume-confounded; a larger value of c increases the effect of volume confounding.

Skewness

$$\begin{aligned} \textit{skewness} = \displaystyle \frac{\mu _3}{\sigma ^3} = \frac{\frac{1}{N_p}\sum ^{N_p}_{i=1}{(\mathbf X (i)-\bar{X})^3}}{\left( \sqrt{\frac{1}{N_p}\sum ^{N_p}_{i=1}{(\mathbf X (i)-\bar{X})^2}}\right) ^3} \end{aligned}$$
(9)

Skewness measures the asymmetry of the distribution of values about the Mean value. Depending on where the tail is elongated and the mass of the distribution is concentrated, this value can be positive or negative.

Kurtosis

$$\begin{aligned} \textit{kurtosis} = \displaystyle \frac{\mu _4}{\sigma ^4} = \frac{\frac{1}{N_p}\sum ^{N_p}_{i=1}{(\mathbf X (i)-\bar{X})^4}}{\left( \frac{1}{N_p}\sum ^{N_p}_{i=1}{(\mathbf X (i)-\bar{X}})^2\right) ^2} \end{aligned}$$
(10)

Kurtosis is a measure of the ‘peakedness’ of the distribution of values in the image ROI. A higher kurtosis implies that the mass of the distribution is concentrated towards the tail(s) rather than towards the mean. A lower kurtosis implies the reverse: that the mass of the distribution is concentrated towards a spike near the Mean value.

Uniformity

$$\begin{aligned} \textit{uniformity} = \displaystyle \sum ^{N_g}_{i=1}{p(i)^2} \end{aligned}$$
(11)

Uniformity is a measure of the sum of the squares of each intensity value. This is a measure of the homogeneity of the image array, where a greater uniformity implies a greater homogeneity or a smaller range of discrete intensity values.
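For illustration, the following NumPy sketch computes several of the first order statistics defined in Eqs. (1)–(11) directly from the array of ROI intensities; Pyradiomics discretizes the gray levels before computing entropy and uniformity, so the histogram-based probabilities used here are only an approximation.

```python
import numpy as np

def first_order_stats(x, n_bins=256, eps=2.2e-16):
    """Compute a subset of the first order statistics for a 1D array of ROI intensities."""
    x = np.asarray(x, dtype=float)
    # Approximate gray level probabilities p(i) via a histogram.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts / counts.sum()
    mean = x.mean()
    lo, hi = np.percentile(x, [10, 90])
    x_sub = x[(x >= lo) & (x <= hi)]                  # 10th-90th percentile subset
    return {
        "energy": np.sum(x ** 2),                                            # Eq. (1)
        "entropy": -np.sum(p * np.log2(p + eps)),                            # Eq. (3)
        "interquartile_range": np.percentile(x, 75) - np.percentile(x, 25),  # Eq. (4)
        "range": x.max() - x.min(),                                          # Eq. (5)
        "MAD": np.mean(np.abs(x - mean)),                                    # Eq. (6)
        "rMAD": np.mean(np.abs(x_sub - x_sub.mean())),                       # Eq. (7)
        "RMS": np.sqrt(np.mean(x ** 2)),                                     # Eq. (8), c = 0
        "skewness": np.mean((x - mean) ** 3) / np.std(x) ** 3,               # Eq. (9)
        "kurtosis": np.mean((x - mean) ** 4) / np.var(x) ** 2,               # Eq. (10)
        "uniformity": np.sum(p ** 2),                                        # Eq. (11)
    }
```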

2.4 Shape Features

We extracted 14 features to characterize the shape of ROI, which are described as follows.

Basic Features. For basic shape features, we extract the volume, surface area, surface-area-to-volume ratio, maximum 3D diameter, maximum 2D diameters in the axial, coronal and sagittal planes, and the major, minor and least axis lengths of the ROI.

Sphericity

$$\begin{aligned} \textit{sphericity} = \frac{\root 3 \of {36 \pi V^2}}{A} \end{aligned}$$
(12)

Sphericity is a measure of the roundness of the shape of the tumor region relative to a sphere. It is a dimensionless measure, independent of scale and orientation. The value of 1 indicates a perfect sphere.

Spherical Disproportion

$$\begin{aligned} \textit{spherical disproportion} = \frac{A}{4\pi R^2} = \frac{A}{\root 3 \of {36 \pi V^2}} \end{aligned}$$
(13)

where R is the radius of a sphere with the same volume as the tumor, which is equal to \(\root 3 \of {\frac{3V}{4\pi }}\).

Elongation

$$\begin{aligned} \textit{elongation} = \sqrt{\frac{\lambda _{\text {minor}}}{\lambda _{\text {major}}}} \end{aligned}$$
(14)

where \(\lambda _{\text {major}}\) and \(\lambda _{\text {minor}}\) are the lengths of the largest and second largest principal component axes.

Flatness

$$\begin{aligned} \textit{flatness} = \sqrt{\frac{\lambda _{\text {least}}}{\lambda _{\text {major}}}} \end{aligned}$$
(15)

where \(\lambda _{\text {major}}\) and \(\lambda _{\text {least}}\) are the lengths of the largest and smallest principal component axes.
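The sphericity, elongation and flatness computations can be sketched as follows; the binary mask, voxel spacing and a precomputed surface area A are assumed as inputs (Pyradiomics estimates A from a mesh built on the ROI).

```python
import numpy as np

def shape_features(mask, spacing, surface_area):
    """Sphericity (Eq. 12), elongation (Eq. 14) and flatness (Eq. 15) of a binary 3D ROI."""
    V = mask.sum() * np.prod(spacing)                         # ROI volume
    sphericity = (36.0 * np.pi * V ** 2) ** (1.0 / 3.0) / surface_area

    # Principal component eigenvalues of the physical voxel coordinates.
    coords = np.argwhere(mask > 0) * np.asarray(spacing, dtype=float)
    eig = np.sort(np.linalg.eigvalsh(np.cov(coords, rowvar=False)))   # least <= minor <= major
    elongation = np.sqrt(eig[1] / eig[2])
    flatness = np.sqrt(eig[0] / eig[2])
    return sphericity, elongation, flatness
```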

2.5 Texture Features

We also extracted texture features using the standard pipeline of the Pyradiomics toolbox, including 22 gray level co-occurrence matrix (GLCM) features, 16 gray level run length matrix (GLRLM) features, 16 gray level size zone matrix (GLSZM) features, five neighbouring gray tone difference matrix (NGTDM) features and 14 gray level dependence matrix (GLDM) features. Due to page limitations we do not present the full list here; it can be found at https://pyradiomics.readthedocs.io/en/latest/features.html#module-radiomics.glcm

2.6 Survival Analysis

Feature Selection. Since the number of covariates is large relative to the number of observations, we performed feature selection to remove redundant or irrelevant features and to reduce overfitting in the model. More specifically, we fit a Cox model to each variable individually and record its c-index on the training set.

Next, we build a parsimonious model by excluding irrelevant features: we keep only the top-ranking features from the univariate screening described above, and we determine how many features to keep by five-fold cross-validation on the training set.
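A sketch of this univariate screening step is shown below, using the scikit-survival library; X is assumed to be the (patients × features) matrix and y a structured array with boolean "event" and numeric "time" fields (the variable and field names are illustrative).

```python
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

def rank_features_by_univariate_cox(X, y):
    """Fit a one-covariate Cox model per feature and rank features by training c-index."""
    c_indices = []
    for j in range(X.shape[1]):
        model = CoxPHSurvivalAnalysis().fit(X[:, j:j + 1], y)
        risk = model.predict(X[:, j:j + 1])
        c_indices.append(concordance_index_censored(y["event"], y["time"], risk)[0])
    return np.argsort(c_indices)[::-1]      # feature indices, most predictive first
```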

Survival Prediction. To perform accurate and robust prediction of patients' overall survival, we utilize gradient boosting with a component-wise Cox proportional hazards model as the base learner [5].

We denote by \(D_i\) the time from disease onset to death and by \(C_i\) the potential censoring time for patient i, \(i=1,\cdots ,n\). The observed survival time is \(T_{i}=\min \{ D_{i},C_{i}\}\), and the death indicator is given by \(\delta _{i}=I(D_{i} \le C_{i}) \). Let \(X_{i}=(X_{i1},\cdots ,X_{ip})^{T}\) be the p-dimensional covariate vector containing all selected features for the ith patient. We assume that, conditional on \(X_i\), \(D_i\) is independently censored by \(C_i\). The death hazard can then be modeled as

$$\begin{aligned} {\lambda _i}(t|{X_i}) = \mathop {\lim }\limits _{dt \rightarrow 0} \frac{1}{{dt}}\Pr (t \le {D_i} < t + dt|{D_i} \ge t,{X_i}) = {\lambda _0}(t)\exp (X_i^T\beta ) \end{aligned}$$
(16)

where \(\lambda _{0}(t)\) denotes the baseline hazard function and \(\beta = (\beta _{1},\cdots ,\beta _{p})\) is the vector of regression parameters. Writing \(R_i = \{\ell : T_\ell \ge T_i\}\) for the risk set at time \(T_i\), the corresponding log-partial likelihood is given by

$$\begin{aligned} l_n(\beta ) = \sum \limits _{i = 1}^n {{\delta _i}\left[ {X_i^T\beta - \log \{ \sum \limits _{\ell \in {R_i}} {\exp (X_\ell ^T\beta )} \} } \right] } \end{aligned}$$
(17)
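As a concrete illustration of Eq. (17), the log-partial likelihood can be written directly in NumPy; the loop formulation below favors clarity over efficiency.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, T, delta):
    """Eq. (17): T are observed times, delta the death indicators, X the covariates."""
    eta = X @ beta                              # linear predictors X_i^T beta
    ll = 0.0
    for i in range(len(T)):
        if delta[i]:
            at_risk = T >= T[i]                 # risk set R_i at time T_i
            ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll
```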

The idea of gradient boosting is to perform an iterative steepest ascent of the log-partial likelihood. At each step, given the current estimate \({\hat{\beta }}\) of \(\beta \), let \(\hat{\eta }= X^{T}\hat{\beta }\). The algorithm computes the gradient of the log-partial likelihood with respect to \(\eta _{i}\), the ith component of \(\eta \),

$$\begin{aligned} {U_i} = \frac{\partial }{{\partial {\eta _i}}}{l_n}(\eta ){|_{\eta = \hat{\eta }}} = {\delta _i} - \sum \limits _{\ell = 1}^n {\frac{{{\delta _\ell }I({T_i} \ge {T_\ell })\exp ({{\hat{\eta }}_i})}}{{\sum \limits _{k = 1}^n {I({T_k} \ge {T_\ell })\exp ({{\hat{\eta }}_k})} }}} \end{aligned}$$
(18)

A component-wise algorithm can be implemented by restricting the search direction to a single component at a time. Specifically, we fit the component-wise model

$$\begin{aligned} {{\tilde{\beta }}_j} = \mathop {\arg \min }\limits _{{\beta _j}} \frac{1}{n}\sum \limits _{i = 1}^n {{{({U_i} - {X_{ij}}{{\tilde{\beta }}_j})}^2}} \end{aligned}$$
(19)

for \(j=1,\cdots ,p\), and then compute

$$\begin{aligned} {j^*} = \mathop {\arg \min }\limits _{1 \le j \le p} \frac{1}{n}\sum \limits _{i = 1}^n {{{({U_i} - {X_{ij}}{{\tilde{\beta }}_j})}^2}} \end{aligned}$$
(20)

and update \({{\hat{\beta }}_{j^*}} = {{\hat{\beta }}_{j^*}} + v{{\tilde{\beta }}_{j^*}}\) until a stopping criterion is reached. Here v denotes the learning rate. In our model, we set the learning rate to 0.1 and the number of boosting stages to 100.
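A minimal sketch of this procedure with the settings above is given below; it relies on the component-wise gradient boosting estimator of the scikit-survival library rather than a custom implementation, and the data arrays are random placeholders.

```python
import numpy as np
from sksurv.ensemble import ComponentwiseGradientBoostingSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(0)
X_selected = rng.normal(size=(159, 16))     # placeholder for the 16 selected features
time = rng.uniform(1.0, 60.0, size=159)     # placeholder survival times
event = rng.random(159) < 0.7               # placeholder death indicators
y = Surv.from_arrays(event=event, time=time)

booster = ComponentwiseGradientBoostingSurvivalAnalysis(
    loss="coxph",        # Cox partial likelihood, Eq. (17)
    learning_rate=0.1,   # step size v in the update rule
    n_estimators=100,    # number of boosting stages
)
booster.fit(X_selected, y)
risk_scores = booster.predict(X_selected)
```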

3 Results

Based on the methods described above, we performed feature extraction, feature selection and survival prediction. For feature extraction, we extracted 2438 features per case, comprising 2436 image features and 2 clinical covariates. For feature selection, we fitted a Cox model to each variable individually, recorded its c-index on the training set, and used five-fold cross-validation on the training set to determine the number of features to use. In the end, we selected the following 16 features (Table 1):

Table 1. Selected most predictive features

Finally, we evaluated the performance of our model by predicting overall survival on the training set from the selected predictors and calculating the concordance index. The experiments were carried out with five-fold cross-validation, yielding a mean concordance index of 0.7016.
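The cross-validated evaluation can be sketched as follows, continuing from the placeholder X_selected and y defined in the previous snippet; each held-out fold is scored with the concordance index.

```python
import numpy as np
from sklearn.model_selection import KFold
from sksurv.ensemble import ComponentwiseGradientBoostingSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

c_indices = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_selected):
    model = ComponentwiseGradientBoostingSurvivalAnalysis(learning_rate=0.1, n_estimators=100)
    model.fit(X_selected[train_idx], y[train_idx])
    risk = model.predict(X_selected[test_idx])
    c_indices.append(concordance_index_censored(y["event"][test_idx],
                                                y["time"][test_idx], risk)[0])
print("mean c-index:", np.mean(c_indices))
```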

4 Discussion

In this paper, we developed a prognostic model to predict overall survival based on predictors derived from contrast-enhanced pancreas CT scans and patients' clinical variables. We extracted radiomics features from the CT scans and fitted a Cox model to each variable individually to select the most predictive features. Finally, we used gradient boosting with a component-wise Cox proportional hazards base learner to predict the overall survival of patients. Our model achieves a mean concordance index of 0.7016 under five-fold cross-validation on the training set. In future work, we will explore further feature selection and survival prediction methods to improve the performance of our model.