1 Introduction

Prostate cancer is the most frequently diagnosed cancer in American men, with 181,000 new cases in 2016 resulting in more than 26,000 deaths [10]. Early diagnosis has resulted in improved long-term survival but depends on invasive multi-core biopsies done under trans-rectal ultrasound (TRUS) imaging guidance. Recently, multi-parametric magnetic resonance imaging (MRI) has shown promising results as a non-invasive alternative for prostate cancer detection and classification [3].

Two specific tasks are required in the examination of multi-parametric MRI (mpMRI) images. First, cancer regions must be detected, and second, these suspicious areas must be classified as either benign or otherwise actionable, where biopsy is recommended for further tissue interrogation. This approach could potentially reduce the overall number of biopsies. The comprehensive assessment of mpMRI, which may consist of 8 or more different volumetric datasets, can be tedious for daily clinical readings. Furthermore, subtle and collective signatures of cancerous lesions expressed within mpMRI are difficult to detect consistently even for experienced radiologists, a challenge that is amplified for small lesions. A further challenge is characterizing these lesions and relating the results to biopsy findings with Gleason scores. There have been a number of attempts to provide an automatic solution that quantifies these contrast changes and uses them to detect and classify suspicious lesions; Chung et al. provide a good overview of the challenges [3]. The majority of proposed methods are based on various quantifiable image features, which are hypothesized to be important for the detection and classification tasks. For example, in [9], level-set methods were used to segment the prostate and a set of features was extracted from multiple diffusion-weighted images (DWI). These features were then fed into a stacked auto-encoder and finally classified into benign and malignant classes via logistic regression. The result was a 100% correct classification rate on data from 53 patients. Also, in [3], a deep learning network composed of stochastically realized receptive fields ending in fully connected sequencing layers was proposed, achieving a sensitivity of 64.00% and specificity of 82.48% on a dataset from 20 patients.
We present a novel approach for prostate cancer classification based on image-to-image networks, inspired by [6]. In this work, the multi-parametric images are fed directly into the network and no explicit feature-extraction preprocessing step is required. We evaluate the classification performance of multiple image-to-image architectures and input channel variations on a 202-patient dataset.

2 Methods

We formulate the task as a multi-object segmentation problem. In this approach, the “segmentation” is in fact a response map: unlike a binary segmentation, the fractional response peaks at the tumor location and falls off as a Gaussian in its vicinity. Two independent response channels are considered to accommodate both benign and malignant lesions’ characteristics. This approach has multiple advantages. First, the spatial uncertainty of the marked lesion is inherently modeled through the choice of the Gaussian standard deviation. Second, there is no need to choose a patch size to interrogate the neighborhood around the lesion. The implementation is a type of encoder-decoder architecture [2]. However, instead of an anticipated binary segmentation output, local maxima within an output response map suggest the tumors’ locations. Additional analysis comparing the intensity of the response maps from different output channels (e.g., benign and malignant) at particular locations is done to further characterize the detected lesion. This architecture naturally allows multiple tumors and multiple classes of tumors to be detected and characterized within a series of multi-parametric input images. Depending on the availability of ground truth, one may simply add the tumor boundaries and extend the approach to not only detect and characterize but also segment lesions. In the following, we make use of 2D, as opposed to 3D, convolutional architectures, which have fewer parameters and allow for additional training data, with superior results in our experience.
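The two-channel Gaussian response target described above can be sketched as follows. This is a minimal illustration only: the grid size, the σ value, and the rule of taking the element-wise maximum where lesions overlap are our assumptions, not specified by the paper.

```python
import numpy as np

def gaussian_response_map(shape, lesions, sigma_px):
    """Build a two-channel target map: channel 0 = benign, channel 1 = malignant.

    `lesions` is a list of (row, col, channel) lesion-centre annotations.
    Each lesion contributes a 2D Gaussian that peaks (value 1.0) at its centre.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    target = np.zeros((2,) + shape, dtype=np.float32)
    for r0, c0, ch in lesions:
        g = np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * sigma_px ** 2))
        # Where lesions overlap, keep the stronger response
        target[ch] = np.maximum(target[ch], g)
    return target

# Example: one malignant lesion (channel 1) at (32, 40) on a 64x64 slice
t = gaussian_response_map((64, 64), [(32, 40, 1)], sigma_px=3.3)
```

The fractional (rather than binary) target is what lets a mean-squared-error regression loss drive the network toward localized peaks instead of hard segmentation masks.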

2.1 Data Preparation

Data have been collected from patients with a suspicion of prostate cancer. Overall, we processed 202 multi-parametric prostate MRI (mpMRI) datasets from the ProstateX challenge database [7]. The patients were all imaged using 3T MRI scanners without an endo-rectal coil. The scan protocol included axial T2-weighted 2D turbo spin-echo images providing an anatomical overview of the prostate and the zonal structure. Furthermore, diffusion-weighted imaging (DWI), depicting water molecule diffusion variations due to microscopic changes in tissue structures, was included. DWI is acquired using different diffusion weightings (b-values) that depict tissue with increased cellularity and thus restricted diffusion. The apparent diffusion coefficient (ADC) is derived from the signal intensity changes across at least two b-values and provides a quantitative map of the degree of water molecule diffusion. It is believed that tumors have restricted diffusion and hence appear hypointense on the ADC map. Finally, a calculated b-value image at b = 1400 s/mm\(^2\) was extrapolated. Additionally, the data include dynamic contrast-enhanced (DCE) images. These images consist of a series of T1-weighted acquisitions taken during intravenous gadolinium-based contrast agent injection. It is known that prostate cancer tissue often induces some level of angiogenesis, which is followed by increased vascular permeability as compared to normal prostatic tissue. Pharmacokinetic modeling was applied to the DCE-MRI series in order to estimate the K-trans parameter of the Tofts model as an indicator of tissue permeability [12].
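The ADC derivation from two b-values follows the standard mono-exponential diffusion model \(S(b) = S_0 e^{-b \cdot \mathrm{ADC}}\). The sketch below illustrates this; the particular b-values (50 and 800 s/mm\(^2\)) and signal intensities are our own illustrative choices, not those of the ProstateX protocol.

```python
import numpy as np

def adc_map(s_low, s_high, b_low, b_high):
    """Derive the ADC map (mm^2/s) from two DWI acquisitions.

    Mono-exponential model: S(b) = S0 * exp(-b * ADC), so
    ADC = ln(S(b_low) / S(b_high)) / (b_high - b_low).
    """
    eps = 1e-8  # guard against log/division by zero in background voxels
    return np.log((s_low + eps) / (s_high + eps)) / (b_high - b_low)

# Restricted diffusion (tumour) gives a smaller ADC, hence a darker
# (hypointense) region on the ADC map, as described in the text.
s0 = np.array([1000.0, 1000.0])
adc_true = np.array([0.8e-3, 2.0e-3])      # tumour vs. normal tissue (mm^2/s)
s_b50 = s0 * np.exp(-50 * adc_true)
s_b800 = s0 * np.exp(-800 * adc_true)
adc = adc_map(s_b50, s_b800, 50, 800)

# A "calculated" high-b image can then be extrapolated from the fitted model:
s_b1400 = s0 * np.exp(-1400 * adc)
```

The same extrapolation step is what produces the synthetic b = 1400 s/mm\(^2\) image mentioned above without acquiring it directly.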

For annotation, the lesions’ center locations and corresponding classifications were available [7]. The two class labels were clinically relevant cancer (Gleason score >6) and non-relevant (Gleason score \(\le \)6). We use a cascaded 3D elastic registration as a first preprocessing step to compensate for any motion that may have occurred between acquisitions [13]. In order to increase robustness, a pairwise registration between the T2-weighted image and the corresponding low-b diffusion image, as the representative of the DWI set, is performed. We then apply the computed deformation to compensate for motion in both the ADC map and the high-b diffusion image. Similarly, we perform a pairwise registration between the T2-weighted image and the late contrast-enhanced image as the representative of the DCE set. Additionally, an 80 mm \(\times \) 80 mm region of interest (ROI) mask was applied on each slice to ensure that only the prostate and surrounding areas were considered. After intra-patient registration, all images are reformatted into a T2-weighted 100 mm \(\times \) 100 mm \(\times \) 60 mm image grid, which corresponds to roughly 200 \(\times \) 200 pixel 2D slices. Two ground-truth maps corresponding to benign and malignant tumor labels were created for each dataset by placing a Gaussian distribution (with \(3\sigma \) of 10 mm) at each lesion point in 2D, as shown in Fig. 1. The Gaussian distribution was also propagated through-plane with a standard deviation adjusted to the acquisition slice thickness. Only slices containing tumor labels were selected for processing. This final set totals 824 slices from the 202 patient cases.

Fig. 1.

Sample slices from T2-weighted images with marked tumor point locations shown as Gaussian responses. The green and red regions correspond to benign and malignant tumors, respectively.

2.2 Network Design and Training

We designed three convolutional-deconvolutional image-to-image networks (Models 0, 1, and 2) with increasing complexity in terms of number of features and layers as shown in Fig. 2. Compared to the 13 convolutional and 13 deconvolutional layers of SegNet [2], these models contain fewer layers and features to avoid over-fitting. Each model’s output consists of two channels signifying the malignant and benign tumor categories. Batch normalization was used after each convolutional layer during training. A \(256\times 256\) input image was used. In addition to the three networks, the following modifications were also evaluated:

Fig. 2.

The three network architectures evaluated in the proposed method.

  • Input image combinations (T2, ADC, high b-value, K-trans)

  • Activation function

    • Rectified Linear Unit (ReLU)

    • Leaky ReLU (\(\alpha =0.01\))

    • Very Leaky ReLU (\(\alpha =0.3\)) - improved classification performance in [1]

  • Adding skip-connections [4]

  • Training data augmentation (Gaussian noise addition, rotation, shifting)
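The activation variants listed above differ only in the slope \(\alpha \) applied to negative inputs; a minimal sketch (the unified helper function below is ours, not from the paper):

```python
import numpy as np

def leaky_relu(x, alpha=0.0):
    """alpha=0 -> standard ReLU; alpha=0.01 -> leaky; alpha=0.3 -> very leaky."""
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu_out = leaky_relu(x)                # [ 0.0,   0.0,   0.0, 1.5]
leaky_out = leaky_relu(x, alpha=0.01)   # [-0.02, -0.005, 0.0, 1.5]
very_leaky = leaky_relu(x, alpha=0.3)   # [-0.6,  -0.15,  0.0, 1.5]
```

A non-zero \(\alpha \) lets gradients flow through negative activations, which is the motivation behind the very leaky variant evaluated in [1].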

All networks were trained using Theano [11] with batch gradient descent (batch size = 10). A mean-squared error loss function computed within a mask of the original image slice size was used. Training was performed for a maximum of 100 epochs, and the model with minimal loss on a small validation set was selected. A constant learning rate of 0.005 was used throughout. In order to assess sampling variability, we performed 5-fold cross-validation bootstrapped five times with different sets of data chosen randomly for training and testing; hence 20% of the data was used for testing in each fold. Using this approach, we obtain a range of results and can compute a largely sampling-independent average performance. As a performance indicator, we use the area under the curve (AUC) for each classification run. We also ensure that no slices from a single patient fall into both the training/validation and test datasets. Classification was determined by the intensity ratios between the output channels at the given location.
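The masked loss and the channel-ratio decision rule described above might be sketched as follows. The window radius and the exact score definition are illustrative assumptions; the paper does not specify them.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Mean-squared error computed only inside the original slice mask."""
    return float(((pred - target) ** 2)[mask].mean())

def lesion_score(response, r, c, radius=2):
    """Compare benign (channel 0) and malignant (channel 1) responses in a
    small window around the annotated lesion; higher score -> more malignant."""
    win = response[:, r - radius:r + radius + 1, c - radius:c + radius + 1]
    benign, malignant = win[0].max(), win[1].max()
    return malignant / (benign + malignant + 1e-8)

# Toy two-channel response map with a strong malignant peak at (5, 5)
resp = np.zeros((2, 11, 11))
resp[1, 5, 5], resp[0, 5, 5] = 0.9, 0.1
score = lesion_score(resp, 5, 5)  # ~0.9, i.e. classified as malignant
```

Sweeping a threshold over such scores across all annotated lesions yields the ROC curves from which the reported AUCs are computed.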

3 Results

Our first aim is to assess performance in lesion characterization. The second aim is to better understand the contribution of the different mpMRI contrasts to the overall characterization performance. It is desirable to find a compromise between the acquisition length (smaller number of channels) and the performance. The performance results using varying numbers of multi-parametric channels are shown in Table 1 and plotted in Fig. 3. It is clear that the aggregate of all modalities produced the best result across all models. However, it is clinically desirable to eliminate the dynamic sequence scan, both to save time and to avoid contrast agent injection. The performance in this case may still provide a clinically acceptable negative predictive value (NPV) to rule out malignant lesions and avoid invasive biopsies (by selecting an appropriate operating point on the ROC curve). This hypothesis must be further investigated and validated. Model 1 produces the best average AUC with the least variability, while Model 0 achieved the best single-fold AUC among all the folds tested. Sample ROC curves are shown in Figs. 4 and 5.
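The rule-out use case mentioned above amounts to picking a high-sensitivity operating point on the ROC curve and checking the NPV there. A minimal sketch, with entirely made-up scores and labels for illustration:

```python
import numpy as np

def npv_at_threshold(scores, labels, threshold):
    """Negative predictive value: of the lesions scored below the threshold
    (i.e. called negative), the fraction that are truly benign (label 0)."""
    negatives = scores < threshold
    if not negatives.any():
        return float("nan")
    return float((labels[negatives] == 0).mean())

# Hypothetical classifier scores; labels: 1 = clinically relevant cancer
scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
labels = np.array([0,   0,   0,   1,   1,   1])
npv = npv_at_threshold(scores, labels, threshold=0.5)  # -> 1.0 here
```

An operating point with a sufficiently high NPV would allow lesions below the threshold to skip biopsy, which is the clinical motivation discussed in the text.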

Table 1. Average AUC results of the three networks used with different combinations of input channels without data augmentation.
Fig. 3.

AUC results of the three networks with differing input modalities.

Fig. 4.

ROC Curve of Model 1 using all four MRI modalities in our dataset.

Fig. 5.

ROC Curve of Model 1 using skip connections and training data augmentation.

Results based on all four input channels with variations adding skip connections or changing the activation function are shown in Table 2 and Fig. 6. Using the leaky and very leaky ReLUs resulted in inferior performance compared to standard ReLUs. However, skip connections improved performance for the most complex model, with an average AUC of 83.3% and reduced variability across folds.

Table 2. Average AUC results of the three networks used with architecture changes.
Fig. 6.

Average AUC results of the three networks with skip connections and modified activation functions. In this case, the leaky and very leaky ReLUs had \(\alpha \) parameters of 0.01 and 0.3, respectively.

Training data augmentation by translation and rotation coupled with Gaussian noise addition resulted in a consistent improvement. An average AUC of 95% was reached when we applied the data augmentation along with skip connections to Model 1. We also coupled the image-to-image localization and classification network with a discriminator that aims to distinguish real from generated probability maps. The resulting network drives the evolution of the image-to-image localization and classification network by the weighted sum of the regression cost and the binary classification cost stemming from the use of the discriminator. The training is conducted using approaches recently provided in the generative adversarial networks (GAN) literature [5, 8]. In our limited experiments, we found that this adversarial setup yielded performance similar to the image-to-image network alone. Use of different adversarial approaches is part of our future work (Fig. 6).

4 Conclusions and Future Work

We have presented a convolutional image-to-image deep learning pipeline for performing classification without the fully connected layers used in conventional classification pipelines. The same network could also be used for localization of suspicious regions by examining the responses across different channels. We have experimented and shown results by varying input channels and network parameters to arrive at a recommended architecture for optimal performance. An average AUC of 83.4% for classification without data augmentation is promising, and improvements are possible, for instance, by inclusion of a prostate segmentation region. This would allow the network to focus solely on regions within the prostate and not be penalized for responses outside this region. We also plan to develop and evaluate localization of tumors from the individual channel responses.

Although optimal classification was achieved using all four input images, in practice it is undesirable to inject patients with contrast agent to obtain K-trans and DCE images. Therefore, we hypothesize that methods developed without the use of K-trans or DCE images could find greater utility in an early-diagnosis scenario as a gatekeeper to a more invasive biopsy approach.