1 Introduction

Accurate assessment of the renal transplant function is of great importance for graft survival [1]. Although transplantation can improve a patient’s wellbeing, there is a potential post-transplantation risk of kidney dysfunction that, if not treated in a timely manner, can lead to the loss of the entire graft, and even patients death [2]. Thus, accurate assessment of renal transplant function is crucial for the identification of proper treatment. Traditional evaluation of renal transplant function is based on blood tests and urine sampling, e.g., plasma creatinine and creatinine clearance. However, a significant change in creatinine levels is only observable after \(60\,\%\) loss of the kidney function due to the low sensitivity of indices [3]. Biopsy remains the gold standard, yet only as the last resort because of its invasiveness, high cost, and potential morbidity.

On the other hand, evaluation of renal transplant functions has been investigated in several studies using different imaging modalities, which are favorable as they provide information on each kidney separately. A quick overview of these imaging techniques is provided below and the reader is referred to [4] for more details about renal function assessment using diagnostic imaging. The most frequently used imaging technique, scintigraphy, has been clinically explored for its good functional information (e.g., [5]). However, radionuclide approaches indexing radioisotope activity versus time (renograms) involve radiation exposure, thereby limiting the applicability of these techniques. Ultrasound (US) imaging with Doppler interrogation was explored for renal assessment (e.g., [6, 7]). However, US imaging suffers from low signal-to-noise ratios, shadowing artifacts, and speckles that greatly decrease image quality and diagnostic confidence. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has been also exploited due to its ability to provide both anatomical and functional kidney information (e.g., [8, 9]). However, dynamic MRI imaging involves contrast agents which may implicate nephrogenic systemic fibrosis [9]. BOLD-MRI, another imaging technique, has been also utilized to study renal function, using the amount of oxygen diffused blood to examine the proper functionality of the kidney (e.g., [1, 10]). Furthermore, DW-MRI has emerged as an imaging technology for renal transplant assessment based on the measurement of water molecules inside soft tissue (e.g., [1012]). Several studies have utilized DW-MRI for functional renal assessment by measuring the cortical and/or medullary apparent diffusion coefficient (ADC), but the results have varied [1]. Also, several of the DW-MRI studies performed only a statistical analysis to investigate the significant difference between pairs at certain b-values.

In addition to the previous limitations, none of the aforementioned studies integrates both image– and laboratory–based biomarkers. Moreover, image analysis in most of them employ manual delineations of the kidney and do not account for motion effects. To account for these challenges, we propose a computer-aided diagnostic (CAD) system, which utilizes the fusion between DW-MRI-derived biomarkers and clinical biomarkers for the assessment of transplant status. Details of the proposed framework are described in the following section.

2 Methods

In order to build a robust diagnostic framework, the proposed CAD system integrates both clinical and image-based biomarkers. The former are based on creatinine clearance and serum plasma creatinine laboratory measurement. In order to extract the image-based biomarkers, our CAD system performs the following DW-MRI processing steps: (i) motion correction of the DW-MRI data using nonrigid registration; (ii) kidney segmentation from surrounding abdominal structures using level sets; (iii) ADC estimation to construct the discriminatory features (i.e., CDFs–cumulative distribution functions) for transplant status classification; Then, both image and clinical biomarkers are fused together for the assessment of transplant status by cascading two-stages of stacked non-negativity constrained auto-encoders. The first stage distinguishes non-rejection from abnormal transplants and the second stage classifies these abnormal transplants as rejection or other kidney diseases.

2.1 3D Kidney Segmentation

The first step of our CAD is to segment the kidney tissue from 4D DW-MRI data. To achieve a more accurate segmentation, we initially applied an intensity histogram equalization and the nonparametric bias correction method proposed in [13] to reduce noise effects and DW-MRI heterogeneity. Then, a 3D B-splines based transformation is applied to handle kidney motions using the sum of squared difference similarity metric. Finally, the 3D geometric (level-set based) deformable model approach proposed in [14] is applied to extract the kidney from the co-aligned DW-MRI data. The evolving boundary is controlled by a voxel-wise stochastic speed function that combines an adaptive kidney shape prior and visual appearances of DW-MRI data (image intensities and spatial interactions) into a joint kidney-background Markov-Gibbs random field model. For more details about our segmentation method, please see [14].

2.2 Diffusion Parameters Estimation

Following the segmentation of the kidney, the next step is to estimate diffusion parameters that can be used to assess renal transplant status and to differentiate renal rejection from other diagnostic possibilities (e.g., ATN, acute tubular injury, graft amyloidosis, and tubular inflammation). In this paper, we used the apparent diffusion coefficient (ADC) for global transplant status assessment: \({\mathrm {ADC}_\mathbf {p} = \frac{1}{b_0-b}\ln \left( \frac{g_{b:\mathbf {p}}}{g_{0:\mathbf {p}}}\right) }\); where \(\mathbf {p}=(x,y,z)\) denotes a voxel at position with discrete Cartesian coordinates (xyz), and the segmented DW-MR images \(\mathbf {g}_{0}\) and \(\mathbf {g}_{b}\) were acquired with the \(b_{0}\) and a given different b-value (total of 11 b-values in our study), respectively.

Conventional classification methods dealing directly with voxel-wise ADCs of the entire kidney volume as discriminative features encounter two difficulties: (i) varying input data size requires either data truncation for large kidney volumes or zero padding for small ones and (ii) large data volumes lead to considerable time expenditures for training and classification. In order to overcome these challenges, we characterize the whole 3D ADC maps, collected for each subject at different b-values, by the CDFs of the ADCs, as shown in Fig. 1. These descriptors are independent of the initial data size and can be quantified in accord with the actual accuracy of the ADCs. It is worth mentioning that fixing the input data size to 11 such CDFs helps to overcome the above challenges of the original ADCs’ arbitrary sizes and notably accelerates the classification. We divided the range between the minimum and maximum ADCs for all the input data sets into 100 steps to keep the ADC data well-presented without losing any information. The PDFs and then CDFs of the ADCs were constructed for these quantized values, see Fig. 1.

Fig. 1.
figure 1

Empirical ADC distributions and their CDFs for one subject at different b-values of (\(b_{50}\), \(b_{100}\), \(b_{200}\), and \(b_{300}\)) s/mm\(^2\).

Fig. 2.
figure 2

Block-diagram of an NCAE (a) and SNCAE (b) classifier.

2.3 Autoencoding and Deep Learning-Based Classifier

To classify the transplant status, our CAD employs a deep neural network with a stack of auto-encoders (AE) before the output layer that computes a softmax regression. Each AE compresses its input data to capture the most prominent variations and is built separately by greedy unsupervised pre-training [15]. The softmax output layer facilitates the subsequent supervised back-propagation-based fine tuning by minimizing the total loss (negative log-likelihood) for a given training labeled data. Using the non-negativity constraint AE (NCAE) [16] yields both more reasonable data codes (features) during its unsupervised pre-training and better classification performance after the supervised refinement.

Let \(\mathbf {W}=\{\mathbf {W}^\mathrm {e}_j,\mathbf {W}^\mathrm {d}_i:\,j=1, \ldots ,s;\,i=1,\ldots ,n\}\) denote a set of column vectors of weights for encoding (e) and decoding (d) layers of a single AE in Fig. 2. Let \(\mathsf {T}\) denote vector transposition. The AE converts an n-dimensional column vector \(\mathbf {u} = [u_1,\ldots ,u_n]^\mathsf {T}\) of input signals into an s-dimensional column vector \(\mathbf {h}=[h_1,\ldots ,h_s]^\mathsf {T}\) of hidden codes (features, or activations and \(s\ll n\)) by a uniform nonlinear transformation of s weighted linear combinations of signals:

$$\begin{aligned} { h_j = \sigma \left( \left( \mathbf {W}^{\mathrm {e}}_{j}\right) ^\mathsf {T}\mathbf {u}\right) \equiv \sigma \left( \sum \limits _{i=1}^n w^\mathrm {e}_{j:i}u_i\right) } \end{aligned}$$

where \(\sigma (\ldots )\) is a certain sigmoid. Unsupervised pre-training of the AE minimizes total deviations between each given training input vector \(\mathbf {u}_k\); \(k=1,\ldots ,K\), and the same-dimensional vector, \(\widehat{\mathbf {u}}_{\mathbf {W}:k}\) reconstructed from its code, or activation vector, \(\mathbf {h}_k\) (Fig. 2(a)). The total reconstruction error to compress and decompress the K training input vectors integrates the \(\ell _{2}\)-norms of the deviations:

$$\begin{aligned} { J_{\text {AE}}(\mathbf {W})=\frac{1}{2K} \sum \limits _{k=1}^{K}\parallel \widehat{\mathbf {u}}_{\mathbf {W}:k}-\mathbf {u}_k\parallel ^{2} } \end{aligned}$$
(1)

To reduce the number of negative weights and enforce sparsity of the NCAE, Eq. (1) is appended, respectively, with quadratic negative weight penalties, \(f(w_{i})=\left( \min \{0,w_{i}\}\right) ^2\); \(i=1,\ldots ,n\), and Kullback-Leibler (KL) divergence, \(J_\mathrm {KL}(\mathbf {h}_{\mathbf {W}^{e}};\gamma )\), of activations, \(\mathbf {h}_{\mathbf {W}^\mathrm {e}}\), obtained with the encoding weights \(\mathbf {W}^\mathrm {e}\) for the training data, from a fixed small positive average value, \(\gamma \), near 0:

$$\begin{aligned} { J_{\mathrm {NCAE}}(\mathbf {W}) =J_{\mathrm {AE}}(\mathbf {W}) +\alpha \sum \limits _{j=1}^s\sum _{i=1}^n f(w_{j:i}) +\beta J_{\mathrm {KL}}(\mathbf {h}_{\mathbf {W}^\mathrm {e}};\gamma ) } \end{aligned}$$
(2)

Here, the factors \(\alpha \ge 0\) and \(\beta \ge 0\) specify relative contributions of the non-negativity and sparsity constraints to the overall loss, \(J_{\mathrm {NCAE}}(\mathbf {W})\), and

$$\begin{aligned} {J_{\mathrm {KL}}(\mathbf {h}_{\mathbf {W}^\mathrm {e}},\gamma )= \sum \limits _{j=1}^{s}h_{\mathbf {W}^\mathrm {e}:j} \log \left( \frac{h_{\mathbf {W}^\mathrm {e}:j}}{\gamma }\right) + (1-h_{\mathbf {W}^\mathrm {e}:j})\log \left( \frac{1-h_{\mathbf {W}^ \mathrm {e}:j}}{1-\gamma }\right) } \end{aligned}$$
(3)

The classifier is built by stacking the NCAE layers with an output softmax layer, as shown in Fig. 2(b). Each NCAE is pre-trained separately in the unsupervised mode by using the activation vector of a lower layer as the input to the upper layer. In our case, we considered a two-layer SNCAE in which the bottom NCAE compresses the input vector to \(s_1\) first-level activators, compressed by the next NCAE to \(s_2\) second-level activators, which are reduced in turn by the output softmax layer to \(s^\circ \) values. In our experiments, the SNCAE training parameters used to obtain all the results are shown in Table 1.

Table 1. Two-stage SNCAE classifiers’ training parameters, where n, \(s_1\), \(s_2\), \(s^\circ \), \(\alpha \), \(\beta \), and \(\gamma \) stand for input signals (CDFs) size, \(1^{st}\) NCAE size, \(2^{nd}\) NCAE size, the softmax layer size, weight decay parameter, weight of sparsity penalty term, and desired activation of the hidden units, respectively.

Separate pre-training of the first and second layers by minimizing the loss of Eq. (2) reduces the total reconstruction error, as well as increases sparsity of the extracted activations and numbers of the non-negative weights. The activations of the second NCAE layer, \(\mathbf {h}^{[2]} = \sigma ({\mathbf {W}^\mathrm {e}_{[2]}}^\mathsf {T}\mathbf {h}^{[1]})\), are inputs of the softmax classification layer, as sketched in Fig. 2(b) to compute plausibility of a decision in favor of each particular output class, \(c=1,2\):

$$\begin{aligned} { p(c;\mathbf {W}_{\circ :c}) = \frac{\exp ({\mathbf {W}_{\circ :c}^{\mathsf {T}}\mathbf {h}^{[2]}})}{ \exp (\mathbf {W}_{\circ :1}^{\mathsf {T}}\mathbf {h}^{[2]})+\exp (\mathbf {W}_ {\circ :2}^{\mathsf {T}}\mathbf {h}^{[2]})}; \; c=1,2;\; \sum \limits _{c=1}^2 p(c;\mathbf {W}_{\circ :c};\mathbf {h}^{[2]}) =1. } \end{aligned}$$

Its separate pre-training minimizes the total negative log-likelihood \(J_\circ (\mathbf {W}_\circ )\) of the known training classes, appended with the negative weight penalties:

$$\begin{aligned} { J_\circ \left( \mathbf {W}^{o}\right) =-\frac{1}{K}\sum \limits _{k=1}^K \log p(c_k; \mathbf {W}_{\circ :c}) +\alpha \sum \limits _{c=1}^2\sum \limits _{j=1}^{s_2}w_{\circ :c:j} } \end{aligned}$$
(4)

Finally, the entire stacked NCAE classifier (SNCAE) is fine-tuned on the labeled training data by the conventional error back-propagation through the network and penalizing only the negative weights of the softmax layer.

3 Experimental Results

The proposed CAD system has been tested on DW-MRI data collected from 58 subjects (47 male and 11 female with a mean age of \(26\pm 10\) years) that contains 16 non-rejection transplants and 42 transplant patients with abnormal kidneys, in which 37 cases are early rejection and 5 cases have other kidney diseases. All patients, as a part of post-transplantation routine medical care, were routinely assessed with serum creatinine laboratory values. Coronal DW-MRIs were acquired before any biopsy procedure using a 1.5 \(\text {T}\) scanner (SIGNA Horizon, General Electric Medical Systems, Milwaukee, WI) using a body coil and a gradient multi-shot spin-echo echo-planar sequence (TR/TE: 8000/61.2; bandwidth: 142 kHz; matrix: \(1.25\times 1.25\) mm\(^2\); section thickness: 4 mm; intersection gap: 0 mm; FOV: 32 cm; signals acquired: 7; water signals acquired at different b-values of (\(b_{0}\), \(b_{50}\), and \(b_{100}\)\(b_{1000}\) s/mm\(^2\) with 100 increment). Approximately 50 sections have been obtained in 60–120 s to cover the whole kidney.

Since the segmentation is a key step in developing any CAD system to asses renal function, our segmentation approach was first evaluated on the collected DW-MRI data using the Dice coefficient (DC) [17], percentage volume difference (PVD), and the 95-percentile modified Hausdorff distance (MHD) [18] evaluation metrics. Our method achieved a mean DC, MHD, and PVD of \(0.92\pm 0.02\), \(6.2\pm 2\) mm, and \(15\pm 3\)%, respectively, compared to an MRI expert’s ground truth maps. Also, comparisons between our method and the level-set approach by Chan and Vese (CV) [19] and the like boundary guided by combined intensity and spatial features (I+S) can be found in [14].

Following kidney segmentation, our system assesses the transplant status with a two-stage SNCAE-based classifier using the constructed CDFs and the clinical biomarkers. A leave-one-subject-out (LOSO) approach is applied to distinguish between (i) non-rejection (NR) and abnormal (AB) transplants and (ii) early rejection (ER) from other kidney diseases (OD). First, our system accuracy was evaluated using the clinical biomarkers alone. As demonstrated in Table 2, diagnostic accuracy using the clinical biomarkers only is very low due to the large overlap of these biomarkers between NR and AB groups. Secondly, we evaluated the overall accuracy using the image biomarkers (i.e. CDFs), see Table 2. Classification using image-derived biomarkers has higher accuracy compared to that based on clinical ones. To show the advantages of integrating both clinical and image biomarkers, the kidney status was also assessed using the same LOSO approach and an SNCAE classifier augmented with both biomarkers. As expected, the overall accuracy notably increases after fusion as shown in Table 2.

Table 2. Diagnostic accuracy (ACC), sensitivity (SENS), and specificity (SPEC) for our CAD system with the SNCAE classifier using clinical and image-driven biomarkers. Note that “NR”, “AB”, “ER”, and “OD” stand for non-rejection, abnormal, early rejection and other diagnosis, respectively.
Table 3. Diagnostic accuracy (ACC), sensitivity (SENS), specificity (SPEC), and area under the curve (AUC) for the proposed CAD system with the SNCAE classifier and seven other classifiers from the Weka collection [20].

To evaluate the capabilities of the SNCAE classifier, it has been compared with seven well-known learnable classifiers from the Weka collection [20]: K*, k-nearest neighbor (kNN), Naive Bayes tree (NBT), alternating decision tree (ADT), nearest neighbor with generalization (NNge), Random tree (RT), and Random forest (RF). Table 3 presents their and our diagnostic accuracies in terms of the numbers of correctly classified cases with respect to the overall numbers of subjects. For the first stage (NR vs. AB), our classifier demonstrated the best total diagnostic accuracy of \(95\,\%\), or 15 correctly classified NR transplants out of the 16 subjects, and 40 correctly classified AB transplants out of the 42 subjects. Moreover, for the second stage (ER vs. OD), our classifier achieved a \(95\,\%\) total diagnostic accuracy.

In addition, we tested the performance of the SNCAE compared to the chosen Weka classifiers using the receiver operating characteristics (ROC) [21] analysis. As shown in Table 3, the calculated area under the curve (AUC) is the highest for our classifier and approaches the top-most unit value. These initial diagnostic results confirm that the proposed CAD system holds promise as a reliable non-invasive diagnostic tool.

4 Conclusions

In total, we propose a CAD system for early assessment of renal transplant function from 4D DW-MRI data that combines deformable model segmentation, estimation of spatial diffusion parameters (ADCs), and a SNCAE classification of the transplant status using CDFs of the ADCs and clinical biomarkers as integral status descriptions. In a test on a biopsy-proven cohort of 58 participants, our system showed high diagnostic accuracy to distinguish non-rejection from abnormal transplants as well as to distinguish early rejection from other kidney dysfunction, which make our non-invasive CAD a reliable diagnostic tool. In the future, we intend to increase the test sets of both non-rejection and abnormal transplants in order to further validate the accuracy and robustness of our framework. Also, new kidney transplant data sets, which are acquired at lower b-values, will be used to explore the ability of our framework to determine the type of kidney rejection, i.e. anti-body mediated or T-cell rejection, or other causes of acute kidney dysfunction such as drug toxicity and viral infection.