1 Introduction

Lung cancer is the leading cause of cancer deaths worldwide. In 2018, lung cancer was estimated to account for approximately 26% of all cancer deaths in the United States [1]. Early diagnosis is crucial for the subsequent treatment of lung cancer patients, because the five-year survival rate falls below 20% once the disease progresses to a late stage. Lung cancer usually presents as small malignant lung nodules (3–30 mm in diameter), which can be detected on chest computed tomography (CT) scans. However, distinguishing benign from malignant nodules is difficult even for experienced radiologists [2]. Various characteristics are potentially related to malignancy (e.g., spiculation), and these should be taken into consideration during diagnosis.

Computer-aided diagnosis techniques have been shown to help radiologists in decision making, and they hold the potential to improve diagnostic accuracy in distinguishing small benign nodules from malignant ones [3]. Owing to their powerful representation capability, deep neural networks can learn complicated diagnostic patterns from labeled data, and can therefore assist automated lung nodule analysis. Recently, several deep learning based methods have been proposed for computer-aided diagnosis of lung nodules. Xie et al. [6] proposed a multi-model ensemble method that simultaneously considers the overall appearance, shape, and voxel values of each nodule slice to achieve high classification accuracy. Chen et al. [5] introduced a multi-task regression model to explore the internal relationship among semantic features. Instead of treating malignancy classification and attribute score regression independently, Hussein et al. [13] proposed a 3D CNN-based multi-task model that implicitly explores the relationship between these two tasks. Although these methods achieve state-of-the-art performance, they tackle benign-malignant classification and attribute score regression either independently or “jointly but implicitly”, rather than analyzing the two tasks jointly and explicitly exploring their correlation for a more convincing and interpretable diagnosis.

In this paper, we propose a novel Multi-Task deep learning framework with a new Margin Ranking loss (called MTMR-Net) for automated lung nodule analysis. We build a bi-branch model that not only predicts nodule malignancy but also outputs regressed scores for eight attribute characteristics. The relatedness between the two highly correlated tasks is explicitly learned in our model, so that each task benefits from the other through the proposed architecture. Furthermore, we propose a novel margin ranking loss based on a siamese network architecture, which compares nodules while scoring them in order to model their heterogeneity. This makes the network more accurate in recognizing marginal lung nodules by referring to nodules that have different labels but close malignancy scores. We validated the proposed framework on the public LIDC-IDRI dataset, where it achieved competitive classification accuracy compared with state-of-the-art methods. In addition, unlike previous approaches that output only a binary classification result, our model simultaneously yields the attribute scores, providing radiologists with more cues and evidence when making a diagnosis.

2 Method

Our proposed MTMR-Net consists of two components. First, we propose a multi-task deep learning model for nodule analysis, composed of a lung nodule classification task and an attribute score regression task. Second, we present a new margin ranking loss for training the model, which enhances its ability to discriminate marginal nodules.

Fig. 1.

Multi-task learning framework. The residual blocks are identical to those in the original 50-layer residual network [7]. Besides the classification branch, an additional regression branch is added to predict the scores of 8 attributes. “CE Loss” and “MSE Loss” denote the cross-entropy loss and the mean squared error loss, respectively.

2.1 Multi-task Learning for Lung Nodule Analysis

Benign-Malignant Classification. The multi-task model is fine-tuned from a 50-layer residual network [7]. We keep the feature extraction module of the original residual network. In the classification module, however, we concatenate the extracted feature maps with an additional feature map (from the regression module) before the last fully connected layer, as shown in Fig. 1. We formulate the task as a classification problem rather than a regression problem, because a definite diagnosis provides more intuitive information to experts. Accordingly, the classification module is trained with the cross-entropy loss (CE Loss), defined as:

$$\begin{aligned} \mathcal {L}_{cls} = -\frac{1}{N}\sum _i \log p_i^c\left( {y_i^c}\mid x_i; W_{cls}, W_s\right) , \end{aligned}$$
(1)

where \(x_i\) is the input image, \(p_i^c\) is the output probability of the classification module, \({y_i^c}\in \{0, 1\}\) is the ground-truth classification label of the lung nodule, and \(W_s\) and \(W_{cls}\) are the weights of the shared feature extraction path and the classification module, respectively. \(N\) is the total number of training samples.
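To make Eq. (1) concrete, the following is a minimal pure-Python sketch, assuming the classification module already outputs a two-class probability vector (benign, malignant) for each sample. The function name `cross_entropy_loss` and the toy numbers are illustrative only; in a PyTorch training loop this term would more typically come from `torch.nn.CrossEntropyLoss` applied to the raw logits.

```python
import math
from typing import Sequence


def cross_entropy_loss(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Eq. (1): mean negative log-probability of the ground-truth class.

    probs[i][k] is the predicted probability that sample i belongs to class k
    (k = 0 benign, k = 1 malignant); labels[i] is y_i^c in {0, 1}.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("need one probability vector per label")
    eps = 1e-12  # guards against log(0) for a fully confident wrong prediction
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(labels)


# Two toy nodules, one benign and one malignant, both classified correctly.
print(round(cross_entropy_loss([[0.8, 0.2], [0.3, 0.7]], [0, 1]), 4))  # prints 0.2899
```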

Nodule Attribute Score Regression. Motivated by the clinical observation that radiologists assess malignancy by analyzing nodule attributes, we hypothesize that exploring the correlation between malignancy classification and attribute scoring helps further improve the discrimination capability for lung nodule analysis. Therefore, besides the classification task, we add a regression module for attribute score prediction to the network. Before the last fully connected layer for the final regression, we explicitly extract attribute features with another fully connected layer that follows the shared feature extraction module, as shown in Fig. 1. Moreover, rather than using these attribute features solely for the regression task, we concatenate them with the malignancy features in the classification module, so that attribute information guides the nodule classification task. For the attribute score regression task, we use the mean squared error loss (MSE Loss) during training, defined as:

$$\begin{aligned} \mathcal {L}_{reg} = \frac{1}{N}\sum _{i} || \hat{y_i^r}(x_i;W_s,W_{reg}) - y_i^r ||_2^2, \end{aligned}$$
(2)

where \(\hat{y_i^r}\in \mathbb {R}^{1\times n}\) is the attribute score vector predicted by the regression module, \(y_i^r\in \mathbb {R}^{1\times n}\) is the corresponding ground truth, and \(W_{reg}\) denotes the weights of the regression module. Here \(n=8\), corresponding to the eight semantic attributes.
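The following pure-Python sketch illustrates the head arrangement described above together with the loss in Eq. (2). It is a toy under stated assumptions, not the authors' implementation: the 2048-d ResNet-50 feature is replaced by a hypothetical 16-d `shared` vector, the layer sizes and random initialization are made up, no nonlinearities are placed between the fully connected layers (the text does not specify them), and the classifier input is assumed to be the shared feature concatenated with the attribute feature.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]

# Toy sizes (hypothetical); the real ResNet-50 pooled feature is 2048-d.
FEAT, ATTR_FEAT, N_ATTR = 16, 8, 8


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Fully connected layer: y = W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


rng = random.Random(0)
fc_att = init_layer(ATTR_FEAT, FEAT, rng)       # extracts the attribute feature
fc_reg = init_layer(N_ATTR, ATTR_FEAT, rng)     # final regression layer (8 scores)
fc_cls = init_layer(2, FEAT + ATTR_FEAT, rng)   # last classification layer


def heads(shared: Vector) -> Tuple[Vector, Vector]:
    """Return (benign/malignant probabilities, attribute scores) for one sample."""
    attr_feat = linear(*fc_att, shared)
    scores = linear(*fc_reg, attr_feat)
    fused = shared + attr_feat  # list concatenation: attribute feature joins the classifier input
    probs = softmax(linear(*fc_cls, fused))
    return probs, scores


def mse_loss(preds: List[Vector], targets: List[Vector]) -> float:
    """Eq. (2): mean over samples of the squared L2 distance between score vectors."""
    per_sample = [sum((p - t) ** 2 for p, t in zip(pv, tv)) for pv, tv in zip(preds, targets)]
    return sum(per_sample) / len(per_sample)


shared = [rng.random() for _ in range(FEAT)]  # stand-in for the backbone output
probs, scores = heads(shared)
print(len(probs), len(scores), mse_loss([scores], [[0.5] * N_ATTR]))
```

In an actual PyTorch model the same wiring would be expressed with `nn.Linear` layers and a `torch.cat` along the feature dimension; the point of the sketch is only that the attribute feature feeds both the regression output and the classifier.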

Fig. 2.

Siamese model built from two weight-sharing copies of the proposed multi-task model. “MR Loss” denotes the margin ranking loss. All three modules (feature extraction, classification, and regression) share weights across the two branches of the siamese network.

2.2 Margin Ranking Loss for Discriminating Marginal Nodules

Although our deep neural network is trained with multiple correlated sources of supervision, we still observe misclassification of marginal lung nodules. To tackle a similar misclassification problem, Kong et al. [8] used a siamese network to enhance a model’s discrimination capability on ambiguous cases. Inspired by Kong et al. [8], we adopt the same architecture together with a novel margin ranking loss that compares nodules while scoring them, in order to model nodule heterogeneity. A siamese network consists of two weight-sharing feature extraction branches, which allows it to be trained in a pair-wise mode (see Fig. 2); comparing and referring between paired samples can enhance classification accuracy. On top of this architecture, we design a novel margin ranking loss to capture the ranking relationship between different training samples:

$$\begin{aligned} \mathcal {L}_{rank} = \frac{1}{2N}\sum _{i,j}\max \left( 0, \gamma -\delta \left( {p_i^{c}}, {p_j^c}\right) \cdot \left( t_i^c - t_j^c\right) \right) ,\end{aligned}$$
(3)
$$\begin{aligned} \delta \left( {p_i^{c}}, {p_j^c}\right) = \left\{ \begin{array}{lr} 1, &{} {p_i^{c}} \ge {p_j^{c}} \\ -1, &{} {p_i^{c}} < {p_j^{c}} \end{array} \right. , \qquad \qquad \quad \end{aligned}$$
(4)

where \(t_i^c, t_j^c\in [0,1]\) denote the ground-truth malignancy scores of the ith and jth training samples, and \(p_i^c, p_j^c\in [0,1]\) are their predicted malignancy probabilities, respectively. \(\delta \left( {p_i^{c}}, {p_j^c}\right) \) is a sign indicator function of the predicted ranking, and \(\gamma \) is the margin parameter.

If the predicted ranking agrees with the ground-truth ranking (e.g., \({t_i^c}\ge {t_j^c}\) and \({p_i^{c}}\ge {p_j^{c}}\)) and the ground-truth scores differ by at least \(\gamma \), the loss is 0. If the rankings disagree (e.g., \({t_i^c}\ge {t_j^c}\) but \({p_i^{c}}<{p_j^{c}}\)), Eq. (3) yields \(\gamma + (t_i^c - t_j^c) > 0\) for \(\gamma > 0\), so the pair is penalized during training. Applying this mechanism in a siamese network allows the model to explore and model the differences between marginal lung nodules, with the required separation controlled by the margin parameter \(\gamma \).
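A direct pure-Python transcription of Eqs. (3) and (4) is sketched below. The `pairs` argument is a hypothetical stand-in for whatever sampler feeds the two siamese branches, which the text does not specify; because both branches share all weights, every `p[k]` can simply be produced by the same model. The default `gamma=0.1` matches the margin reported in the training setup.

```python
from typing import Sequence, Tuple


def delta(p_i: float, p_j: float) -> int:
    """Eq. (4): +1 if sample i is predicted at least as malignant as sample j, else -1."""
    return 1 if p_i >= p_j else -1


def margin_ranking_loss(
    p: Sequence[float],
    t: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    gamma: float = 0.1,
) -> float:
    """Eq. (3): hinge on ranking agreement, summed over pairs and scaled by 1/(2N).

    p[k]: predicted malignancy probability of sample k.
    t[k]: ground-truth malignancy score of sample k, rescaled to [0, 1].
    pairs: (i, j) index pairs fed to the two branches (hypothetical sampler output).
    """
    n = len(p)
    total = 0.0
    for i, j in pairs:
        total += max(0.0, gamma - delta(p[i], p[j]) * (t[i] - t[j]))
    return total / (2 * n)


t = [0.9, 0.4]  # sample 0 is rated more malignant than sample 1
print(margin_ranking_loss([0.8, 0.3], t, [(0, 1)]))  # ranking agrees, gap 0.5 >= gamma
print(margin_ranking_loss([0.3, 0.8], t, [(0, 1)]))  # ranking inverted, penalized
```

Note the design difference from PyTorch's built-in `torch.nn.MarginRankingLoss`, which computes max(0, -y·(x1 - x2) + margin) on the predicted scores themselves; in Eq. (3) the predictions enter only through the sign δ, while the magnitude comes from the ground-truth gap.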

2.3 Joint Training of MTMR-Net

In summary, MTMR-Net is trained with three losses that are complementary rather than independent. The total loss to be minimized is defined as:

$$\begin{aligned} \mathcal {L}_{total} = \mathcal {L}_{cls} + \lambda \mathcal {L}_{reg} + \beta \mathcal {L}_{rank} + \eta ( ||W_{s}||_2^2 + ||W_{cls}||_2^2 + ||W_{reg}||_2^2), \end{aligned}$$
(5)

where \(\lambda \), \(\beta \), and \(\eta \) are hyper-parameters weighting \(\mathcal {L}_{reg}\), \(\mathcal {L}_{rank}\), and the weight decay term, respectively, relative to \(\mathcal {L}_{cls}\).
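Eq. (5) can be sketched as a plain weighted sum, assuming each weight set is available as a flattened list of floats (a simplification; real weights are tensors). The defaults are the values λ = 1, β = 5e−1, and η = 1e−3 reported for our experiments; the helper names are illustrative.

```python
from typing import Sequence


def squared_l2(weights: Sequence[float]) -> float:
    """||W||_2^2 for a flattened weight vector."""
    return sum(w * w for w in weights)


def total_loss(
    l_cls: float,
    l_reg: float,
    l_rank: float,
    w_s: Sequence[float],
    w_cls: Sequence[float],
    w_reg: Sequence[float],
    lam: float = 1.0,
    beta: float = 5e-1,
    eta: float = 1e-3,
) -> float:
    """Eq. (5): weighted sum of the three task losses plus L2 weight decay."""
    decay = squared_l2(w_s) + squared_l2(w_cls) + squared_l2(w_reg)
    return l_cls + lam * l_reg + beta * l_rank + eta * decay


print(total_loss(0.5, 0.2, 0.1, w_s=[1.0], w_cls=[2.0], w_reg=[0.0]))
```

In PyTorch the η term is often delegated to the optimizer's `weight_decay` argument instead; since `torch.optim.Adam` adds `weight_decay * w` to the gradient, a value of 2η reproduces the gradient of the η‖W‖² penalty.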

In our experiments, the entire network was trained with the Adam optimizer. The learning rate was initially set to 3e−3 for the shared feature extraction part and 3e−5 for the classification and regression modules, and was periodically annealed by a factor of 0.1. We trained the model for 150 epochs using PyTorch. Using grid search, we set the weighting hyper-parameters \(\lambda \), \(\beta \), and \(\eta \) to 1, 5e−1, and 1e−3, respectively, and the margin parameter \(\gamma \) to 1e−1.
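The two-group learning-rate schedule can be sketched as a step decay. The annealing period is not stated, so `step_size=50` below is a hypothetical value; in PyTorch this setup would correspond to passing two parameter groups to `torch.optim.Adam` and wrapping it in `torch.optim.lr_scheduler.StepLR(optimizer, step_size=..., gamma=0.1)`.

```python
from typing import Dict

# Initial rates from the text: shared feature extractor vs. classification/regression heads.
BASE_LR = {"shared": 3e-3, "heads": 3e-5}


def learning_rates(epoch: int, step_size: int = 50, gamma: float = 0.1) -> Dict[str, float]:
    """Step decay: multiply every base rate by gamma once per step_size epochs.

    step_size=50 is hypothetical; the text only says the rate is periodically
    annealed by 0.1.
    """
    factor = gamma ** (epoch // step_size)
    return {group: lr * factor for group, lr in BASE_LR.items()}


for epoch in (0, 50, 100, 149):  # 150 training epochs in total
    print(epoch, learning_rates(epoch))
```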

3 Experiments

3.1 Dataset and Preprocessing

We validated the proposed MTMR-Net on the LIDC-IDRI dataset, which consists of 1018 CT scans [9] and 1422 lung nodules (972 benign and 450 malignant). Each nodule was rated from 1 to 5 by four experienced radiologists, with higher ratings indicating a higher degree of malignancy. For the benign-malignant classification task, nodules with an average score below 3 were labeled benign, and those with an average score above 3 were labeled malignant. As in previous works [4,5,6], nodules with an average score of exactly 3 were excluded from our experiments. Besides malignancy, eight semantic attributes (i.e., subtlety, calcification, sphericity, margin, spiculation, texture, lobulation, and internal structure) are also scored in the LIDC-IDRI dataset; a higher score indicates a more pronounced characteristic. Most attributes are rated on a 1–5 scale, whereas internal structure and calcification are rated on 1–4 and 1–6 scales, respectively. Before training, we normalized the average score labels from their original ranges (1–5, 1–4, or 1–6) to 0–1.
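The labeling and normalization rules above can be sketched as follows. The attribute key names are hypothetical, and the min-max mapping (score − low) / (high − low) is an assumption, since the text only says the scores were rescaled to 0–1.

```python
from statistics import mean
from typing import Optional, Sequence

# (lowest, highest) rating per attribute, as stated in the text; key names are hypothetical.
ATTR_RANGES = {
    "subtlety": (1, 5),
    "calcification": (1, 6),
    "sphericity": (1, 5),
    "margin": (1, 5),
    "spiculation": (1, 5),
    "texture": (1, 5),
    "lobulation": (1, 5),
    "internal_structure": (1, 4),
}


def malignancy_label(ratings: Sequence[int]) -> Optional[int]:
    """0 = benign (mean < 3), 1 = malignant (mean > 3), None = excluded (mean == 3)."""
    avg = mean(ratings)
    if avg < 3:
        return 0
    if avg > 3:
        return 1
    return None


def rescale(attribute: str, ratings: Sequence[int]) -> float:
    """Min-max map the average rating of one attribute onto [0, 1] (assumed mapping)."""
    low, high = ATTR_RANGES[attribute]
    return (mean(ratings) - low) / (high - low)


print(malignancy_label([2, 3, 2, 1]))  # mean 2    -> 0 (benign)
print(malignancy_label([3, 4, 3, 3]))  # mean 3.25 -> 1 (malignant)
print(malignancy_label([3, 3, 2, 4]))  # mean 3    -> None (left out)
print(rescale("internal_structure", [1, 4]))  # mean 2.5 on a 1-4 scale -> 0.5
```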

We divided the dataset into training (90%) and testing (10%) sets following the setting in [4], which ensures that the training and testing sets have similar distributions. We cropped an adaptive patch region according to the diameter and position of each nodule and resized the patch to 224 \(\times \) 224 using bilinear interpolation. In addition, we employed random cropping, horizontal flipping, and vertical flipping for data augmentation. Whereas Dou et al. [12] employed a 3D CNN to preserve more spatial information, we use a 2D CNN to predict the malignancy and semantic attribute scores of each slice, and then average the slice-level predictions over all slices enclosing the nodule to obtain the final result, as in [6]. Although this approach may lose some spatial information, the averaging operation helps prevent overfitting.
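The slice-to-nodule fusion step can be sketched as a plain average of the per-slice outputs of the 2D model. The 0.5 decision threshold and the function name are assumptions for illustration; the text only states that the slice scores are averaged.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def nodule_prediction(
    slice_probs: Sequence[float],
    slice_attrs: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> Tuple[int, float, List[float]]:
    """Fuse per-slice outputs of a 2D model into one nodule-level prediction.

    slice_probs[k]: malignancy probability predicted for slice k.
    slice_attrs[k]: the attribute scores predicted for slice k.
    """
    prob = fmean(slice_probs)
    attrs = [fmean(column) for column in zip(*slice_attrs)]  # per-attribute average
    label = 1 if prob >= threshold else 0  # threshold is an assumption
    return label, prob, attrs


label, prob, attrs = nodule_prediction([0.4, 0.7, 0.6], [[0.2] * 8, [0.4] * 8, [0.3] * 8])
print(label, round(prob, 3), [round(a, 3) for a in attrs])
```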

Fig. 3.

Left: classification outputs of previous models [4, 6]. Right: classification outputs with attribute scores from MTMR-Net. Sub, Is, Cal, Sph, Mar, Lob, Spi, and Tex denote subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, and texture, respectively. The score of each attribute is rescaled to the range 0–1; a higher score indicates a more pronounced characteristic.

3.2 Results and Evaluation Comparison

Benign-Malignant Classification. We compared the proposed model with several state-of-the-art methods and performed an ablation analysis; the results are reported in Table 1. We used four common metrics for the comparison: accuracy, specificity, sensitivity, and area under the ROC curve (AUC), whose definitions can be found in [6]. As shown in Table 1, our method achieved the best accuracy and sensitivity, and comparable specificity and AUC, compared with state-of-the-art methods. This demonstrates the effectiveness of exploiting the relatedness between the classification and attribute prediction tasks, as well as of the margin ranking loss, in improving classification accuracy. To scrutinize the contributions of the different components, we further compared the proposed model with the original 50-layer residual network, MTMR-Net without the MSE loss, and MTMR-Net without the MR loss. Both ablated variants perform better than the 50-layer residual network, while the full model improves performance further and outperforms the 50-layer residual network by a large margin, further corroborating the effectiveness of the proposed multi-task learning scheme and the margin ranking loss.
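For reference, the four metrics can be computed as below. These are the standard definitions, assumed to agree with those in [6]; the 0.5 threshold is an assumption, and AUC is computed in its Mann-Whitney form (the probability that a random malignant nodule scores higher than a random benign one), which gives the same value as `sklearn.metrics.roc_auc_score`.

```python
from typing import Dict, Sequence


def classification_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity (at `threshold`) and AUC; label 1 = malignant."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    # AUC as P(score of a malignant nodule > score of a benign one); ties count one half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)

    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": wins / (len(pos) * len(neg)),
    }


print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# prints {'accuracy': 0.5, 'sensitivity': 0.5, 'specificity': 0.5, 'auc': 0.75}
```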

Table 1. Performance of lung nodule classification methods on LIDC-IDRI dataset

Nodule Attribute Score Regression. We further compared our model's attribute score predictions with two commonly used models, a lasso regression model and an elastic net, as well as with a state-of-the-art method, MTR [5]. The results are shown in Table 2. We evaluated the predictions with the absolute distance error, whose definition can be found in [5]. Compared with previous methods, our model achieved a lower absolute distance error on most attributes. This indicates that, in our multi-task model trained on the relatedness between the two tasks, the attribute prediction task improves the performance of the classification task and, in turn, the classification task enhances attribute prediction accuracy.
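As a hedged illustration only: the sketch below assumes the absolute distance error is the mean absolute difference between predicted and ground-truth scores of one attribute, computed on the original rating scale (Table 2 notes that scores are computed on unscaled data), and that the model's 0–1 outputs are mapped back by inverting a min-max rescaling. The exact definition in [5] may differ.

```python
from typing import Sequence


def to_original_scale(x: float, low: int, high: int) -> float:
    """Invert an (assumed) min-max rescaling from [low, high] to [0, 1]."""
    return low + x * (high - low)


def absolute_distance_error(
    pred_rescaled: Sequence[float], truth: Sequence[float], low: int, high: int
) -> float:
    """Mean |prediction - ground truth| for one attribute, on the original rating scale."""
    preds = [to_original_scale(p, low, high) for p in pred_rescaled]
    return sum(abs(p - t) for p, t in zip(preds, truth)) / len(preds)


# Calcification is rated 1-6; two nodules with rescaled predictions 1.0 and 0.0.
print(absolute_distance_error([1.0, 0.0], [5.5, 1.0], low=1, high=6))  # prints 0.25
```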

Figure 3 shows typical classification results and the corresponding attribute predictions. Encouragingly, our results are quite consistent with those of previous clinical studies. For example, malignant cases usually have higher calcification, higher lobulation, and lower spiculation scores, while internal structure has no influence on malignancy diagnosis. The results also indicate that nodules cannot be classified based on only one or two attributes; rather, multiple attributes should be considered comprehensively, as many clinical studies have also stated. Compared with previous methods that do not explicitly explore the relatedness of the two tasks, the proposed model not only achieves better classification accuracy but also provides more cues and evidence for diagnosis by simultaneously outputting the attribute scores. Beyond automated lung nodule diagnosis systems, the proposed method can also serve as a tool for investigating the complicated underlying relationship between the malignancy of a nodule and its attributes, as shown in Fig. 3.

Table 2. Performance of attribute score prediction. MTR, LASSO, and EN denote the multi-task regression model [5], the lasso regression model, and the elastic net, respectively. Sub, Is, Cal, Sph, Mar, Lob, Spi, and Tex are defined as in Fig. 3. Scores are calculated on the original, unscaled data.

4 Conclusion

In this paper, we presented MTMR-Net, a multi-task deep learning framework with a margin ranking loss for automated lung nodule analysis. The relatedness between lung nodule classification and attribute score regression was explicitly explored with multi-task deep learning, which contributed to performance gains in both tasks. Furthermore, a novel margin ranking loss was introduced to model nodule heterogeneity and improve the discrimination of ambiguous nodules. Extensive experiments on the benchmark dataset verified the efficacy of our method, which achieved competitive performance compared with state-of-the-art methods.