Elsevier

Medical Image Analysis

Volume 55, July 2019, Pages 103-115

Direct automated quantitative measurement of spine by cascade amplifier regression network with manifold regularization

https://doi.org/10.1016/j.media.2019.04.012

Highlights

  • A novel regression network named CARN is proposed to achieve automated quantitative measurement of the spine, which provides a reliable measurement for the clinical diagnosis and assessment of spinal diseases.

  • The local structure-preserved manifold regularization (LSPMR) is proposed to generate discriminative feature embedding, which largely improves the performance of multiple indices estimation.

  • The adaptive local shape-constrained manifold regularization (ALSCMR) is proposed to alleviate overfitting. This provides a novel approach for multi-output regression to improve the generalization of the multi-output regression network.

Abstract

Automated quantitative measurement of the spine (i.e., estimation of multiple indices such as the heights, widths, and areas of the vertebral bodies and discs) plays a significant role in the clinical diagnosis and assessment of spinal diseases such as osteoporosis, intervertebral disc degeneration, and lumbar disc herniation, yet it remains a considerable challenge because of the variety of spinal structures and the high dimensionality of the indices to be estimated. In this paper, we propose a novel cascade amplifier regression network (CARN) with manifold regularization, including local structure-preserved manifold regularization (LSPMR) and adaptive local shape-constrained manifold regularization (ALSCMR), to achieve accurate, direct, automated estimation of multiple indices. The CARN architecture is composed of a cascade amplifier network (CAN) for expressive feature embedding and a linear regression model for multiple indices estimation. The CAN produces an expressive feature embedding through cascaded amplifier units (AUs), which reuse features selectively by stimulating effective features and suppressing redundant features as feature maps propagate between adjacent layers. During training, the LSPMR is employed to obtain a discriminative feature embedding by making the local geometric structure of the latent feature space resemble that of the target output manifold. The ALSCMR is used to alleviate overfitting and generate realistic estimates by learning the distribution of the multiple indices. Experiments on T1-weighted MR images of 215 subjects and T2-weighted MR images of 20 subjects show that the proposed approach achieves mean absolute errors of 1.22 ± 1.04 mm and 1.24 ± 1.07 mm for the estimation of 30 lumbar spinal indices on the T1-weighted and T2-weighted spinal MR images, respectively. The proposed method has great potential for clinical spinal disease diagnosis and assessment.

Introduction

The quantitative measurement of the spine (i.e., estimation of multiple indices such as the heights, widths, and areas of the vertebral bodies and discs) is a practical means for the clinical diagnosis and assessment of spinal diseases such as osteoporosis, intervertebral disc degeneration, and lumbar disc herniation. Among the indices to be estimated, the vertebral body height (VBH) and intervertebral disc height (IDH) are the most valuable for diagnosing and assessing these diseases. As shown in Fig. 1, the 30 estimated indices for the lumbar spine comprise 15 VBHs and 15 IDHs: each vertebral body (intervertebral disc) has 3 VBHs (IDHs), namely the anterior, middle, and posterior heights. In clinical practice, VBHs can be used to assess the vertebral fracture risk of osteoporotic patients (McCloskey, Johansson, Oden, Kanis, 2012, Tatoń, Rokita, Korkosz, Wróbel, 2014) because VBHs are correlated with bone strength. Furthermore, the IDH decreases with intervertebral disc degeneration (Jarman, Arpinar, Baruah, Klein, Maiman, Muftuler, 2014, Salamat, Hutchings, Kwong, Magnussen, Hancock, 2016) and lumbar disc herniation (Tunset et al., 2013).
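To make the layout of the 30 indices concrete, the sketch below enumerates them as (kind, level, position) tuples and attaches these names to a flat 30-element estimate vector. The level names (L1 to L5 for vertebral bodies, L1/L2 to L5/S1 for discs), the ordering, and the helper names are hypothetical choices for illustration; the text above only implies 5 vertebral bodies and 5 discs with 3 heights each.

```python
# Hypothetical layout of the 30 lumbar indices: 5 vertebral bodies and
# 5 intervertebral discs, each with anterior, middle and posterior heights.
# The level names and the ordering are assumptions, not from the paper.
VERTEBRAE = ["L1", "L2", "L3", "L4", "L5"]
DISCS = ["L1/L2", "L2/L3", "L3/L4", "L4/L5", "L5/S1"]
POSITIONS = ["anterior", "middle", "posterior"]


def index_names():
    """Return the 30 index names as (kind, level, position) tuples."""
    names = []
    for level in VERTEBRAE:
        for pos in POSITIONS:
            names.append(("VBH", level, pos))
    for level in DISCS:
        for pos in POSITIONS:
            names.append(("IDH", level, pos))
    return names


NAMES = index_names()


def label_estimate(values):
    """Attach names to a flat 30-element estimate vector (order as above)."""
    if len(values) != len(NAMES):
        raise ValueError(f"expected {len(NAMES)} values, got {len(values)}")
    return dict(zip(NAMES, values))


print(len(NAMES))  # 30
```

A flat vector keeps the regression head simple; the names only matter when reporting per-index results.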

Automated quantitative measurement of the spine is of significant clinical importance because, compared with manual measurement, it saves time and offers better reproducibility and consistency. However, it remains an exceedingly intractable task because of the following challenges:

  • It is difficult to obtain an expressive feature embedding for such a complex regression problem because of the high dimensionality of the estimated indices (as shown in Fig. 1(a)).

  • A discriminative feature embedding is hard to generate because the boundary between the vertebral body (VB) and the intervertebral disc is highly ambiguous in abnormal spines (as shown in Fig. 1(d)).

  • The implicit correlations between different estimated indices are difficult to capture (as shown in Fig. 1(d), the heights of an abnormal disc and those of the adjacent vertebral bodies are correlated, because disc abnormality changes the IDH and the adjacent VBH simultaneously).

  • The relationship between spinal images and the estimated indices is complex because of image variability: images with the same estimated indices often look very different owing to inter-subject variations.

  • Labeled data are insufficient, which can result in overfitting.

Existing work on multiple indices estimation of the spine falls into three categories: (1) manual measurement; (2) automated segmentation; (3) direct estimation.

Manual measurement methods quantify the spine by measuring disc height in vitro (Brinckmann and Grootenboer, 1991), by detecting spinal landmarks on MRI (Tunset, Kjaer, Chreiteh, Jensen, 2013, Videman, Battié, Gibbons, Gill, 2014), or by segmenting the discs and vertebral bodies on MRI (Videman et al., 2014). These manual methods are of limited use in clinical practice because they are time-consuming, tedious, non-reproducible, and subject to high inter-observer variability.

Automated segmentation-based methods segment the intervertebral discs or vertebral bodies using active shape models (Castro et al., 2012), multi-atlas models (Wang and Forsberg, 2016), superpixel-based models (Barbieri et al., 2015), and deep-learning-based models (Korez et al., 2017). Although these methods segment the intervertebral discs and vertebral bodies accurately, the resulting segmentations cannot be used to compute the required indices directly.

In recent years, an increasing number of approaches have emerged for the direct quantitative measurement of anatomical structures without segmentation. These methods have achieved strong performance in the quantitative estimation of, for example, cardiac volume (Xue, Lum, Mercado, Landis, Warrington, Li, 2017, Zhen, Wang, Islam, Bhaduri, Chan, Li, 2016, Zhen, Wang, Islam, Bhaduri, Chan, Li, 2014) and spinal curvature (Wu, Bailey, Rasoulinejad, Li, 2017, Sun, Zhen, Bailey, Rasoulinejad, Yin, Li, 2017). Zhen et al. (2014) used multiple features and regression forests (Multi-features+RF) to jointly estimate cardiac bi-ventricular volumes. Zhen et al. (2016) adopted a multi-scale convolutional deep belief network to learn an unsupervised cardiac image representation, followed by regression forests (MCDBN+RF), to estimate bi-ventricular volumes. Xue et al. (2017) combined a convolutional neural network (CNN) and a recurrent neural network to exploit both temporal and spatial information for full quantification of the left ventricle. Sun et al. (2017) used the histogram of oriented gradients descriptor (Dalal and Triggs, 2005) with structured support vector regression (HOG+SSVR) to improve spinal curvature assessment, exploiting the intrinsic inter-output correlation under ℓ2,1-norm regularization and preserving local geometrical structure invariance via manifold regularization.
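For readers unfamiliar with the ℓ2,1-norm mentioned above: for a weight matrix W it is the sum of the ℓ2 norms of its rows. In multi-output regression, if rows index input features and columns index outputs (a convention assumed here), penalizing this norm drives entire rows to zero, so all outputs are encouraged to rely on one shared subset of features. A minimal sketch of the norm itself:

```python
import math


def l21_norm(W):
    """l2,1-norm of a matrix given as a list of rows: sum of row-wise l2 norms."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


# Rows = input features, columns = outputs (assumed convention).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(W))  # 6.0
```

The all-zero second row contributes nothing, which is why this penalty favors row-sparse solutions.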

Although these methods achieved promising performance in quantifying cardiac volume and spinal curvature, they cannot achieve quantitative measurement of the spine because of the following limitations. 1) Lack of expressive and discriminative feature representation. Hand-crafted features cannot capture task-aware spinal structures robustly, and a traditional CNN (Simonyan and Zisserman, 2014) cannot generate expressive and discriminative features for multiple indices estimation because, lacking an explicit structure for feature reuse, it may lose effective features. 2) Inability to learn the distribution of the estimated indices, which leads to unrealistic estimates and overfitting.

In this study, a cascade amplifier regression network (CARN) with manifold regularization is proposed for quantitative measurement of the spine from MR images. The CARN architecture is composed of a cascade amplifier network (CAN) for expressive feature embedding and a linear regression model for multiple indices estimation; the manifold regularization, comprising local structure-preserved manifold regularization (LSPMR) and adaptive local shape-constrained manifold regularization (ALSCMR), is used to construct the loss function. In the CAN, each amplifier unit (AU) reuses selected features between adjacent layers. As shown in Fig. 2(b), the AU generates the selected features by stimulating the effective features of the preceding layer and suppressing the redundant ones. The selected features are then reused in the subsequent layer through a concatenation operator. Because the CAN reuses multi-level features selectively to represent the complex spine, it yields an expressive feature embedding. Using the CAN, the MR images are embedded into a latent feature space. The high-dimensional indices lie on a target output manifold because of the correlations among them. To exploit the relationship between the latent feature space and the target output manifold, the LSPMR generates a discriminative feature embedding that preserves the local geometrical structure of the target output manifold. Additionally, the ALSCMR is designed to restrict the output of the CARN to the target output manifold. As a result, the distribution of the estimated indices is close to the real distribution, which reduces the impact of outliers and alleviates overfitting. With the expressive and discriminative feature embedding produced by the CAN and LSPMR, together with the ALSCMR, a simple linear regression model (i.e., a fully connected network) is sufficient to produce accurate estimates.
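The AU data flow described above can be sketched on a one-dimensional feature vector. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel-wise gate form (1 + tanh(w·x + b)), the toy dense layer standing in for a convolution, and all function names are hypothetical. It only shows the idea that a gate, a multiplier and an adder (the components named in the architecture section) produce selected features that are concatenated with the next layer's output.

```python
import math


def amplifier_unit(x, w, b):
    """Hypothetical AU acting channel-wise on a feature vector x.

    gate:       g_c = tanh(w_c * x_c + b_c), in (-1, 1)
    multiplier: m_c = g_c * x_c
    adder:      s_c = x_c + m_c = (1 + g_c) * x_c
    A positive gate amplifies channel c; a negative gate suppresses it.
    """
    gates = [math.tanh(wc * xc + bc) for wc, xc, bc in zip(w, x, b)]
    return [xc + gc * xc for xc, gc in zip(x, gates)]


def dense_relu(x, weights):
    """Toy dense layer with ReLU, standing in for a convolutional layer."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in weights]


def au_block(x, gate_w, gate_b, layer_w):
    """Selected features of the preceding layer are concatenated with the
    output of the subsequent layer, so they are reused downstream."""
    selected = amplifier_unit(x, gate_w, gate_b)
    return dense_relu(x, layer_w) + selected


x = [1.0, 2.0, 3.0]
out = au_block(x, gate_w=[0.0] * 3, gate_b=[0.0] * 3,
               layer_w=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(out)  # [1.0, 3.0, 1.0, 2.0, 3.0]
```

With a zero gate the selected features equal the input, so the example simply shows the concatenation; in training, the gate parameters would be learned so that useful channels are amplified and redundant ones damped.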

The main contributions are as follows:

  • A novel regression network named CARN is proposed to achieve automated quantitative measurement of the spine, which provides a reliable measurement for the clinical diagnosis and assessment of spinal diseases.

  • The local structure-preserved manifold regularization (LSPMR) is proposed to generate discriminative feature embedding, which reduces the variability and largely improves the performance of multiple indices estimation.

  • The adaptive local shape-constrained manifold regularization (ALSCMR) is proposed to alleviate overfitting. This provides a novel approach for multi-output regression to improve the generalization of the multi-output regression network.

In this work, we extend our preliminary study (Pang et al., 2018) on quantitative measurement of the spine in the following aspects:

  • The LSPMR is proposed to obtain discriminative feature embedding, which largely reduces the variability in multi-output regression, and therefore achieves accurate multiple indices estimation.

  • The robustness of the proposed CARN is validated by extended experiments on larger datasets containing 215 T1-weighted images and 20 T2-weighted images.

  • The effectiveness of the proposed CARN is validated by comparing its performance with that of related machine-learning-based approaches.

  • The loss weight of the local shape-constrained manifold regularization is determined adaptively for each sample. A sample whose local linear representation on the target output manifold has a larger reconstruction error is more likely to be an outlier, and it is therefore assigned a larger local shape-constrained manifold regularization loss weight to alleviate overfitting. As a result, the estimated indices stay close to their real distribution.
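The adaptive weighting described in the last item can be sketched with locally linear embedding (LLE) style reconstruction weights computed on the ground-truth index vectors. The neighborhood size k, the regularization constant, the normalization of errors by their mean, and all function names are assumptions for illustration; the source does not specify them.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def reconstruction_error(Y, i, k=3, reg=1e-3):
    """Error of reconstructing Y[i] as an affine combination of its k
    nearest neighbors in target space (LLE-style weights)."""
    others = sorted((j for j in range(len(Y)) if j != i),
                    key=lambda j: sq_dist(Y[i], Y[j]))
    nbrs = others[:k]
    diffs = [[a - b for a, b in zip(Y[i], Y[j])] for j in nbrs]
    # Local Gram matrix, regularized so it is always invertible.
    C = [[sum(p * q for p, q in zip(d1, d2)) for d2 in diffs] for d1 in diffs]
    trace = sum(C[m][m] for m in range(len(nbrs))) or 1.0
    for m in range(len(nbrs)):
        C[m][m] += reg * trace
    w = solve(C, [1.0] * len(nbrs))
    total = sum(w)
    w = [wj / total for wj in w]  # weights sum to one
    recon = [sum(wj * Y[j][d] for wj, j in zip(w, nbrs)) for d in range(len(Y[i]))]
    return sq_dist(Y[i], recon)


def adaptive_weights(Y, k=3):
    """Per-sample loss weights: larger reconstruction error, larger weight
    (normalized here so that the weights average to one)."""
    errs = [reconstruction_error(Y, i, k) for i in range(len(Y))]
    mean = sum(errs) / len(errs)
    return [e / mean if mean > 0 else 1.0 for e in errs]


# Five targets on the line y = x plus one sample far off that line.
Y = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [2.0, 10.0]]
weights = adaptive_weights(Y)
print(max(range(len(Y)), key=lambda i: weights[i]))  # 5
```

The off-line sample cannot be rebuilt from neighbors that all lie on the line, so it receives by far the largest weight, matching the intuition that such samples are likely outliers.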

Section snippets

Cascade amplifier regression network architecture

The proposed CARN architecture achieves automated multiple indices estimation of the spine through an expressive feature embedding obtained by the CAN, followed by a linear regression model. As shown in Fig. 2, the CAN provides an expressive feature embedding through selective feature reuse with a series of AUs. Each AU achieves selective feature reuse between adjacent layers using a gate, a multiplier, an adder, and a concatenation operator. The selected feature map is generated by stimulating
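The overall CARN data flow (a cascade of AU blocks whose concatenations grow the feature dimension, followed by a linear regression head with one output per index) can be sketched as follows. Everything here is a hypothetical toy: flat vectors instead of image feature maps, a dense ReLU layer instead of convolutions, an assumed gate form, random initial weights, and invented names such as `carn_forward` and `init_params`.

```python
import math
import random


def amplifier_unit(x, w, b):
    # Hypothetical AU: gate, multiplier and adder give s = (1 + tanh(w*x + b)) * x
    return [xc * (1.0 + math.tanh(wc * xc + bc)) for xc, wc, bc in zip(x, w, b)]


def dense_relu(x, W):
    # Toy stand-in for a convolutional layer.
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in W]


def carn_forward(x, params, head_W, head_b):
    """Cascade of AU blocks followed by a linear regression head."""
    h = x
    for gate_w, gate_b, layer_W in params:
        selected = amplifier_unit(h, gate_w, gate_b)
        h = dense_relu(h, layer_W) + selected  # concatenation: feature reuse
    # Linear regression model: one output per estimated index.
    return [sum(wi * hi for wi, hi in zip(row, h)) + bi
            for row, bi in zip(head_W, head_b)]


def init_params(in_dim, growth, n_blocks, n_outputs, seed=0):
    """Random toy parameters; each block appends `growth` new channels."""
    rng = random.Random(seed)
    params, d = [], in_dim
    for _ in range(n_blocks):
        gate_w = [rng.gauss(0.0, 0.1) for _ in range(d)]
        gate_b = [0.0] * d
        layer_W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(growth)]
        params.append((gate_w, gate_b, layer_W))
        d += growth
    head_W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_outputs)]
    head_b = [0.0] * n_outputs
    return params, head_W, head_b, d


params, head_W, head_b, feat_dim = init_params(in_dim=8, growth=4, n_blocks=3, n_outputs=30)
y_hat = carn_forward([0.5] * 8, params, head_W, head_b)
print(feat_dim, len(y_hat))  # 20 30
```

Keeping the head linear mirrors the text's claim that, given a good embedding, a simple regression layer suffices; all nonlinearity lives in the cascade.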

Loss function with manifold regularization

The loss function improves the accuracy of spinal indices estimation by combining a preliminary loss loss_p with the LSPMR loss loss_l and the ALSCMR loss loss_a. The preliminary loss minimizes the distance between the estimated indices and the ground truth. As shown in Fig. 3, the LSPMR yields a discriminative feature embedding by making the local geometrical structure of the latent feature space the same as that of the target output manifold. The ALSCMR is aimed
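One common way to realize such a combination is sketched below; it is not necessarily the paper's exact formulation. The heat-kernel k-nearest-neighbor graph for the LSPMR term, the mean absolute error for loss_p, the weights alpha and beta, and all function names are assumptions. The ALSCMR term loss_a is passed in as a precomputed number.

```python
import math


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def preliminary_loss(y_hat, y):
    """Mean absolute error over all samples and indices (assumed form of loss_p)."""
    n = sum(len(row) for row in y)
    return sum(abs(p - t) for pr, tr in zip(y_hat, y) for p, t in zip(pr, tr)) / n


def lspmr_loss(feats, y, k=2, sigma=1.0):
    """Laplacian-style sketch of loss_l: samples that are close in target
    space (k nearest neighbors, heat-kernel weights) are pulled together
    in the latent feature space."""
    total, norm = 0.0, 0.0
    for i in range(len(y)):
        nbrs = sorted((j for j in range(len(y)) if j != i),
                      key=lambda j: sq_dist(y[i], y[j]))[:k]
        for j in nbrs:
            s = math.exp(-sq_dist(y[i], y[j]) / sigma ** 2)
            total += s * sq_dist(feats[i], feats[j])
            norm += s
    return total / norm if norm > 0 else 0.0


def total_loss(y_hat, y, feats, loss_a, alpha=0.1, beta=0.1):
    """loss = loss_p + alpha * loss_l + beta * loss_a (weights hypothetical)."""
    return preliminary_loss(y_hat, y) + alpha * lspmr_loss(feats, y) + beta * loss_a


y = [[0.0], [0.1], [5.0], [5.1]]
good = [[0.0], [0.0], [1.0], [1.0]]  # neighbors in target space share features
bad = [[0.0], [1.0], [0.0], [1.0]]   # neighbors in target space are split apart
print(lspmr_loss(good, y, k=1), lspmr_loss(bad, y, k=1))  # 0.0 1.0
```

The example shows the intended behavior: a feature embedding that keeps target-space neighbors together incurs no LSPMR penalty, while one that separates them is penalized.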

Datasets

Two datasets are used: 1) the T1-weighted dataset, which includes 215 subjects, was collected from multiple centers with scanners from different manufacturers, using the following parameters: repetition time (TR) = 600 ms; echo time (TE) = 14 ms; flip angle (FA) = 90°. The subjects fall into four clinical groups: 101 patients with lumbar disc herniation (LDH), 18 patients with intervertebral disc degeneration (IDD), 29 patients with lumbar spondylolisthesis (LS), and 67 normal subjects. The

Results and analysis

The performance of the proposed method is evaluated on the T1 and T2 datasets separately because of the differences between T1-weighted and T2-weighted MR images.
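Results such as "1.22 ± 1.04 mm" are mean absolute errors with a standard deviation. The sketch below shows one way to compute them per dataset. Pooling absolute errors over all subjects and all 30 indices, using the population standard deviation, and the toy numbers are all assumptions; the source snippet does not say how errors were aggregated.

```python
import statistics


def mae_pm_std(y_hat, y):
    """Pool absolute errors over all subjects and indices (assumption),
    then return their mean and population standard deviation in mm."""
    errs = [abs(p - t) for pr, tr in zip(y_hat, y) for p, t in zip(pr, tr)]
    return statistics.mean(errs), statistics.pstdev(errs)


def evaluate_per_dataset(results):
    """results: {dataset_name: (y_hat, y)}; each dataset is scored separately."""
    return {name: mae_pm_std(y_hat, y) for name, (y_hat, y) in results.items()}


# Toy values only, one subject with two indices per dataset.
results = {
    "T1": ([[10.0, 8.0]], [[11.0, 8.0]]),
    "T2": ([[9.0, 7.0]], [[9.0, 10.0]]),
}
for name, (m, s) in evaluate_per_dataset(results).items():
    print(f"{name}: {m:.2f} ± {s:.2f} mm")
```

Scoring each dataset separately follows the text's rationale that T1-weighted and T2-weighted images differ too much to be pooled.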

Conclusion

We have presented an accurate and robust method for automated quantitative measurement of the spine using CARN with manifold regularization. The CAN achieves an expressive feature embedding by reusing selected features. Feature selection is implemented by stimulating effective features and suppressing redundant features as feature maps propagate between adjacent layers. Whether a feature is effective or redundant is learned automatically during training. The LSPMR enhances the

Acknowledgments

Computations were performed using the data analytics Cloud at SHARCNET (http://www.sharcnet.ca) provided through the Southern Ontario Smart Computing Innovation Platform (SOSCIP); the SOSCIP consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario. Financial support for this work was partly provided by the China Scholarship Council (no. 201708440350), the National Natural Science Foundation of China (no. U1501256), and the Science and

References (40)

  • K. Cho et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

  • N. Dalal et al. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005.

  • K. Hara et al. Growing Regression Forests by Classification: Applications to Object Pose Estimation. Computer Vision – ECCV 2014, 2014.

  • K. He et al. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

  • S. Hochreiter et al. Long short-term memory. Neural Comput., 1997.

  • G. Huang et al. Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition, 2017.

  • M. Huang et al. Brain tumor segmentation based on local independent projection-based classification. IEEE Trans. Biomed. Eng., 2014.

  • J.P. Jarman et al. Intervertebral disc height loss demonstrates the threshold of major pathological changes during degeneration. Eur. Spine J., 2014.

  • R. Korez et al. Intervertebral disc segmentation in MR images with 3D convolutional networks. Medical Imaging 2017: Image Processing, 2017.

  • W. Liu et al. Large graph construction for scalable semi-supervised learning. Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.