Multi-modal vertebrae recognition using Transformed Deep Convolution Network

https://doi.org/10.1016/j.compmedimag.2016.02.002

Highlights

  • First cross-modality 2D vertebra recognition method; an efficient clinical tool.

  • A new Transformed Deep Convolution Network with great potential for cross-modality organ recognition.

  • Above 90% sensitivity.

Abstract

Automatic vertebra recognition, including the identification of vertebra locations and names in multiple image modalities, is in high demand in spinal clinical diagnosis, where large amounts of imaging data from various modalities are frequently and interchangeably used. However, the recognition is challenging due to variations in MR/CT appearance and in the shape/pose of the vertebrae. In this paper, we propose a method for multi-modal vertebra recognition using a novel deep learning architecture called the Transformed Deep Convolution Network (TDCN). This architecture fuses image features from different modalities in an unsupervised manner and automatically rectifies the pose of each vertebra. The fusion of MR and CT image features improves the discriminative power of the feature representation and enhances the invariance of the vertebra pattern, which allows us to automatically process images of different contrasts, resolutions, and protocols, even with different sizes and orientations. The feature fusion and pose rectification are naturally incorporated in a multi-layer deep learning network. Experimental results show that our method outperforms existing detection methods and provides fully automatic location + naming + pose recognition for routine clinical practice.

Introduction

Magnetic resonance imaging (MR) and computed tomography (CT) are the two main imaging methods that are intensively and interchangeably used by spine physicians. Longitudinal/differential diagnoses today are often conducted on large MR/CT datasets, which makes manual identification of vertebrae a tedious and time-consuming task. An automatic locate-and-name system for spine MR/CT images that supports quantitative measurement is thus in high demand in orthopaedics, neurology, and oncology. Automatic vertebra recognition, particularly the identification of vertebra location, naming, and pose (orientation + scale), is a challenging problem in spine image analysis. The main difficulty arises from the high variability of image appearance caused by image modalities and by shape deformations of the vertebrae: (1) Vertebrae are difficult to detect across imaging modalities. The image resolution, contrast, and appearance of the same spine structure can differ greatly between MR and CT, or between T1- and T2-weighted MR images. (2) Vertebrae are difficult to name automatically. Vertebrae and intervertebral discs lack unique characteristic features, so automatic naming can easily fail. (3) Vertebra pose is difficult to estimate. The poses of vertebrae are highly diverse, and few stable features can be used for pose estimation. Beyond these local pose and appearance problems, the global geometry of the spine is often difficult to recover in some medical situations, e.g., spine deformity and scoliosis. The reconstruction of global spine geometry from limited CT/MR slices can be ill-posed and requires sophisticated learning algorithms.

Most current spine detection methods focus on identifying vertebra locations or labels in one particular image modality [1], [2], [3], [4], [5], and vertebra pose information is seldom obtained by the same method. (1) For vertebra localization, learning-based detectors were employed to handle specific image modalities; they were proven to work on CT (generalized Hough transform) [2], MR (AdaBoost) [3], or DXA images (random forest) [6]. Their training and testing were performed on the chosen image protocol only. Some detection methods claimed to work on both MR and CT. Štern et al. [7] utilized the curved spinal geometric structure extracted from both modalities. Kelm et al. and Lootus et al. [8], [9] used boosting-trained Haar features and SVM-trained Histogram of Oriented Gradients (HOG) features, respectively. However, these cross-modality methods often required separate training for MR and CT, and thus separate testing for the two modalities as well. (2) For vertebra naming, the methods in [2], [3], [4], [5] successfully labeled fully or partially scanned image volumes. They relied on the identification of special landmarks detected from multiple image views, e.g., axial-view templates [2], spinal canals [5], or anchor vertebrae [3], while the exact labels were inferred by a probabilistic inference model, e.g., a graph model [10], a Hidden Markov Model (HMM) [4], or a hierarchical model [3], [18]. (3) Besides detection and naming, vertebral pose is critical information in orthopedics. Pose estimation was used in [1], [8], [5] to extract the 3D structure of the spine. These estimation methods exploited multi-planar detectors to match the correct vertebra poses, but cannot be directly applied to a single-slice input. In addition, most of the training-based methods, as pointed out in [11], required dense manual annotations for ground-truth labels, i.e., annotations of all the corners and the center of each vertebra. This makes training-based methods inconvenient to use.

To overcome these limitations, we propose a unified framework using the Transformed Deep Convolution Network (TDCN) to provide automatic cross-modality vertebra location, naming, and pose estimation. As presented in Fig. 1, our system is a learning-based recognition system which contains a multi-step training stage and an efficient testing stage. Example results on MR and CT are shown in Fig. 2. The main ingredient of the system is a novel deep learning model [12] inspired by groupwise registration [13], [14] and multi-modal feature fusion [15], [16]. We make the following contributions in this paper:

  • Vertebra recognition. The location, name, and pose (scale + orientation) of each vertebra are identified simultaneously. Further spine shape analysis, e.g., spondylolysis analysis, is then possible based on the recognition results.

  • Multi-modal feature learning. The vertebra features are jointly learned and fused from both MR and CT. This enhances feature discrimination and improves the classification of vertebra/non-vertebra.

  • Invariant representation. In the training and recognition stages, the sampled or detected vertebrae are automatically aligned, generating transform-invariant feature representations and rectified final poses, respectively.

  • Simple annotation. Thanks to the invariant representation, our method only requires a single click for each vertebra in ground-truth annotation, while other methods [8], [5], [9] require four clicks or more.

Section snippets

The Transformed Deep Convolution Network

The Transformed Deep Convolution Network (TDCN) is a novel deep network structure that automatically extracts the most representative and invariant features for MR/CT. It employs MR–CT feature fusion to enhance feature discriminativity, and applies alignment transforms to the input data to generate invariant representations. This resolves the modality and pose variation problems in vertebra recognition. The overall structure of TDCN is presented in Fig. 3. The two major components in TDCN: the
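The snippet below is a minimal, hypothetical sketch (in PyTorch) of the two ideas named above: each input patch is first rectified by a predicted affine transform, then encoded by a modality-specific convolutional stream, and the MR and CT features are fused. All layer sizes, the affine-grid alignment, and the fusion-by-concatenation rule are our own illustrative assumptions, not the exact TDCN design of Fig. 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNetSketch(nn.Module):
    """Illustrative two-stream conv net with input alignment + feature fusion.

    Hypothetical stand-in for TDCN; the real unsupervised fusion rule and
    transform estimation are described in the full paper.
    """

    def __init__(self, feat_dim=128):
        super().__init__()

        def encoder():  # one small conv encoder per modality (MR / CT)
            return nn.Sequential(
                nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.LazyLinear(feat_dim),
            )

        self.enc_mr, self.enc_ct = encoder(), encoder()
        # Tiny regressor predicting a 2x3 affine matrix that rectifies pose,
        # initialized to the identity transform (standard spatial-transformer
        # practice).
        self.loc = nn.Sequential(nn.Flatten(), nn.LazyLinear(32),
                                 nn.ReLU(), nn.Linear(32, 6))
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def align(self, x):
        # Warp the patch by the predicted affine transform.
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

    def forward(self, mr, ct):
        f_mr = self.enc_mr(self.align(mr))
        f_ct = self.enc_ct(self.align(ct))
        return torch.cat([f_mr, f_ct], dim=1)  # fused joint feature


# e.g. fused = FusionNetSketch()(torch.randn(4, 1, 51, 51),
#                                torch.randn(4, 1, 51, 51))
```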

The training of multi-modal recognition system

The training process of the TDCN system is presented in Fig. 6. The process starts with the annotation of sample patches in the original scans. It then trains TDCN on the selected samples, generating invariant vertebra features. The features are finally used to train an SVM, yielding the desired vertebra classifier.
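As a rough illustration of the last step only, the sketch below trains a binary SVM on already-extracted features using scikit-learn; the file names, the RBF kernel, and the feature scaling are placeholders we chose, not details from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical inputs: invariant features from the trained network and
# their labels (1 = vertebra patch, 0 = background patch).
feats = np.load("vertebra_features.npy")   # shape: (n_samples, feat_dim)
labels = np.load("vertebra_labels.npy")    # shape: (n_samples,)

# probability=True lets us rank candidate patches by confidence later.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(feats, labels)

scores = clf.predict_proba(feats[:5])[:, 1]  # vertebra probability
```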

One-click sample annotation. The training samples (positive/negative) are collected by simple clicking operations in image slices. For positive samples, the user only needs to click the
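A minimal sketch of how such one-click annotations might be turned into training patches is shown below. It assumes the click marks the vertebra center, borrows the 51 × 51 patch size mentioned in the recognition section, and uses a negative-sampling distance threshold that is an arbitrary choice of ours.

```python
import numpy as np

def crop_patch(img, center_rc, size=51):
    """Crop a size x size patch centered on a clicked (row, col) point.

    Pads with edge values so clicks near the image border still work.
    """
    half = size // 2
    r, c = center_rc
    padded = np.pad(img, half, mode="edge")
    r, c = r + half, c + half  # shift into padded coordinates
    return padded[r - half : r + half + 1, c - half : c + half + 1]

def random_negatives(img, clicks, n, size=51, min_dist=40, rng=None):
    """Sample patches whose centers are far from every clicked vertebra."""
    rng = rng or np.random.default_rng(0)
    half, out = size // 2, []
    while len(out) < n:
        r = int(rng.integers(half, img.shape[0] - half))
        c = int(rng.integers(half, img.shape[1] - half))
        if all(np.hypot(r - cr, c - cc) > min_dist for cr, cc in clicks):
            out.append(crop_patch(img, (r, c), size))
    return out
```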

The vertebra recognition

We directly apply the trained multi-modal recognition system for fully automatic vertebra recognition on arbitrary spine images. The overall recognition process is shown in Fig. 7.

Vertebra detection. As shown in the first step of Fig. 7, to simulate the various poses of the vertebrae, we first rotate and rescale the input MR/CT image, generating a set of transformed images. Regular patches (i.e., 51 × 51) are then randomly sampled from the images and sent as input to the trained
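A sketch of this test-time pose simulation, under assumed rotation angles, scale factors, and sample counts (the paper's actual settings may differ), could look as follows:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def pose_simulated_patches(img, angles=(-15, 0, 15), scales=(0.8, 1.0, 1.25),
                           size=51, n_per_image=200, rng=None):
    """Rotate/rescale the slice, then randomly sample size x size patches.

    Yields (patch, angle, scale, center) so that a detection found in a
    transformed image can later be mapped back to the original slice.
    """
    rng = rng or np.random.default_rng(0)
    half = size // 2
    for angle in angles:
        for scale in scales:
            t = zoom(rotate(img, angle, reshape=True, order=1),
                     scale, order=1)
            if min(t.shape) <= size:  # skip images too small to sample
                continue
            for _ in range(n_per_image):
                r = int(rng.integers(half, t.shape[0] - half))
                c = int(rng.integers(half, t.shape[1] - half))
                patch = t[r - half : r + half + 1, c - half : c + half + 1]
                yield patch, angle, scale, (r, c)
```

Each sampled patch would then be scored by the trained classifier, with the recorded angle/scale giving the rectified pose of any accepted detection.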

Experiments

Our method is tested on a cross-modality MR + CT dataset which contains 60 MR volumes (T1 and T2 included) and 90 CT volumes from subjects with different pathologies (e.g., fracture, spondylolisthesis). In particular, 30 pairs of MR–CT lumbar volumes are from Spineweb, 50 CT volumes (lumbar and cervical) are from the MS Annotated Spine CT Database, and the remaining volumes (lumbar and whole spine) are from our

Discussion

2D vs. 3D methods. There are a number of 3D vertebra detection methods [4], [11], [18], [19] that can provide vertebra identification in 3D space, particularly for CT images. However, as reported in [11], [13], 3D detection is still very challenging, especially in some extreme pathological cases. Instead of competing on performance in 3D spine recognition, where large detection errors can occur, the method proposed in this paper focuses on 2D spine recognition and provides a fast, stable

Conclusion

In this paper, we proposed a multi-modal vertebra recognition framework using the Transformed Deep Convolution Network (TDCN). TDCN automatically extracts modality-adaptive, highly discriminative, and pose-invariant features for recognition. Using the TDCN-based recognition system, we can simultaneously identify the locations, labels, and poses of vertebra structures in both MR and CT. The system has successfully passed the tests on multi-modal datasets for lumbar and whole-spine scans with high accuracy

References (19)
