Multi-modal vertebrae recognition using Transformed Deep Convolution Network

https://doi.org/10.1016/j.compmedimag.2016.02.002

Highlights

  • First cross-modality 2D vertebra recognition method; an efficient clinical tool.

  • A new Transformed Deep Convolution Network with great potential for cross-modality organ recognition.

  • Above 90% sensitivity.

Abstract

Automatic vertebra recognition, including the identification of vertebra locations and names in multiple image modalities, is in high demand in spinal clinical diagnosis, where large amounts of imaging data from various modalities are frequently and interchangeably used. However, the recognition is challenging due to variations in MR/CT appearance and in the shape/pose of the vertebrae. In this paper, we propose a method for multi-modal vertebra recognition using a novel deep learning architecture called the Transformed Deep Convolution Network (TDCN). This architecture fuses image features from different modalities in an unsupervised manner and automatically rectifies the pose of each vertebra. The fusion of MR and CT image features improves the discriminative power of the feature representation and enhances the invariance of the vertebra pattern, which allows us to automatically process images of different contrasts, resolutions, and protocols, even with different sizes and orientations. The feature fusion and pose rectification are naturally incorporated in a multi-layer deep learning network. Experimental results show that our method outperforms existing detection methods and provides fully automatic location + naming + pose recognition for routine clinical practice.

Introduction

Magnetic resonance imaging (MR) and computed tomography (CT) are the two main imaging methods that are intensively and interchangeably used by spine physicians. Longitudinal/differential diagnoses today are often conducted on large MR/CT datasets, which makes manual identification of vertebrae a tedious and time-consuming task. An automatic locate-and-name system for spine MR/CT images that supports quantitative measurement is thus in high demand in orthopaedics, neurology, and oncology. Automatic vertebra recognition, particularly the identification of vertebra location, naming, and pose (orientation + scale), is a challenging problem in spine image analysis. The main difficulty arises from the high variability of image appearance caused by image modalities and by shape deformations of the vertebrae: (1) Vertebrae are difficult to detect across imaging modalities. The image resolution, contrast, and appearance of the same spine structure can differ greatly between MR and CT, or between T1- and T2-weighted MR images. (2) Vertebrae are difficult to name automatically. Vertebrae and intervertebral discs lack unique characteristic features, so automatic naming can easily fail. (3) Vertebra pose is difficult to estimate. The poses of vertebrae are highly diverse, and few stable features can be used for pose estimation. Beyond these local pose and appearance problems, the global geometry of the spine is often difficult to recover in some medical situations, e.g., spine deformity and scoliosis. The reconstruction of global spine geometry from limited CT/MR slices can be ill-posed and requires sophisticated learning algorithms.

Most current spine detection methods focus on identifying vertebra locations or labels in one particular image modality [1], [2], [3], [4], [5], and vertebra pose information is seldom obtained by the same method. (1) For vertebra localization, learning-based detectors were employed to handle specific image modalities; they were proven to work on CT (generalized Hough transform) [2], MR (AdaBoost) [3], or DXA images (random forest) [6]. Their training and testing were performed on the chosen image protocol only. Some detection methods claimed to work on both MR and CT. Štern et al. [7] utilized the curved spinal geometric structure extracted from both modalities. Kelm et al. and Lootus et al. [8], [9] used boosting-trained Haar features and SVM-trained Histogram of Oriented Gradients (HOG) features, respectively. However, these cross-modality methods often required separate training for MR and CT, and thus separate testing for the two modalities as well. (2) For vertebra naming, the methods in [2], [3], [4], [5] successfully labeled fully or partially scanned image volumes. They relied on the identification of special landmarks detected from multiple image views, e.g., axial-view templates [2], spinal canals [5], or anchor vertebrae [3], while the exact labels were inferred by a probabilistic inference model, e.g., a graph model [10], a Hidden Markov Model (HMM) [4], or a hierarchical model [3], [18]. (3) Besides detection and naming, vertebral pose is critical information in orthopedics. Pose estimation was used in [1], [8], [5] to extract the 3D structure of the spine. These estimation methods exploited multi-planar detectors to match the correct vertebra poses, but cannot be directly applied to a single-slice input. In addition, most of the training-based methods, as pointed out in [11], required dense manual annotations for ground-truth labels, i.e., annotations of all the corners and the center of each vertebra. This makes training-based methods inconvenient to use.

To overcome these limitations, we propose a unified framework using the Transformed Deep Convolution Network (TDCN) to provide automatic cross-modality vertebra location, naming, and pose estimation. As presented in Fig. 1, our system is a learning-based recognition system which contains a multi-step training stage and an efficient testing stage. Example results on MR and CT are shown in Fig. 2. The main ingredient of the system is a novel deep learning model [12] inspired by groupwise registration [13], [14] and multi-modal feature fusion [15], [16]. We make the following contributions in this paper:

  • Vertebra recognition. The location, name, and pose (scale + orientation) of each vertebra are identified simultaneously. Further spine shape analysis, e.g., spondylolysis analysis, is then possible based on the recognition results.

  • Multi-modal feature learning. The vertebra features are jointly learned and fused from both MR and CT. This enhances feature discrimination and improves the classification of vertebra/non-vertebra.

  • Invariant representation. In the training and recognition stages, the sampled or detected vertebrae are automatically aligned, generating transform-invariant feature representations and rectified final poses, respectively.

  • Simple annotation. Thanks to the invariant representation, our method only requires a single click for each vertebra in ground-truth annotation, while other methods [8], [5], [9] require four clicks or more.

Section snippets

The Transformed Deep Convolution Network

The Transformed Deep Convolution Network (TDCN) is a novel deep network structure that automatically extracts the most representative and invariant features for MR/CT. It employs MR–CT feature fusion to enhance feature discriminativity, and applies alignment transforms to the input data to generate invariant representations. This resolves the modality and pose variation problems in vertebra recognition. The overall structure of TDCN is presented in Fig. 3. The two major components in TDCN: the
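The snippet below is a minimal, hypothetical sketch (in PyTorch) of the two ideas named above: each input patch is first rectified by a predicted affine transform, then encoded by a modality-specific convolutional stream, and the MR and CT features are fused. All layer sizes, the affine-grid alignment, and the fusion-by-concatenation rule are our own illustrative assumptions, not the exact TDCN design of Fig. 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNetSketch(nn.Module):
    """Illustrative two-stream conv net with input alignment + feature fusion.

    Hypothetical stand-in for TDCN; the real unsupervised fusion rule and
    transform estimation are described in the full paper.
    """

    def __init__(self, feat_dim=128):
        super().__init__()

        def encoder():  # one small conv encoder per modality (MR / CT)
            return nn.Sequential(
                nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.LazyLinear(feat_dim),
            )

        self.enc_mr, self.enc_ct = encoder(), encoder()
        # Tiny regressor predicting a 2x3 affine matrix that rectifies pose,
        # initialized to the identity transform (standard spatial-transformer
        # practice).
        self.loc = nn.Sequential(nn.Flatten(), nn.LazyLinear(32),
                                 nn.ReLU(), nn.Linear(32, 6))
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def align(self, x):
        # Warp the patch by the predicted affine transform.
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

    def forward(self, mr, ct):
        f_mr = self.enc_mr(self.align(mr))
        f_ct = self.enc_ct(self.align(ct))
        return torch.cat([f_mr, f_ct], dim=1)  # fused joint feature


# e.g. fused = FusionNetSketch()(torch.randn(4, 1, 51, 51),
#                                torch.randn(4, 1, 51, 51))
```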

The training of multi-modal recognition system

The training process of the TDCN system is presented in Fig. 6. The process starts with the annotation of sample patches in the original scans. It then trains TDCN on the selected samples, generating invariant vertebra features. The features are finally used to train an SVM, yielding the desired vertebra classifier.
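As a rough illustration of the last step only, the sketch below trains a binary SVM on already-extracted features using scikit-learn; the file names, the RBF kernel, and the feature scaling are placeholders we chose, not details from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical inputs: invariant features from the trained network and
# their labels (1 = vertebra patch, 0 = background patch).
feats = np.load("vertebra_features.npy")   # shape: (n_samples, feat_dim)
labels = np.load("vertebra_labels.npy")    # shape: (n_samples,)

# probability=True lets us rank candidate patches by confidence later.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(feats, labels)

scores = clf.predict_proba(feats[:5])[:, 1]  # vertebra probability
```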

One-click sample annotation. The training samples (positive/negative) are collected by simple clicking operations in image slices. For positive samples, the user only needs to click the
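A minimal sketch of how such one-click annotations might be turned into training patches is shown below. It assumes the click marks the vertebra center, borrows the 51 × 51 patch size mentioned in the recognition section, and uses a negative-sampling distance threshold that is an arbitrary choice of ours.

```python
import numpy as np

def crop_patch(img, center_rc, size=51):
    """Crop a size x size patch centered on a clicked (row, col) point.

    Pads with edge values so clicks near the image border still work.
    """
    half = size // 2
    r, c = center_rc
    padded = np.pad(img, half, mode="edge")
    r, c = r + half, c + half  # shift into padded coordinates
    return padded[r - half : r + half + 1, c - half : c + half + 1]

def random_negatives(img, clicks, n, size=51, min_dist=40, rng=None):
    """Sample patches whose centers are far from every clicked vertebra."""
    rng = rng or np.random.default_rng(0)
    half, out = size // 2, []
    while len(out) < n:
        r = int(rng.integers(half, img.shape[0] - half))
        c = int(rng.integers(half, img.shape[1] - half))
        if all(np.hypot(r - cr, c - cc) > min_dist for cr, cc in clicks):
            out.append(crop_patch(img, (r, c), size))
    return out
```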

The vertebra recognition

We directly apply the trained multi-modal recognition system for fully automatic vertebra recognition on arbitrary spine images. The overall recognition process is shown in Fig. 7.

Vertebra detection. As shown in the first step of Fig. 7, to simulate the various poses of the vertebrae, we first rotate and rescale the input MR/CT image, generating a set of transformed images. Regular patches (i.e., 51 × 51) are then randomly sampled from the images and sent as input to the trained
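A sketch of this test-time pose simulation, under assumed rotation angles, scale factors, and sample counts (the paper's actual settings may differ), could look as follows:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def pose_simulated_patches(img, angles=(-15, 0, 15), scales=(0.8, 1.0, 1.25),
                           size=51, n_per_image=200, rng=None):
    """Rotate/rescale the slice, then randomly sample size x size patches.

    Yields (patch, angle, scale, center) so that a detection found in a
    transformed image can later be mapped back to the original slice.
    """
    rng = rng or np.random.default_rng(0)
    half = size // 2
    for angle in angles:
        for scale in scales:
            t = zoom(rotate(img, angle, reshape=True, order=1),
                     scale, order=1)
            if min(t.shape) <= size:  # skip images too small to sample
                continue
            for _ in range(n_per_image):
                r = int(rng.integers(half, t.shape[0] - half))
                c = int(rng.integers(half, t.shape[1] - half))
                patch = t[r - half : r + half + 1, c - half : c + half + 1]
                yield patch, angle, scale, (r, c)
```

Each sampled patch would then be scored by the trained classifier, with the recorded angle/scale giving the rectified pose of any accepted detection.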

Experiments

Our method is tested on a cross-modality MR + CT dataset which contains 60 MR volumes (T1 and T2 included) and 90 CT volumes from subjects with different pathologies (e.g., fracture, spondylolisthesis). In particular, 30 pairs of MR–CT lumbar volumes are from Spineweb, 50 CT volumes (lumbar and cervical) are from the MS Annotated Spine CT Database, and the remaining volumes (lumbar and whole spine) are from our

Discussion

2D vs. 3D methods. There are a number of 3D vertebra detection methods [4], [11], [18], [19] that can provide vertebra identification in 3D space, particularly for CT images. However, as reported in [11], [13], 3D detection is still very challenging, especially in some extreme pathological cases. Instead of competing on performance in 3D spine recognition, where large detection errors can occur, the method proposed in this paper focuses on 2D spine recognition and provides a fast, stable

Conclusion

In this paper, we proposed a multi-modal vertebra recognition framework using the Transformed Deep Convolution Network (TDCN). TDCN automatically extracts modality-adaptive, highly discriminative, and pose-invariant features for recognition. Using the TDCN-based recognition system, we can simultaneously identify the locations, labels, and poses of vertebra structures in both MR and CT. The system has successfully passed the tests on multi-modal datasets for lumbar and whole-spine scans with high accuracy

References (19)
