Real Time RNN Based 3D Ultrasound Scan Adequacy for Developmental Dysplasia of the Hip

Paserin, Olivia; Mulpuri, Kishore; Cooper, Anthony; Hodgson, Antony J.; Garbi, Rafeef

doi:10.1007/978-3-030-00928-1_42


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11070)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

Acquiring adequate ultrasound (US) image data is crucial for accurate diagnosis of developmental dysplasia of the hip (DDH), the most common pediatric hip disorder, affecting on average one in every one thousand births. Presently, acquiring high-quality US deemed adequate for diagnostic measurements requires thorough knowledge of infant hip anatomy as well as extensive experience in interpreting such scans. This work aims to give the operator rapid, automatic assurance at the time of acquisition that the acquired data are suitable for accurate diagnosis. To this end, we propose a deep learning model for fully automatic scan adequacy assessment of 3D US volumes. Our contributions include developing an effective set of criteria that defines the features required for DDH diagnosis in an adequate 3D US volume, proposing an efficient neural network architecture composed of convolutional and recurrent layers for robust classification, and validating our model’s agreement with classification labels from an expert radiologist on real pediatric clinical data. To the best of our knowledge, our work is the first to make use of inter-slice information within a 3D US volume for assessing DDH scan adequacy. Using 200 3D US volumes from 25 pediatric patients, we demonstrate an accuracy of 82% with an area under the receiver operating characteristic curve of 0.83 and a clinically suitable runtime of one second.


1 Introduction

Developmental dysplasia of the hip (DDH) is a congenital condition representing a range of disorders involving a partial or complete dislocation of the hip joint. DDH is the most common pediatric hip disorder, affecting on average one in every one thousand births [1]. Failure to diagnose DDH in its early stages often leads to serious adverse outcomes for the hip, such as painful osteoarthritis in early adulthood, and to significant difficulties in later treatment, which typically involves expensive corrective surgery [2].

Ultrasound (US) imaging is currently considered the gold standard for DDH diagnosis during early childhood development, as it is low cost, portable, and does not use potentially harmful ionizing radiation [3]. Although 2D US is the present clinical standard, several recent works have shown that 3D US gives a more comprehensive measure of the anatomical deformity and is less prone to errors related to probe orientation [6,7,8]. Our group has pioneered the use of 3D US for DDH diagnosis and shown that it markedly improves the reliability of dysplasia metric measurements compared to 2D US [8]. However, current analysis processes are computationally expensive, with runtimes of three minutes, which limits their clinical relevance. Furthermore, the introduction of 3D US poses additional difficulty for operators who may not have experience with volumetric scans. Acquiring high-quality US volumes that are adequate for diagnostic measurements remains especially challenging, as it requires thorough knowledge of infant hip anatomy and extensive experience in interpreting scans. Such challenges exist even with 2D US: when the quality of hip sonograms across 8 German states was studied in 2011, up to 43% of tested hip sonographers had their licenses revoked because they could not demonstrate sufficient adherence to the imaging guidelines [4]. The top reasons for misdiagnoses were (1) US probe orientation errors, (2) incorrect anatomical interpretation, and (3) lack of adequacy checks [5]. To improve the clinical usability of 3D US, our work aims to provide rapid assurance at the time of acquisition that the acquired US data are suitable for diagnosis.

US standard plane detection, a problem similar to US scan adequacy assessment, has been addressed in other fields such as fetal abnormality screening [9,10,11,12] and cardiac imaging [13] in an effort to provide feedback to sonographers. Maraci et al. [9], Chen et al. [11], Baumgartner et al. [12], and Abdi et al. [13] each proposed classifiers for categorising 2D slices from US video data, and Rahmatullah et al. [10] proposed a method based on the AdaBoost learning algorithm for US volume data. In earlier work [14], our group proposed a technique for automatic 2D US scan adequacy detection in DDH, but applying that approach sequentially to the slices of a 3D US volume would require long processing times, hampering clinical use. We subsequently developed a fast approach for automatic 3D US scan adequacy assessment [15], but its classification remained slice-based and thus did not exploit the rich, and often very informative, inter-slice information contained in the spatial relationship between responses from sequential frames within a volume.

In this paper, we propose a deep learning model for fully automatic scan adequacy assessment of 3D US volumes. We design a recurrent neural network (RNN) architecture to incorporate inter-slice information within a 3D US volume for DDH screening. More specifically, our contributions include: (1) developing a list of criteria that defines the features required in an adequate 3D US volume for DDH diagnosis, (2) proposing a neural network architecture, trained end-to-end, comprising convolutional layers and recurrent layers that robustly classify US scan adequacy, and (3) validating our model’s agreement with classification labels from an expert radiologist on real pediatric clinical data.

2 Materials and Method

2.1 Dataset

As part of a larger collaboration with pediatric orthopedic surgeons at British Columbia Children’s Hospital, including a multi-year DDH clinical study conducted by our research team, we acquired 200 3D B-mode US volumes from 25 pediatric patients; the scans were performed by two pediatric orthopedic surgeons. The data were obtained as part of routine clinical care under appropriate institutional review board approval using a SonixTouch Q+ scanner (Analogic Inc., Peabody, MA, USA) with a 4DL14-5/38 linear 4D transducer set at 7 MHz and positioned in the coronal plane. Each acquired volume comprised 200 slices with an axial resolution of 0.17 mm. To harmonise the input image dimensions for our neural network, we resized the images to $256\,\times \,256$ pixels, corresponding to an x-dimension of 38 mm and a variable y-dimension of at least 38 mm.
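A minimal sketch of the resizing step and the resulting pixel spacing follows. The text does not state the interpolation method, so nearest-neighbour sampling on nested Python lists is assumed purely for illustration, and `resize_nearest` and `pixel_spacing_mm` are hypothetical helpers. Mapping the 38 mm x-extent onto 256 columns gives about 0.148 mm per pixel, while the y-spacing varies with each volume's depth.

```python
def resize_nearest(image, out_h=256, out_w=256):
    """Resize a 2D image (list of rows) with nearest-neighbour sampling.

    Assumption: interpolation method is illustrative only; the source
    does not say how slices were resized.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def pixel_spacing_mm(extent_mm, n_pixels=256):
    """Physical size of one pixel after mapping extent_mm onto n_pixels."""
    return extent_mm / n_pixels


if __name__ == "__main__":
    # Hypothetical 300 x 224 slice filled with an intensity ramp.
    slice_ = [[(r + c) % 256 for c in range(224)] for r in range(300)]
    resized = resize_nearest(slice_)
    print(len(resized), len(resized[0]))     # 256 256
    print(round(pixel_spacing_mm(38.0), 4))  # 0.1484
```

Because the x- and y-extents differ between volumes, the resized pixels are generally not square; a network fed these images sees anatomy at a slightly different vertical scale from volume to volume.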

2.2 3D US Scan Adequacy Criteria

It is important to note that a gold standard for clinical classification of US volumes does not yet exist, since 2D assessment is currently the clinical standard for DDH screening. Together with an expert radiologist, we thus developed a list of criteria that defines the features required in an adequate 3D US volume for proper subsequent extraction of the commonly used DDH metrics, namely the $\alpha $ angle (angle between the plane of the ilium and the acetabulum), $\beta $ angle (angle between the plane of the ilium and the labrum), and femoral head coverage (the percentage area of the femoral head medial to the ilium) [8]. Therefore, anatomical features required to be present within the scan include the ilium, acetabulum, labrum, ischium and entire femoral head as illustrated in Fig. 1. When a volume properly captures the entire hip joint, the femoral head, a hypo-echoic spherical structure, should be seen growing and shrinking in size across the encompassing slices. Additionally, the ilium must appear as a straight, horizontal hyper-echoic line and the acetabulum must appear continuous with the iliac bone. Notably, although these features should be collectively present within an adequate volume, they do not necessarily all need to be present within any single slice of the volume, hence a slice-by-slice analysis is not ideal and compromises accuracy.
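The central point of these criteria, that the required features need only be present collectively across a volume, can be contrasted with a stricter slice-wise rule in a few lines. This is an illustrative sketch assuming hypothetical per-slice feature annotations: the proposed network learns the decision from the images directly rather than from such lists, and the qualitative checks on ilium orientation and femoral head shape are not modelled.

```python
# Anatomy required in an adequate volume, per the criteria above.
REQUIRED_FEATURES = frozenset(
    {"ilium", "acetabulum", "labrum", "ischium", "femoral_head"}
)


def volume_has_all_features(slice_features):
    """Volume-level rule: features may be spread across different slices."""
    seen = set()
    for features in slice_features:
        seen |= set(features)
    return REQUIRED_FEATURES <= seen


def some_slice_has_all_features(slice_features):
    """Slice-wise rule: at least one slice must show every feature."""
    return any(REQUIRED_FEATURES <= set(f) for f in slice_features)


if __name__ == "__main__":
    # Hypothetical annotations: one slice lacks the acetabulum,
    # another lacks the ischium.
    volume = [
        {"ilium", "labrum", "ischium", "femoral_head"},
        {"ilium", "acetabulum", "labrum", "femoral_head"},
    ]
    print(volume_has_all_features(volume))      # True
    print(some_slice_has_all_features(volume))  # False
```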

2.3 Proposed CNN-RNN Network Architecture

In order to leverage spatial inter-slice information within a volume, we propose a neural network architecture composed of convolutional layers to extract hierarchical features from a scan, followed by recurrent layers to capture the spatial relationship of their responses. An overview of the network is shown in Fig. 2. We designed and implemented our model in Keras [16], a Python API with a TensorFlow [17] backend.

Extracting Hierarchical Features. Due to the relatively small sample size, we deployed a simple Convolutional Neural Network (CNN) architecture to limit overfitting to the training dataset, which regularisation alone may not prevent. Specifically, we used a CNN inspired by the VGG architecture [19], as it has proven to generalise well to other datasets. We include five convolutional layers with small $3\,\times \,3$-sized kernels to limit the number of parameters in our model, doubling the number of kernels, and hence of feature maps, at each layer. Using Keras’ TimeDistributed wrapper, we apply the same convolutions to every frame of a US volume, treating the volume as a sequence of frames. We employ Rectified Linear Units (ReLUs) as nonlinear activation functions between layers, and $2\,\times \,2$ max-pooling operations with a stride of 2, which halve the size of the feature maps and decrease feature variance for improved generalisability. Lastly, to prevent co-adaptation of features and overfitting to the training dataset, we include a dropout layer with a dropout probability of 0.5.
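The feature-map sizes and parameter counts implied by this design can be traced with a short calculation. This is a sketch under stated assumptions: the text does not give the number of kernels in the first layer (32 is used here purely for illustration), and 'same' padding, a single-channel input and one max-pool after every convolution are also assumed. `cnn_summary` is a hypothetical helper.

```python
def cnn_summary(input_hw=256, in_channels=1, base_filters=32,
                n_layers=5, kernel=3):
    """Trace feature-map size and parameters through conv + max-pool blocks.

    Assumptions (not stated in the text): 'same' padding, so each convolution
    keeps the spatial size; one 2x2, stride-2 max-pool after every
    convolution; a single-channel (grayscale) input; base_filters kernels in
    the first layer.
    """
    size, c_in, total = input_hw, in_channels, 0
    rows = []
    for layer in range(n_layers):
        c_out = base_filters * 2 ** layer              # kernels double per layer
        params = (kernel * kernel * c_in + 1) * c_out  # weights + one bias per kernel
        size //= 2                                     # 2x2 max-pool, stride 2
        rows.append((layer + 1, c_out, size, params))
        total += params
        c_in = c_out
    return rows, total


if __name__ == "__main__":
    rows, total = cnn_summary()
    for layer, kernels, size, params in rows:
        print(f"conv{layer}: {kernels} kernels -> {size}x{size}x{kernels}, {params} params")
    print(total)  # 1568000
```

Under these assumptions each 256 × 256 frame is reduced to an 8 × 8 × 512 feature map before reaching the recurrent layer. Because TimeDistributed reuses the same weights for every frame, the count does not grow with the number of frames in a volume.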

Leveraging Spatial Inter-slice Information. Long Short-Term Memory (LSTM) [18] networks are Recurrent Neural Networks (RNNs) with an architecture designed for sequence processing. Because our dataset is relatively small, we propose using an RNN rather than 3D convolutions, as RNNs require fewer parameters for training and are therefore better suited. LSTMs comprise gates that mitigate the vanishing gradient problem, allowing them to store information over long intervals, which makes them well suited to sequences. To analyse inter-slice information, we apply this sequential-learning strategy by feeding the sequence of features extracted by the time-distributed CNN into our LSTM layer. The LSTM uses gated memory functions to process each frame of a sequence while learning to store only the important features from each frame. Our LSTM layer has 256 units and is followed by dropout with a probability of 0.5 for improved generalisability.
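The gating described above can be illustrated with a minimal, dependency-free sketch of a single LSTM step. It assumes the widely used formulation with a forget gate (the variant implemented by Keras' `LSTM` layer), and `lstm_step`, `run_lstm` and `lstm_param_count` are hypothetical helpers, not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of the common LSTM cell with a forget gate.

    W[gate][j], U[gate][j] and b[gate][j] are the input weights, recurrent
    weights and bias of unit j for gate in "i" (input), "f" (forget),
    "o" (output) and "g" (candidate memory).
    """
    def pre_activation(gate, j):
        return (sum(w * xk for w, xk in zip(W[gate][j], x))
                + sum(u * hk for u, hk in zip(U[gate][j], h_prev))
                + b[gate][j])

    h, c = [], []
    for j in range(len(h_prev)):
        i_t = sigmoid(pre_activation("i", j))    # how much new content to write
        f_t = sigmoid(pre_activation("f", j))    # how much old memory to keep
        o_t = sigmoid(pre_activation("o", j))    # how much memory to expose
        g_t = math.tanh(pre_activation("g", j))  # candidate memory content
        c_j = f_t * c_prev[j] + i_t * g_t        # gated memory update
        c.append(c_j)
        h.append(o_t * math.tanh(c_j))           # hidden state passed onward
    return h, c


def run_lstm(frames, W, U, b, units):
    """Feed per-frame feature vectors through the cell; return the last h."""
    h, c = [0.0] * units, [0.0] * units
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
    return h


def lstm_param_count(input_dim, units):
    """Trainable parameters of one LSTM layer: 4 gates x (W, U, bias)."""
    return 4 * (units * (input_dim + units) + units)


if __name__ == "__main__":
    # Hypothetical tiny cell: 2 features per frame, 1 unit, all weights 0.1.
    W = {g: [[0.1, 0.1]] for g in "ifog"}
    U = {g: [[0.1]] for g in "ifog"}
    b = {g: [0.0] for g in "ifog"}
    print(run_lstm([[1.0, 0.5], [0.2, 0.3], [0.9, 0.1]], W, U, b, units=1))
    print(lstm_param_count(input_dim=1024, units=256))
```

In the model described above, each `x` would be the CNN feature vector of one frame, and the final hidden state would presumably feed a sigmoid output for the adequate/inadequate decision (an assumption). The parameter count depends only on the input dimension and the number of units, not on the sequence length.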

2.4 Training

In our experiments, we split the available data by patient rather than by volume to avoid mixing similar volume samples between the training, validation, and test data. We split our 25 patients into 60% for training, 20% for validation, and 20% for testing. This resulted in 135 volumes from 15 patients for training and 45 volumes from 5 patients for validation. Additionally, we held out 20 raw US volumes from the remaining 5 patients for testing our final model and for cross-checking the results against those of our expert radiologist. Each data subset had an approximately equal number of adequate and inadequate volumes.
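Splitting by patient amounts to assigning whole patients to each subset before collecting their volumes. A minimal sketch follows; the fixed random seed and the `split_by_patient` helper are illustrative assumptions, and unlike the authors' split it does not balance adequate and inadequate volumes across subsets.

```python
import random


def split_by_patient(volume_to_patient, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole patients (not individual volumes) to train/val/test.

    volume_to_patient: dict mapping a volume id to its patient id.
    Returns three lists of volume ids; no patient appears in more than one.
    """
    patients = sorted(set(volume_to_patient.values()))
    random.Random(seed).shuffle(patients)          # reproducible patient order
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = (
        set(patients[:n_train]),
        set(patients[n_train:n_train + n_val]),
        set(patients[n_train + n_val:]),           # remaining patients -> test
    )
    splits = ([], [], [])
    for vol, pat in sorted(volume_to_patient.items()):
        for split, group in zip(splits, groups):
            if pat in group:
                split.append(vol)
    return splits


if __name__ == "__main__":
    # Hypothetical mapping: 200 volumes spread over 25 patients.
    v2p = {f"v{k}": f"p{k % 25}" for k in range(200)}
    train, val, test = split_by_patient(v2p)
    print(len(train), len(val), len(test))
```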

To prepare adequate and inadequate labels for sequences from our volumes, let $S = \{F_1, F_2, \ldots, F_n\}$ denote a sequence of $n$ frames in which $F_A$ and $F_B$ are, respectively, the first and last frames with any diagnostic features present. All frames $F_A, \ldots, F_B$ are thus grouped as one sequence and labeled adequate. The remaining frames, $F_1, \ldots, F_{A-1}$ and $F_{B+1}, \ldots, F_n$, are grouped into sequences labeled inadequate. The resulting sequence lengths varied from 40 to 50 frames. Additionally, sequences from US volumes with missing diagnostic features were labeled inadequate.
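The labeling rule can be written as a small function over per-frame annotations. This sketch assumes a hypothetical boolean flag per frame (does it show any diagnostic feature?) plus a volume-level flag for whether all required features are present; it uses 0-based indices and does not further split segments to match the 40 to 50 frame lengths reported above. `label_sequences` is a hypothetical helper.

```python
def label_sequences(has_feature, volume_complete=True):
    """Split one volume's frames into labelled sequences.

    has_feature[k] is True if frame k shows any diagnostic feature.
    Frames F_A..F_B (first to last such frame) form one sequence, labelled
    adequate only if the volume contains all required features; frames before
    F_A and after F_B form inadequate sequences. Returns (start, stop, label)
    tuples with 0-based, stop-exclusive frame indices.
    """
    idx = [k for k, flag in enumerate(has_feature) if flag]
    n = len(has_feature)
    if not idx:
        # No diagnostic content anywhere: the whole volume is inadequate.
        return [(0, n, "inadequate")] if n else []
    a, b = idx[0], idx[-1]
    middle = "adequate" if volume_complete else "inadequate"
    segments = [(0, a, "inadequate"), (a, b + 1, middle), (b + 1, n, "inadequate")]
    return [s for s in segments if s[1] > s[0]]  # drop empty segments


if __name__ == "__main__":
    flags = [False] * 3 + [True] * 4 + [False] * 2
    print(label_sequences(flags))
    # [(0, 3, 'inadequate'), (3, 7, 'adequate'), (7, 9, 'inadequate')]
```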

During training, we used mini-batches of size 32 for 50 epochs and minimised the mean binary cross-entropy loss between the output predictions $p$ and the true labels $y$, calculated as

$$\begin{aligned} \mathcal {L} (\theta ) = - \frac{1}{n} \sum _{i=1}^{n} \left[ y_i \log \left( p_i\right) + \left( 1 - y_i\right) \log \left( 1 - p_i\right) \right] , \end{aligned} \tag{1}$$

where $i$ indexes samples and $n$ is the number of samples. We used Adam [20] as our optimizer for minimizing the objective function, with a learning rate of $10^{-5}$ and a learning rate decay of $10^{-6}$.
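Eq. (1) and the optimiser settings can be made concrete with a dependency-free sketch. It assumes that the stated decay follows the time-based schedule of the legacy `decay` argument in Keras 2 optimizers, lr / (1 + decay · iterations), and uses the default Adam hyperparameters from [20] (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸), which the text does not specify. `mean_bce`, `decayed_lr` and `adam_step` are hypothetical helpers.

```python
import math


def mean_bce(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy, Eq. (1); eps clips p away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def decayed_lr(base_lr=1e-5, decay=1e-6, iteration=0):
    """Assumed time-based decay: lr / (1 + decay * iteration)."""
    return base_lr / (1.0 + decay * iteration)


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (t counts steps from 1)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    print(mean_bce([1, 0, 1], [0.9, 0.2, 0.6]))
    print(decayed_lr(iteration=0), decayed_lr(iteration=10_000))
```

With a decay of 10⁻⁶, this schedule halves the learning rate only after one million updates, so unless training approaches that many updates the rate changes only slightly.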

3 Results and Discussion

Our collaborating expert radiologist was asked to provide clinical classification labels for the 20 test US volumes (new, unseen by our network during training and validation), which we treated as the gold standard. We purposely included in this test dataset scans that we expected to be challenging to interpret, for example the volume shown in Fig. 3.

Testing on the sequences from the 20 test volumes, our proposed approach achieved an accuracy of 82% and an area under the receiver operating characteristic curve of 0.83. To output a single label for each test volume, we passed sequences of 50 frames at a time into the network (as in training) until all 200 frames had been processed. The network labeled a volume adequate when it found an adequate sequence within that volume. Using this strategy, our network’s output labels agreed with our radiologist’s manual labels for 16 of the 20 challenging test scans. We further compared results with our previous method [15] and found that it correctly classified only 14 of the 20 test volumes. Because that method relied on slice-by-slice analysis, it failed to identify adequate volumes that did not contain a series of slices each showing all the required anatomy. For example, as illustrated in Fig. 3(c) and (d), frame 23 is missing the acetabulum and frame 51 is missing the ischium, so our previous method classified these slices as inadequate. In comparison, our new method analyses the frames collectively as a sequence and correctly classified this volume as adequate, since all the required features are present across the sequence of frames.
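The volume-level decision rule described above can be sketched as follows, with `classify_sequence` standing in for the trained network (a hypothetical callable, not a real API). Non-overlapping 50-frame windows are assumed, which for 200 frames gives four windows; the small `agreement` helper expresses the 16-of-20 volume agreement as a fraction.

```python
def classify_volume(frames, classify_sequence, window=50):
    """Label a volume adequate if any consecutive window is classified adequate.

    frames: sequence of slices (e.g. 200); classify_sequence: callable that
    returns True for an adequate sequence. Assumes non-overlapping windows;
    a shorter final window is passed as-is if len(frames) % window != 0.
    """
    for start in range(0, len(frames), window):
        if classify_sequence(frames[start:start + window]):
            return True  # one adequate sequence is enough
    return False


def agreement(pred_labels, reference_labels):
    """Fraction of volumes where prediction and reference label match."""
    matches = sum(p == r for p, r in zip(pred_labels, reference_labels))
    return matches / len(reference_labels)


if __name__ == "__main__":
    # Hypothetical stand-in for the network: "adequate" if frame 120 is present.
    print(classify_volume(list(range(200)), lambda seq: 120 in seq))
    print(agreement([True] * 16 + [False] * 4, [True] * 20))
```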

Runtime. Leveraging TensorFlow’s GPU-based implementation of neural networks, the trained model classified an input US volume in one second, a time suitable for clinical workflow. This time was achieved on an Intel Core i7-7800X 3.50 GHz CPU with an NVIDIA TITAN Xp GPU and 64 GB of RAM. For comparison, our expert radiologist (experienced in DDH diagnosis with 2D scans) took 10–40 s on average to classify one volume.

4 Conclusions

We presented a technique for fully automatic scan adequacy assessment of 3D US volumes for DDH. We developed a list of criteria defining the features required for diagnosis, proposed a neural network architecture comprising convolutional and recurrent layers for robust classification, and validated our model on real pediatric data. Our volume classification agrees well with an expert’s manual classification, with an average processing time of one second, which is suitable for clinical use. Given the small size of the training data, we expect better performance as our dataset continues to grow with scans from more patients and a variety of US machines. Future work will include expanding our training set and investigating differences in reliability and task time between novice and experienced sonographers/surgeons using our setup. We expect real-time automatic US scan adequacy assessment to have significant clinical impact, with the potential to help standardise 3D US imaging for DDH. Currently, there is no universal screening for DDH in North America due to the high cost of the experienced personnel needed for scan acquisition. In the future, an automatic assessment tool may reduce DDH screening costs by allowing personnel other than highly trained radiologists or surgeons to obtain reliable 3D US scans suitable for diagnosis, and thus make universal DDH screening possible.

References

1. Committee on Quality Improvement, Subcommittee on Developmental Dysplasia of the Hip: Clinical practice guideline: early detection of developmental dysplasia of the hip. Pediatrics 105(4), 896 (2000)
2. Hoaglund, F.T., Steinbach, L.S.: Primary osteoarthritis of the hip: etiology and epidemiology. JAAOS 9(5), 320–327 (2001)
3. Atweh, L., Kan, J.: Multimodality imaging of developmental dysplasia of the hip. Pediatr. Radiol. 43(1), 166–171 (2013)
4. Tschauner, C., Matthissen, H.: Hip sonography with Graf-method in newborns: checklists help to avoid mistakes. OUB 1, 7–8 (2012)
5. Graf, R., Mohajer, M., Florian, P.: Hip sonography update: quality-management, catastrophes - tips and tricks. Med. Ultrason. J. 15(4), 299–303 (2013)
6. Jaremko, J., Mabee, M., Swami, V., Jamieson, L., Chow, K., Thompson, R.: Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. Radiology 273(3), 870–878 (2014)
7. Hareendranathan, A., Mabee, M., Punithakumar, K., Noga, M., Jaremko, J.: A technique for semiautomatic segmentation of echogenic structures in 3D ultrasound, applied to infant hip dysplasia. Int. J. Comput. Assist. Radiol. Surg. 11(1), 31–42 (2016)
8. Quader, N., Hodgson, A., Mulpuri, K., Cooper, A., Abugharbieh, R.: Towards reliable automatic characterization of neonatal hip dysplasia from 3D ultrasound images. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9900, pp. 602–609. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46720-7_70
9. Maraci, M., Bridge, C., Napolitano, R., Papageorghiou, A., Noble, A.: A framework for analysis of linear ultrasound videos to detect fetal presentation and heartbeat. Med. Image Anal. 37, 22–36 (2017)
10. Rahmatullah, B., Papageorghiou, A., Noble, J.A.: Automated selection of standardized planes from ultrasound volume. In: Suzuki, K., Wang, F., Shen, D., Yan, P. (eds.) MLMI 2011. LNCS, vol. 7009, pp. 35–42. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24319-6_5
11. Chen, H.: Ultrasound standard plane detection using a composite neural network framework. IEEE Trans. Cybern. 47(6), 1576–1586 (2017)
12. Baumgartner, C.F., Kamnitsas, K., Matthew, J., Smith, S., Kainz, B., Rueckert, D.: Real-time standard scan plane detection and localisation in fetal ultrasound using fully convolutional neural networks. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 203–211. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_24
13. Abdi, A.H., et al.: Quality assessment of echocardiographic cine using recurrent neural networks: feasibility on five standard view planes. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 302–310. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_35
14. Quader, N., Hodgson, A.J., Mulpuri, K., Schaeffer, E., Abugharbieh, R.: Automatic evaluation of scan adequacy and dysplasia metrics in 2-D ultrasound images of the neonatal hip. Ultrasound Med. Biol. 43, 1252–1262 (2017)
15. Paserin, O., Mulpuri, K., Cooper, A., Hodgson, A.J., Abugharbieh, R.: Automatic near real-time evaluation of 3D ultrasound scan adequacy for developmental dysplasia of the hip. In: Cardoso, M.J., et al. (eds.) CARE/CLIP 2017. LNCS, vol. 10550, pp. 124–132. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67543-5_12
16. Chollet, F.: Keras (2015). https://github.com/fchollet/keras
17. TensorFlow: Large-scale machine learning on heterogeneous systems (2015). tensorflow.org
18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, pp. 1–14 (2015)
20. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, pp. 1–15 (2015)


Author information

Authors and Affiliations

BiSICL, University of British Columbia, Vancouver, Canada
Olivia Paserin, Kishore Mulpuri, Anthony Cooper, Antony J. Hodgson & Rafeef Garbi


Corresponding author

Correspondence to Olivia Paserin.

Editor information

Editors and Affiliations

University of Leeds, Leeds, UK
Alejandro F. Frangi
King’s College London, London, UK
Julia A. Schnabel
University of Pennsylvania, Philadelphia, PA, USA
Christos Davatzikos
Universidad de Valladolid, Valladolid, Spain
Carlos Alberola-López
Queen’s University, Kingston, ON, Canada
Gabor Fichtinger


About this paper

Cite this paper

Paserin, O., Mulpuri, K., Cooper, A., Hodgson, A.J., Garbi, R. (2018). Real Time RNN Based 3D Ultrasound Scan Adequacy for Developmental Dysplasia of the Hip. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science, vol 11070. Springer, Cham. https://doi.org/10.1007/978-3-030-00928-1_42


DOI: https://doi.org/10.1007/978-3-030-00928-1_42
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00927-4
Online ISBN: 978-3-030-00928-1
eBook Packages: Computer Science, Computer Science (R0)
