Automatic Classification of Centrilobular Emphysema on CT Using Deep Learning: Comparison with Visual Scoring

Humphries, Stephen M.; Notary, Aleena M.; Centeno, Juan Pablo; Lynch, David A.

doi:10.1007/978-3-030-00946-5_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11040)

Abstract

The presence and severity of emphysema, scored visually on computed tomography (CT) using a classification system developed by the Fleischner Society, is a clinically significant index of disease severity. Since visual assessment can be subjective and is time-consuming, our purpose was to evaluate the potential of a deep learning method for automatic grading of emphysema. The study cohort included 8213 subjects enrolled in the COPDGene study. Baseline CT scans and visual scores on 2500 subjects were used to train a deep learning model for classification of centrilobular emphysema according to the Fleischner system. The model was then used to predict emphysema scores on 5713 subjects not included in the training set. Predictions were compared with visual emphysema scores, pulmonary function tests (PFTs), smoking history and St. George Respiratory Questionnaire (SGRQ). Agreement between visual emphysema scores and those generated automatically was moderate (weighted \(\kappa \) = 0.60, p < 0.0001). Emphysema scores predicted by the deep learning model showed significant associations with PFTs, smoking history and SGRQ, similar to those seen in comparison with visual scores.

This work was supported by grants R01HL089897 and R01HL089856 from the National Heart, Lung, and Blood Institute. The Genetic Epidemiology of COPD (COPDGene) project is also supported by the COPD Foundation through contributions made to an industry advisory board representing AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.

1 Introduction

Chronic obstructive pulmonary disease (COPD), the third leading cause of death in the U.S., is a heterogeneous group of lung disorders, including a range of patterns of emphysema, chronic bronchitis, and non-emphysematous obstruction due to small-airway disease [7]. CT plays an important role in assessment of COPD. While quantitative image analysis techniques such as lung densitometry are able to detect and quantify emphysema [13], visual assessment has remained necessary for complete characterization of emphysema patterns [7].

To facilitate data comparison in research and clinical trials, and to improve diagnostic accuracy, the Fleischner Society developed a structured system for visual classification of phenotypic abnormalities on CT in subjects with COPD [6]. The system provides a six-point ordinal scale (absent, trace, mild, moderate, confluent, and advanced destructive) for grading the severity of centrilobular emphysema, the prototypical form of emphysema seen in cigarette smokers. This approach was applied in the COPDGene study, where baseline CT scans were retrospectively scored by trained analysts. Visual scores for presence and severity of centrilobular emphysema were found to be significantly associated with genetic loci previously associated with COPD [5] and with increased mortality risk [7]. Visual assessment required substantial effort, however, with multiple analysts working nearly four years to score approximately ten thousand scans.

The purpose of this study was to assess the feasibility of a deep learning model for automatic emphysema classification on CT. We hypothesized that the Fleischner system, a structured scale developed by domain experts, would provide an effective supervisory signal for training. Volumetric CT scans are difficult to process at full resolution because of the memory constraints of graphics processing units (GPUs). To develop a model capable of predicting subject-level scores from such scans, we combined a convolutional neural network (CNN) architecture with a long short-term memory (LSTM) layer. LSTMs are a type of recurrent neural network capable of learning dependencies in sequence data, and they have been used with some success to classify data such as frame sequences from video clips [1]. This combination enabled efficient prediction using features extracted from axial image sequences sampled over a lung volume. We tested the approach by comparing predictions with visual scores and clinical parameters including pulmonary function tests and Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage.

2 Methods

2.1 Study Population

COPDGene is a prospective, multicenter investigation focused on the genetic epidemiology of COPD (ClinicalTrials.gov: NCT00608764). All subjects underwent volumetric inspiratory and expiratory CT using a standardized protocol [9]. Inspiratory CT scans from 8213 subjects were included in this work. The cohort was partitioned so that scans and visual scores on 2500 subjects were used for training and data on the remaining 5713 subjects were reserved for testing. The distribution of emphysema scores in the training data was n = 774, 435, 473, 431, 275, 112 for Fleischner scores 0, 1, 2, 3, 4, and 5, respectively. Additional variables including GOLD stage, Forced Expiratory Volume in the first second percent predicted (FEV1\(\%\)pred), FEV1/Forced Vital Capacity ratio (FEV1/FVC), St. George Respiratory Questionnaire (SGRQ) and smoking history (pack-years) were retrieved from the study database for the test partition. The SGRQ is a respiratory health-related quality of life questionnaire where higher scores correspond to greater impairment [15]. A quantitative CT emphysema score, calculated as the percentage of lung voxels with intensity less than −950 Hounsfield Units (LAA-950), was also used for comparison in test subjects.
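The LAA-950 measure described above can be illustrated with a minimal sketch. It assumes the CT volume is already expressed in Hounsfield Units and that a binary lung mask of the same shape is available; the function name `laa_950` and the toy volume are hypothetical and are not taken from the study's software.

```python
import numpy as np


def laa_950(ct_hu, lung_mask, threshold_hu=-950.0):
    """Percentage of lung voxels whose attenuation is below threshold_hu.

    ct_hu     : array of CT attenuation values in Hounsfield Units
    lung_mask : array of the same shape, nonzero inside the lungs
    """
    lung_voxels = ct_hu[np.asarray(lung_mask, dtype=bool)]
    if lung_voxels.size == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * np.count_nonzero(lung_voxels < threshold_hu) / lung_voxels.size


# Toy volume (synthetic, for illustration only): one of four slabs is
# below the threshold, so a quarter of the "lung" voxels are counted.
ct = np.full((4, 4, 4), -850.0)
ct[0] = -980.0
mask = np.ones_like(ct, dtype=bool)
print(laa_950(ct, mask))  # 25.0
```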

2.2 Visual Scoring

Each CT study had been retrospectively visually scored by two trained analysts using the scale 0 = absent, 1 = trace, 2 = mild, 3 = moderate, 4 = confluent, and 5 = advanced destructive [7]. Analyst agreement, assessed periodically throughout the study, was good to excellent (weighted \(\kappa \) range 0.71–0.80). Analyst discordances larger than one point in score were adjudicated by a thoracic radiologist. The mean analyst score, rounded down to the nearest integer, was recorded as the final value.
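The scoring rule above can be sketched in a few lines. How the adjudicated value enters the final score is not stated in the text, so this sketch assumes the radiologist's score simply replaces the analyst mean; the function name and that behaviour are assumptions made for illustration.

```python
def final_visual_score(score_a, score_b, adjudicated=None):
    """Combine two analyst scores on the 0-5 Fleischner scale.

    Discordances larger than one point need an adjudicated score
    (assumption: the adjudicated score is used as the final value).
    Otherwise the mean of the two scores is rounded down.
    """
    for score in (score_a, score_b):
        if score not in range(6):
            raise ValueError("scores must be integers from 0 to 5")
    if abs(score_a - score_b) > 1:
        if adjudicated is None:
            raise ValueError("discordance > 1 point requires adjudication")
        return adjudicated
    # Integer division rounds the mean down to the nearest integer.
    return (score_a + score_b) // 2


print(final_visual_score(2, 3))                 # 2
print(final_visual_score(1, 4, adjudicated=3))  # 3
```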

2.3 Classification Algorithm

Segmentation was first performed on all CT scans using an automatic lung segmentation function included in an open source library [11, 12]. Using Python and PyTorch [8], we implemented a combined CNN-LSTM network architecture [1] designed to predict the visual centrilobular emphysema score from a sequence of 25 axial images. Axial images were sampled at evenly spaced intervals over the height of each lung segmentation volume, excluding the uppermost and lowermost 5 mm. Figure 1 represents the architecture of the CNN-LSTM model. Briefly, the network includes four blocks, each consisting of a two-dimensional (2D) convolution, rectified linear unit (ReLU) activation and max pooling. The four 2D convolutional layers have 32 6\(\,\times \,\)6, 96 3\(\,\times \,\)3, 256 3\(\,\times \,\)3 and 384 3\(\,\times \,\)3 filters, respectively. The first two max pooling layers have stride 3 and the last two have stride 4. Each input image is fed through the CNN portion separately to extract features, which are concatenated into a sequence and passed to the LSTM layer, followed by a dense layer before output. The loss function is negative log likelihood.
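A rough PyTorch sketch of an architecture with these components is given below. It is not the authors' implementation. The filter counts, kernel sizes and pooling strides follow the text, but several details are assumptions because the text does not specify them: unpadded convolutions, pooling windows equal to the pooling stride, a single-channel input, an LSTM hidden size of 128, and classification from the final hidden state.

```python
import torch
from torch import nn


class CnnLstm(nn.Module):
    """Per-slice CNN features fed as a sequence to an LSTM, then a dense layer."""

    def __init__(self, image_size=512, hidden_size=128, num_classes=6):
        super().__init__()

        def block(c_in, c_out, kernel, pool):
            # Convolution, ReLU and max pooling (pool window = stride is an assumption).
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=kernel),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=pool, stride=pool),
            )

        self.cnn = nn.Sequential(
            block(1, 32, 6, 3),
            block(32, 96, 3, 3),
            block(96, 256, 3, 4),
            block(256, 384, 3, 4),
            nn.Flatten(),
        )
        # Infer the per-slice feature length with a dummy forward pass.
        with torch.no_grad():
            feature_dim = self.cnn(torch.zeros(1, 1, image_size, image_size)).shape[1]
        self.lstm = nn.LSTM(feature_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x has shape (batch, slices, height, width).
        b, t, h, w = x.shape
        # Each slice passes through the CNN separately.
        feats = self.cnn(x.reshape(b * t, 1, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        # Log-probabilities, the input expected by nn.NLLLoss.
        return torch.log_softmax(self.fc(h_n[-1]), dim=1)


# Demo on random data; 256 x 256 slices keep the example light.
model = CnnLstm(image_size=256)
with torch.no_grad():
    log_probs = model(torch.randn(1, 25, 256, 256))
loss = nn.NLLLoss()(log_probs, torch.tensor([3]))
print(log_probs.shape, float(loss))
```

Because the convolutional weights are shared across all slices, the parameter count does not grow with the number of axial images in the sequence.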

The model was trained using CT scans and visual centrilobular emphysema scores, expressed as integers 0–5, on 2500 subjects. Some data augmentation was used, including in-plane image translations and offsets in sampling of axial images.
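The slice sampling and the two augmentations mentioned above might look like the following sketch. The text gives no parameter values, so the function names, the offset expressed as a fraction of the sampling interval, and the air-equivalent fill value of −1000 HU are all assumptions made for illustration.

```python
import numpy as np


def sample_axial_indices(z_first, z_last, slice_spacing_mm, n_slices=25,
                         margin_mm=5.0, offset=0.0):
    """Evenly spaced slice indices over the lung height.

    z_first, z_last : first and last slice index containing lung
    margin_mm       : height excluded at the top and bottom
    offset          : augmentation shift, as a fraction of the sampling
                      interval (hypothetical parameterisation)
    """
    margin = int(np.ceil(margin_mm / slice_spacing_mm))
    lo, hi = z_first + margin, z_last - margin
    if hi - lo + 1 < n_slices:
        raise ValueError("lung extent too short for the requested slices")
    step = (hi - lo) / (n_slices - 1)
    positions = np.linspace(lo, hi, n_slices) + offset * step
    # Round to whole slices and keep every index inside the trimmed range.
    return np.clip(np.rint(positions), lo, hi).astype(int)


def translate_in_plane(image, dy, dx, fill=-1000.0):
    """Shift a 2D image by (dy, dx) pixels, padding exposed borders with fill."""
    out = np.full_like(image, fill)
    h, w = image.shape
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


idx = sample_axial_indices(z_first=0, z_last=259, slice_spacing_mm=1.0)
print(len(idx), idx[0], idx[-1])  # 25 5 254
```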

2.4 Statistical Analysis

Weighted \(\kappa \) statistics between analysts' visual scores and automatic scores were computed. Descriptive statistics relating CNN-LSTM emphysema scores to demographic and clinical parameters were computed. One-way analysis of variance (ANOVA) was used to test for significant differences in FEV1\(\%\)pred, FEV1/FVC, SGRQ, LAA-950 and smoking history values stratified by emphysema severity scores. Chi-square tests of independence were used to compare centrilobular emphysema scores with GOLD stage. Statistical calculations were performed using R version 3.4.4 (2018-03-15). A p-value of <0.05 was considered statistically significant.
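For readers unfamiliar with the agreement statistic, a weighted \(\kappa \) for two sets of ordinal scores can be computed as in the sketch below. The study used R and does not state the weighting scheme, so the NumPy implementation, the function name and the choice between linear and quadratic weights are assumptions.

```python
import numpy as np


def weighted_kappa(rater_a, rater_b, n_classes=6, weights="linear"):
    """Cohen's weighted kappa for two sets of ordinal scores in 0..n_classes-1."""
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    # Observed joint distribution of scores (a normalised confusion matrix).
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (a, b), 1)
    observed /= observed.sum()
    # Distribution expected if the two sets of scores were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Disagreement weights grow with the distance between scores.
    i, j = np.indices((n_classes, n_classes))
    distance = np.abs(i - j) / (n_classes - 1)
    w = distance if weights == "linear" else distance ** 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()


# Synthetic scores for illustration only.
print(weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))  # 1.0
```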

3 Results

Agreement between visual emphysema scores and those generated automatically by the CNN-LSTM was moderate (weighted \(\kappa \) = 0.60, p < 0.0001). Table 1 shows a confusion matrix comparing visual and automatic emphysema scores.

Table 2 shows the comparison between visual emphysema score and clinical variables. ANOVA showed that more severe visual emphysema scores were significantly associated with diminished FEV1\(\%\)pred and FEV1/FVC as well as increased SGRQ, LAA-950 and smoking pack-years (p < 0.0001). The chi-square test of independence examining the relationship between visual score and GOLD stage was significant, \(\chi ^2\) (df = 25, n = 5713) = 2716.4, p < 0.0001. Similarly, Table 3 shows the comparison between CNN-LSTM emphysema scores and clinical variables. ANOVA showed that more severe emphysema grades predicted by the CNN-LSTM were associated with more severe clinical measures. The chi-square test of independence examining the relationship between CNN-LSTM score and GOLD stage was also significant, \(\chi ^2\) (df = 25, n = 5713) = 3203.4, p < 0.0001.
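As a check on the reported degrees of freedom, a chi-square test of independence on an r × c contingency table has (r − 1)(c − 1) degrees of freedom. Assuming the six emphysema grades were crossed with six spirometric categories (PRISm together with five GOLD stages would give six, but this grouping is an inference, not stated in the text), the value matches df = 25:

```latex
\[
\mathrm{df} = (r - 1)(c - 1) = (6 - 1)(6 - 1) = 25
\]
```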

Table 1. Confusion matrix comparing visual emphysema scores with those predicted by the CNN-LSTM. Fleischner centrilobular emphysema scores are: 0 = absent, 1 = trace, 2 = mild centrilobular, 3 = moderate centrilobular, 4 = confluent, 5 = advanced destructive.

Table 2. Clinical variables according to visual emphysema score. FEV1\(\%\)pred, FEV1/FVC, SGRQ, LAA-950 and pack-years are means (s.d.). GOLD data are number of subjects. Preserved Ratio Impaired Spirometry (PRISm) is defined as FEV1/FVC ratio \(\ge \) 0.7 with FEV1 below 80\(\%\) of predicted.

Table 3. Clinical variables according to CNN-LSTM emphysema score. FEV1\(\%\)pred, FEV1/FVC, SGRQ, LAA-950 and pack-years are means (s.d.). GOLD data are number of subjects.

4 Discussion

The Fleischner Society visual scoring system is a clinically significant index of COPD severity that is associated with mortality risk. We have developed a CNN-LSTM model that is capable of automatic classification of centrilobular emphysema pattern on CT according to the Fleischner scale. In a group of 5713 test subjects, automatically scored emphysema patterns showed moderate agreement with visual scores and significant associations with GOLD stage, pulmonary function, SGRQ, LAA-950 and smoking pack-years. While agreement between visual and automatic emphysema scores in this test set is somewhat modest, similar statistical associations are seen when comparing visual and CNN-LSTM emphysema scores with other clinical parameters, suggesting that the CNN-LSTM learns to classify clinically significant features.

Deep learning has become the dominant approach in medical image analysis [16] for applications ranging from segmentation [10] and detection [14] to diagnosis [2]. One challenge when designing CNN architectures is to manage the total number of model parameters considering memory constraints of available GPUs. This is generally the case in applications using volumetric chest CT. Other researchers have shown that limiting model input to a small number (1–4) of relevant “canonical” slices, selected in pre-processing using anatomy detection methods, can be effective [3, 4]. In the present study we instead used an LSTM architecture, which made it possible to limit the number of convolutional weights while still using 25 full-resolution axial images as input.

This study has some limitations. COPDGene uses a well-defined CT protocol and study images are very consistent. Performing both training and testing on images from this cohort may produce optimistic results. We also did not perform systematic comparisons of different CNN architectures. Future work will incorporate mortality and longitudinal assessment for further clinical validation.

5 Conclusions

A combined CNN-LSTM architecture provides an efficient model for subject-level prediction and can be trained to perform automatic classification of centrilobular emphysema on CT.

References

1. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634 (2015)
2. Esteva, A., et al.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115 (2017)
3. González, G., et al.: Disease staging and prognosis in smokers using deep learning in chest computed tomography. Am. J. Respir. Crit. Care Med. 197(2), 193–203 (2018)
4. González, G., Washko, G.R., San José Estépar, R.: Deep learning for biomarker regression: application to osteoporosis and emphysema on chest CT scans. In: SPIE Medical Imaging, vol. 10574 (2018)
5. Halper-Stromberg, E., et al.: Visual assessment of chest computed tomographic images is independently useful for genetic association analysis in studies of chronic obstructive pulmonary disease. Ann. Am. Thorac. Soc. 14(1), 33–40 (2017)
6. Lynch, D.A., et al.: CT-definable subtypes of chronic obstructive pulmonary disease: a statement of the Fleischner Society. Radiology 277(1), 192–205 (2015)
7. Lynch, D.A.: CT-based visual classification of emphysema: association with mortality in the COPDGene study. Radiology 288, 859–866 (2018)
8. Paszke, A., et al.: Automatic Differentiation in PyTorch (2017)
9. Regan, E.A., et al.: Genetic epidemiology of COPD (COPDGene) study design. COPD J. Chronic Obstructive Pulm. Dis. 7(1), 32–43 (2011)
10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
11. Ross, J.C., et al.: Lung extraction, lobe segmentation and hierarchical region assessment for quantitative analysis on high resolution computed tomography images. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009. LNCS, vol. 5762, pp. 690–698. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04271-3_84
12. Ross, J., Harmouche, R., Onieva, J., Diaz, A., Washko, G., Estepar, R.S.J.: Chest imaging platform: an open-source library and workstation for quantitative chest imaging. Am. J. Respir. Crit. Care Med. 191, A4975 (2015)
13. Schroeder, J.D., et al.: Relationships between airflow obstruction and quantitative CT measurements of emphysema, air trapping, and airways in subjects with and without chronic obstructive pulmonary disease. Am. J. Roentgenol. 201(3), W460–W470 (2013)
14. Setio, A.A.A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016)
15. Vestbo, J.: Evaluation of COPD longitudinally to identify predictive surrogate endpoints (ECLIPSE). Eur. Respir. J. 31, 869–873 (2008)
16. Zhou, S.K., Greenspan, H., Shen, D.: Deep Learning for Medical Image Analysis. Academic Press, San Diego (2017)

Author information

Authors and Affiliations

National Jewish Health, Denver, CO, 80206, USA
Stephen M. Humphries, Aleena M. Notary, Juan Pablo Centeno & David A. Lynch

Corresponding author

Correspondence to Stephen M. Humphries.

Editor information

Editors and Affiliations

University College London, London, UK
Danail Stoyanov
University of Leeds, Leeds, UK
Zeike Taylor
Imperial College London, London, UK
Bernhard Kainz
University of Adelaide, Adelaide, SA, Australia
Gabriel Maicas
University of Iowa, Iowa City, IA, USA
Reinhard R. Beichel
Sunnybrook Health Science Centre, Toronto, ON, Canada
Anne Martel
Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
Lena Maier-Hein
Visulytix Ltd. Screenworks, London, UK
Kanwal Bhatia
King’s College London, London, UK
Tom Vercauteren
Imperial College London, London, UK
Ozan Oktay
University of Adelaide, Adelaide, SA, Australia
Gustavo Carneiro
Queensland University of Technology, Brisbane, QLD, Australia
Andrew P. Bradley
University of Lisbon, Lisbon, Portugal
Jacinto Nascimento
University of Queensland, Brisbane, QLD, Australia
Hang Min
University of California Los Angeles, Los Angeles, CA, USA
Matthew S. Brown
Radboud University Medical Center, Nijmegen, The Netherlands
Colin Jacobs
Fraunhofer Institute for Medical Image Computing (MEVIS), Bremen, Germany
Bianca Lassen-Schmidt
Nagoya University, Nagoya, Japan
Kensaku Mori
University of Copenhagen, Copenhagen, Denmark
Jens Petersen
Harvard Medical School, Boston, MA, USA
Raúl San José Estépar
Philips (Germany), Hamburg, Germany
Alexander Schmidt-Richberg
University College London, London, UK
Catarina Veiga

About this paper

Cite this paper

Humphries, S.M., Notary, A.M., Centeno, J.P., Lynch, D.A. (2018). Automatic Classification of Centrilobular Emphysema on CT Using Deep Learning: Comparison with Visual Scoring. In: Stoyanov, D., et al. Image Analysis for Moving Organ, Breast, and Thoracic Images. RAMBO 2018, BIA 2018, TIA 2018. Lecture Notes in Computer Science, vol 11040. Springer, Cham. https://doi.org/10.1007/978-3-030-00946-5_32

DOI: https://doi.org/10.1007/978-3-030-00946-5_32
Published: 12 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00945-8
Online ISBN: 978-3-030-00946-5
eBook Packages: Computer Science, Computer Science (R0)
