A Comparison of Unsupervised Abnormality Detection Methods for Interstitial Lung Disease

  • Conference paper
Medical Image Understanding and Analysis (MIUA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 894)

Abstract

Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this could be used to find data samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality against which new samples are evaluated. In this paper, four methods, each representing a different family of techniques, are compared: the one-class support vector machine, isolation forest, local outlier factor, and fast-minimum covariance determinant estimator. Each method is evaluated on CT patches of interstitial lung disease, where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consist of 5500 healthy patches from one patient cohort, which define normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology, which represent abnormality. From this second cohort, 1030 healthy patches are used as an evaluation dataset. The methods are evaluated on both accuracy (area under the ROC curve) and runtime efficiency. The fast-minimum covariance determinant estimator is demonstrated to have a fair time scaling with dataset dimensionality, while the isolation forest and one-class support vector machine scale well with dimensionality. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast-minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.
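The comparison described in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' code or data. The patches are synthetic stand-ins, every hyperparameter is an assumption, only the principal component analysis embedding is shown, and `EllipticEnvelope` (which scikit-learn builds on its FastMCD-based robust covariance estimate) stands in for the fast-minimum covariance determinant estimator. Each detector is fitted on embedded normal samples only, then scored by area under the ROC curve on a mix of held-out normal and abnormal samples.

```python
import time

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened patches (assumption: NOT the paper's CT data).
# Normal samples lie near an 8-dimensional subspace of a 64-dimensional space;
# abnormal samples share the subspace but have three times the spread.
n_latent, n_features = 8, 64
mixing = rng.normal(size=(n_latent, n_features))


def make_patches(n_samples, scale):
    latent = rng.normal(scale=scale, size=(n_samples, n_latent))
    noise = rng.normal(scale=0.1, size=(n_samples, n_features))
    return latent @ mixing + noise


train_normal = make_patches(500, scale=1.0)
test_normal = make_patches(100, scale=1.0)
test_abnormal = make_patches(300, scale=3.0)

# The embedding is fitted on the normal training samples only.
embedding = PCA(n_components=n_latent, random_state=0).fit(train_normal)
z_train = embedding.transform(train_normal)
z_test = embedding.transform(np.vstack([test_normal, test_abnormal]))
y_test = np.r_[np.zeros(len(test_normal)), np.ones(len(test_abnormal))]  # 1 = abnormal

# Hyperparameters below are illustrative assumptions, not the paper's settings.
detectors = {
    "one-class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=0.1),
    "isolation forest": IsolationForest(n_estimators=100, random_state=0),
    "local outlier factor": LocalOutlierFactor(n_neighbors=20, novelty=True),
    "FastMCD (EllipticEnvelope)": EllipticEnvelope(random_state=0),
}

results = {}
for name, detector in detectors.items():
    start = time.perf_counter()
    detector.fit(z_train)
    fit_seconds = time.perf_counter() - start

    # decision_function is positive for inliers, so negate it to obtain an
    # abnormality score. The timing excludes the embedding step.
    start = time.perf_counter()
    abnormality = -detector.decision_function(z_test)
    predict_seconds = time.perf_counter() - start

    auc = roc_auc_score(y_test, abnormality)
    results[name] = (auc, fit_seconds, predict_seconds)
    print(f"{name}: AUC={auc:.3f} fit={fit_seconds:.4f}s predict={predict_seconds:.4f}s")
```

Swapping `PCA` for `sklearn.decomposition.KernelPCA` would give a sketch of the kernel embedding; the two autoencoder embeddings would need a deep learning library and are left out here.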

Notes

  1. Note: The predict times do not include the time taken to run the embedding method on the data being predicted on.

Corresponding author

Correspondence to Matt Daykin.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Daykin, M., Sellathurai, M., Poole, I. (2018). A Comparison of Unsupervised Abnormality Detection Methods for Interstitial Lung Disease. In: Nixon, M., Mahmoodi, S., Zwiggelaar, R. (eds) Medical Image Understanding and Analysis. MIUA 2018. Communications in Computer and Information Science, vol 894. Springer, Cham. https://doi.org/10.1007/978-3-319-95921-4_27

  • DOI: https://doi.org/10.1007/978-3-319-95921-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95920-7

  • Online ISBN: 978-3-319-95921-4

  • eBook Packages: Computer Science (R0)
