
Dimensionality Reduction for Clustering and Cluster Tracking of Cytometry Data

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11730)


Abstract

Mass cytometry is a new high-throughput technology that is becoming a cornerstone of immunology and cell biology research. As the technology advances, the number of cellular characteristics that cytometry can quantify simultaneously grows, making analysis increasingly computationally demanding. In this paper, we investigate whether dimensionality reduction techniques can ease the computational burden of clustering cytometry data while only minimally diminishing clustering performance. We explore three such techniques: Principal Component Analysis (PCA), Autoencoders (AE) and Uniform Manifold Approximation and Projection (UMAP). We then apply a recent clustering algorithm, ChronoClust, which clusters the data at each time-point into cell populations and explicitly tracks these populations over time. We evaluate this approach on a 14-dimensional cytometry dataset describing the immune response to West Nile Virus in mice over 8 days. To obtain a broad sample of clustering performance, each of the four datasets (unreduced, PCA-, AE- and UMAP-reduced) is independently clustered 400 times, using 400 unique ChronoClust parameter value sets. We find that PCA and AE can reduce the computational expense while incurring only minimal degradation in clustering and cluster tracking performance.
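To make the PCA step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the data for one time-point is a cells × markers matrix (14 markers here), and it uses a synthetic random matrix and a hypothetical choice of 8 retained components; the abstract does not state how many components the paper kept or whether the projection was fitted per time-point or on pooled data. PCA is computed with NumPy's SVD on marker-centred data.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X (cells x markers) onto the top principal components."""
    X = np.asarray(X, dtype=float)
    # Centre each marker so the components describe variance around the mean
    X_centred = X - X.mean(axis=0)
    # Economy SVD: rows of Vt are principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each cell in the reduced space
    scores = X_centred @ components.T
    # Fraction of total variance retained by the kept components
    retained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return scores, retained

# Synthetic stand-in for one time-point of a 14-marker dataset (hypothetical data)
rng = np.random.default_rng(0)
cells = rng.normal(size=(1000, 14))
reduced, retained = pca_reduce(cells, n_components=8)
print(reduced.shape)  # (1000, 8)
```

The reduced matrix would then be passed to the clustering step in place of the original markers. The `retained` value gives one simple way to weigh how much variance is sacrificed for the lower cost of clustering in fewer dimensions.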



Acknowledgements

WNV data generation was supported by the Australian Research Council (grants LE140100149, DP160102063) and the Australian National Health and Medical Research Council (grant 1088242). All procedures involving mice were reviewed and approved by the University of Sydney AEC. We thank the Sydney Informatics Hub at the University of Sydney for access to their High-Performance Computing facility.

Author information

Corresponding author: Givanna H. Putri.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Putri, G.H., Read, M.N., Koprinska, I., Ashhurst, T.M., King, N.J.C. (2019). Dimensionality Reduction for Clustering and Cluster Tracking of Cytometry Data. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_50


  • DOI: https://doi.org/10.1007/978-3-030-30490-4_50


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30489-8

  • Online ISBN: 978-3-030-30490-4

  • eBook Packages: Computer Science, Computer Science (R0)
