Abstract
Mass cytometry is a new high-throughput technology that is becoming a cornerstone of immunology and cell biology research. As the technology advances, the number of cellular characteristics that cytometry can quantify simultaneously grows, making analysis increasingly computationally onerous. In this paper, we investigate whether dimensionality reduction techniques can ease the computational burden of clustering cytometry data while only minimally degrading clustering performance. We explore three such techniques: Principal Component Analysis (PCA), Autoencoders (AE) and Uniform Manifold Approximation and Projection (UMAP). We then apply a recent clustering algorithm, ChronoClust, which clusters the data at each time-point into cell populations and explicitly tracks those populations over time. We evaluate this approach on a 14-dimensional cytometry dataset describing the immune response to West Nile Virus over 8 days in mice. To obtain a broad sample of clustering performance, each of the four datasets (unreduced, PCA-, AE- and UMAP-reduced) is independently clustered 400 times, using 400 unique sets of ChronoClust parameter values. We find that PCA and AE can reduce the computational expense while incurring only minimal degradation in clustering and cluster tracking performance.
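As a minimal illustration of the reduction step, the following sketch applies PCA to a synthetic 14-dimensional matrix before any clustering takes place. It is not the authors' implementation: the function name `pca_reduce`, the random stand-in data, the number of cells and the choice of three retained components are all assumptions made for illustration, and the clustering stage itself (ChronoClust) is not shown.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal components.

    Hypothetical helper for illustration only. Returns the reduced data and
    the fraction of total variance explained by each retained component.
    """
    # Centre each marker (column) so the components describe variance
    # around the mean rather than the mean itself.
    X_centered = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    X_reduced = X_centered @ Vt[:n_components].T
    return X_reduced, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for one time-point: 1000 cells x 14 markers (not real data).
X = rng.normal(size=(1000, 14))
# Three components is an arbitrary choice made for this sketch.
X_reduced, explained = pca_reduce(X, n_components=3)
print(X_reduced.shape)
```

The reduced matrix would then be passed to the clustering step in place of the original 14-dimensional data, which is where any saving in computational expense would arise.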
Acknowledgements
WNV data generation was supported by the Australian Research Council (grants LE140100149, DP160102063) and the Australian National Health and Medical Research Council (grant 1088242). All procedures involving mice were reviewed and approved by the University of Sydney AEC. We thank the Sydney Informatics Hub at the University of Sydney for access to their High-Performance Computing facility.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Putri, G.H., Read, M.N., Koprinska, I., Ashhurst, T.M., King, N.J.C. (2019). Dimensionality Reduction for Clustering and Cluster Tracking of Cytometry Data. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_50
DOI: https://doi.org/10.1007/978-3-030-30490-4_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30489-8
Online ISBN: 978-3-030-30490-4
eBook Packages: Computer Science (R0)