CytoFA: Automated Gating of Mass Cytometry Data via Robust Skew Factor Analyzers

Lee, Sharon X.

doi:10.1007/978-3-030-16148-4_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Cytometry plays an important role in the clinical diagnosis and monitoring of lymphomas, leukaemia, and AIDS. However, analysis of modern-day cytometric data is challenging. Besides their high-throughput nature and high dimensionality, these data typically exhibit complex characteristics such as multimodality, asymmetry, heavy-tailedness, and other non-normal features. This paper presents cytoFA, a novel data mining approach that performs clustering and dimensionality reduction of high-dimensional cytometry data simultaneously. The approach is robust against non-normal features, including heterogeneity, skewness, and outliers (dead cells), that are typical of flow and mass cytometry data. Based on a statistical approach with well-studied properties, cytoFA adopts a mixture of factor analyzers (MFA) model to learn latent nonlinear low-dimensional representations of the data and to provide an automatic segmentation of the data into its constituent cell populations. We also introduce a double trimming approach to help identify atypical observations and to reduce computation time. The effectiveness of the approach is demonstrated on two large mass cytometry datasets, where it outperforms existing benchmark algorithms. Although the approach is motivated by cytometric data analysis, it is also applicable to modelling data from other fields.
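
The trimming idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's double trimming procedure, whose details are not given on this page. It is a generic trimmed-likelihood step under strong simplifying assumptions: one-dimensional data, a two-component normal mixture with fixed (not estimated) parameters in place of skew factor analyzers, and a hypothetical helper `trim_lowest_density` that flags the fraction of observations with the lowest mixture log-density as atypical. All names, data values, and parameters below are illustrative.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def mixture_logpdf(x, weights, means, sds):
    """Log-density of a finite normal mixture, computed with log-sum-exp."""
    terms = [
        math.log(w) + normal_logpdf(x, m, s)
        for w, m, s in zip(weights, means, sds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def trim_lowest_density(data, weights, means, sds, trim_fraction):
    """Split indices into (kept, trimmed) by mixture log-density.

    The floor(trim_fraction * n) observations with the lowest log-density
    are treated as atypical. Hypothetical helper, for illustration only.
    """
    n_trim = int(math.floor(trim_fraction * len(data)))
    scores = [mixture_logpdf(x, weights, means, sds) for x in data]
    order = sorted(range(len(data)), key=lambda i: scores[i])
    return sorted(order[n_trim:]), sorted(order[:n_trim])


# Toy data: two tight clusters near -2 and 2, plus two far-away points.
data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1, 9.5, -8.0]
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [0.3, 0.3]

kept, trimmed = trim_lowest_density(data, weights, means, sds, trim_fraction=0.2)
print(trimmed)  # [8, 9]
```

In a full fitting procedure, a step like this would typically alternate with re-estimating the mixture parameters from the retained observations only, which also reduces the amount of data processed in each iteration.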

Author information

Authors and Affiliations

School of Mathematics and Physics, University of Queensland, Brisbane, Australia
Sharon X. Lee

Corresponding author

Correspondence to Sharon X. Lee.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang

About this paper

Cite this paper

Lee, S.X. (2019). CytoFA: Automated Gating of Mass Cytometry Data via Robust Skew Factor Analyzers. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_40

DOI: https://doi.org/10.1007/978-3-030-16148-4_40
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)
