Efficient Model-Based Clustering for LC-MS Data

Łuksza, Marta; Kluge, Bogusław; Ostrowski, Jerzy; Karczmarski, Jakub; Gambin, Anna

doi:10.1007/11851561_4

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4175)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Proteomic mass spectrometry plays an increasing role in diagnostics and in studies of protein complexes and biological systems. High-throughput data processing is therefore becoming more and more important, and with it the problems of imperfect data, noise, and the various errors introduced during experiments.

In this paper we focus on the peak alignment problem. As an alternative to heuristic approaches to aligning peaks from different mass spectra, we propose a mathematically sound, model-based method. In this framework, experimental errors are modeled as deviations from the true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and for selecting the best-quality solutions. The method can be parameterized by imposing various constraints. We investigate different classes of models and select the most suitable one. We analyze the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned.
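The idea of treating observed peaks as draws from a finite Gaussian mixture, and of comparing constrained model classes, can be illustrated with a minimal sketch. This is not the authors' implementation: it works in one dimension (m/z only), fits the mixture with a plain EM loop, takes "constraints" to mean shared versus per-component variances, and scores each candidate with the Bayesian information criterion (lower is better). All of these choices are assumptions made for illustration, as are the names `fit_mixture`, `bic`, `TRUE_MZ`, `ERROR_SD`, `VAR_FLOOR` and the synthetic data.

```python
import math
import random

TRUE_MZ = [500.0, 500.9, 502.3]   # hypothetical true peak positions (m/z)
ERROR_SD = 0.05                   # hypothetical measurement error (std. dev.)
VAR_FLOOR = 1e-4                  # assumed floor; stops a component collapsing onto one point


def fit_mixture(xs, k, shared_var, iters=100):
    """EM for a 1-D Gaussian mixture.

    Returns (means, variances, weights, loglik); loglik comes from the last E-step.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = sum((x - centre) ** 2 for x in xs) / n
    variances = [max(spread / k ** 2, VAR_FLOOR)] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each observed peak.
        resp, loglik = [], 0.0
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300   # guard against complete underflow
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) for j in range(k)]
        if shared_var:
            # Constrained class: one variance pooled over all components.
            variances = [max(sum(ss) / n, VAR_FLOOR)] * k
        else:
            # Unconstrained class: one variance per component.
            variances = [max(s / c, VAR_FLOOR) for s, c in zip(ss, nk)]
    return means, variances, weights, loglik


def bic(loglik, k, n, shared_var):
    """Schwarz criterion in the 'lower is better' convention."""
    n_params = k + (k - 1) + (1 if shared_var else k)   # means + weights + variances
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic data: each true peak observed 40 times with Gaussian error.
rng = random.Random(0)
xs = [rng.gauss(mz, ERROR_SD) for mz in TRUE_MZ for _ in range(40)]

fits = {(k, shared): fit_mixture(xs, k, shared)
        for k in range(1, 5) for shared in (True, False)}
scores = {key: bic(fit[3], key[0], len(xs), key[1]) for key, fit in fits.items()}
best = min(scores, key=scores.get)

print("selected (components, shared variance):", best)
print("aligned peak positions:", [round(m, 3) for m in sorted(fits[best][0])])
```

In this sketch the fitted component means play the role of aligned peak positions, and the criterion decides both how many components to keep and which variance constraint to prefer. Real LC-MS peaks also carry a retention-time coordinate, so a practical version would need a multivariate mixture.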

The research described in this paper was partially supported by the Polish Ministry of Education and Science grants KBN-8 T11F 021 28 and PBZ-KBN-088/P04/2003.

Author information

Authors and Affiliations

Institute of Informatics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Marta Łuksza, Bogusław Kluge & Anna Gambin
Department of Gastroenterology, Medical Center for Postgraduate Education and Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Roentgena 5, 02-781, Warsaw, Poland
Jerzy Ostrowski & Jakub Karczmarski

Editor information

Editors and Affiliations

Ecole Polytechnique Fédérale de Lausanne, Switzerland
Philipp Bücher
Laboratory for Computational Biology and Bioinformatics, EPFL (Ecole Polytechnique Fédérale de Lausanne), Swiss Institute of Bioinformatics, Lausanne, Switzerland
Bernard M. E. Moret

About this paper

Cite this paper

Łuksza, M., Kluge, B., Ostrowski, J., Karczmarski, J., Gambin, A. (2006). Efficient Model-Based Clustering for LC-MS Data. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_4

DOI: https://doi.org/10.1007/11851561_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)
