Efficient Model-Based Clustering for LC-MS Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4175)

Abstract

Proteomic mass spectrometry plays an increasingly important role in diagnostics and in studies of protein complexes and biological systems. High-throughput data processing is therefore becoming more significant, and with it the problems of imperfect data, noise, and the various errors introduced during experiments.

In this paper we focus on the peak alignment problem. As an alternative to heuristic-based approaches to aligning peaks from different mass spectra, we propose a mathematically sound, model-based method. In this framework, experimental errors are modeled as deviations from true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and for selecting the best-quality solutions. The method can be parameterized by imposing various constraints. We investigate different classes of models and select the most suitable one. We analyze the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned.
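As a rough illustration of the idea, and not the authors' implementation, the sketch below treats peak positions pooled from several spectra as draws from a one-dimensional finite Gaussian mixture, fits two constrained model classes (one variance shared by all components, or one variance per component) with EM, and picks the class and number of components with the lowest BIC. The abstract names neither EM nor BIC, so both are assumptions here, as are the simulated `peaks`, the `VAR_FLOOR` guard, the quantile-based initialisation, and every function name.

```python
import math
from statistics import NormalDist

VAR_FLOOR = 1e-4  # assumed lower bound on a variance; guards against collapsed components

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_loglik(xs, weights, means, variances):
    total = 0.0
    for x in xs:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))
    return total

def fit_gmm_1d(xs, k, shared_var, iters=100):
    """EM for a 1-D Gaussian mixture; returns (log-likelihood, component means)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: component means at evenly spaced sample quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    variances = [max(sum((x - grand) ** 2 for x in xs) / (n * k), VAR_FLOOR)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each peak.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and (constrained) variances.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) for j in range(k)]
        if shared_var:
            variances = [max(sum(ss) / n, VAR_FLOOR)] * k
        else:
            variances = [max(ss[j] / nk[j], VAR_FLOOR) for j in range(k)]
    return mixture_loglik(xs, weights, means, variances), means

def bic(loglik, k, shared_var, n):
    # Free parameters: k-1 weights, k means, and 1 or k variances.
    n_params = (k - 1) + k + (1 if shared_var else k)
    return -2.0 * loglik + n_params * math.log(n)  # lower is better

# Simulated data (hypothetical): three true m/z values, each observed 40 times
# with Gaussian-shaped measurement error of standard deviation 0.05.
true_masses = [500.0, 500.8, 502.0]
offsets = [0.05 * NormalDist().inv_cdf((i + 0.5) / 40) for i in range(40)]
peaks = [c + d for c in true_masses for d in offsets]

candidates = []
for shared_var in (True, False):
    for k in range(1, 6):
        loglik, means = fit_gmm_1d(peaks, k, shared_var)
        candidates.append((bic(loglik, k, shared_var, len(peaks)), k, shared_var, sorted(means)))

best_bic, best_k, best_shared, best_means = min(candidates)
print("components:", best_k, "shared variance:", best_shared)
print("aligned peak positions:", [round(m, 3) for m in best_means])
```

Each fitted component mean plays the role of an aligned peak position, and the constraint on the variances is one example of what a "class of models" could mean in this setting. Real LC-MS data would need at least a second dimension (retention time) and a more careful initialisation.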

The research described in this paper was partially supported by the Polish Ministry of Education and Science grants KBN-8 T11F 021 28 and PBZ-KBN-088/P04/2003.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Łuksza, M., Kluge, B., Ostrowski, J., Karczmarski, J., Gambin, A. (2006). Efficient Model-Based Clustering for LC-MS Data. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_4

  • DOI: https://doi.org/10.1007/11851561_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39583-6

  • Online ISBN: 978-3-540-39584-3

  • eBook Packages: Computer Science (R0)
