Abstract
Proteomic mass spectrometry is playing an increasing role in diagnostics and in studies of protein complexes and biological systems. High-throughput data processing is therefore becoming increasingly important, and with it the problems of imperfect data, noise, and the various errors introduced during experiments.
In this paper we focus on the peak alignment problem. As an alternative to heuristic-based approaches to aligning peaks from different mass spectra, we propose a mathematically sound, model-based method. In this framework, experimental errors are modeled as deviations from real values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and selecting the best-quality solutions. The method can be parameterized by assuming various constraints. We investigate different classes of models and select the most suitable one. We analyze the results in terms of statistically significant biomarkers that can be identified after alignment of the spectra.
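To make the modeling idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one-dimensional peak positions, fits finite Gaussian mixtures with a plain EM loop, treats "equal variance" and "free variance" as two example model classes, and ranks candidates by the Bayesian information criterion (lower is better in the sign convention used here). The function names (`fit_mixture`, `bic`, `select_model`), the quantile-based initialization, and the `peaks` values are hypothetical choices made for illustration only.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture(xs, k, equal_var=False, iters=500, min_var=1e-6):
    """EM for a k-component 1D Gaussian mixture (illustrative sketch).

    Returns (weights, means, variances, log_likelihood).
    """
    xs = sorted(xs)
    n = len(xs)
    # Assumed deterministic start: means at evenly spaced quantiles, equal
    # weights, and a common variance shrunk by k^2 so components start apart.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    var0 = max(sum((x - centre) ** 2 for x in xs) / n / (k * k), min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k

    def joint(x):
        # weight_j * density_j(x) for every component j
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            d = joint(x)
            total = max(sum(d), 1e-300)  # guard against underflow to zero
            resp.append([dj / total for dj in d])
        # M-step: re-estimate weights, means, and variances.
        sizes = [sum(r[j] for r in resp) for j in range(k)]
        weights = [s / n for s in sizes]
        for j in range(k):
            if sizes[j] >= 1e-12:  # an empty component keeps its old mean
                means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / sizes[j]
        scatter = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
                   for j in range(k)]
        if equal_var:
            # Constrained model class: one variance shared by all components.
            variances = [max(sum(scatter) / n, min_var)] * k
        else:
            # Unconstrained model class: one variance per component.
            variances = [max(scatter[j] / sizes[j], min_var)
                         if sizes[j] >= 1e-12 else variances[j]
                         for j in range(k)]

    loglik = sum(math.log(max(sum(joint(x)), 1e-300)) for x in xs)
    return weights, means, variances, loglik

def bic(loglik, k, n, equal_var):
    """BIC as -2 log L + p log n, so lower values indicate a better model."""
    # Free parameters: k means, k - 1 weights, and 1 or k variances.
    n_params = k + (k - 1) + (1 if equal_var else k)
    return -2.0 * loglik + n_params * math.log(n)

def select_model(xs, ks=(1, 2, 3, 4)):
    """Return (bic, k, equal_var) for the candidate with the lowest BIC."""
    candidates = []
    for k in ks:
        for equal_var in (True, False):
            loglik = fit_mixture(xs, k, equal_var)[3]
            candidates.append((bic(loglik, k, len(xs), equal_var), k, equal_var))
    return min(candidates)

# Hypothetical peak positions observed across several spectra: three
# underlying peaks, each measured with a small deviation.
peaks = [499.98, 500.00, 500.02, 500.01, 499.99, 500.00,
         500.48, 500.50, 500.52, 500.51, 500.49, 500.50,
         501.18, 501.20, 501.22, 501.21, 501.19, 501.20]

weights, means, variances, loglik = fit_mixture(peaks, k=3)
best_bic, best_k, best_equal_var = select_model(peaks)
```

In this sketch each fitted component plays the role of one aligned peak: its mean is the consensus position and its variance describes the measurement deviation. Comparing `equal_var=True` against `equal_var=False` is one simple way to illustrate how constraints define different model classes; the constraints actually studied in the paper may differ.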
The research described in this paper was partially supported by the Polish Ministry of Education and Science grants KBN-8 T11F 021 28 and PBZ-KBN-088/P04/2003.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Łuksza, M., Kluge, B., Ostrowski, J., Karczmarski, J., Gambin, A. (2006). Efficient Model-Based Clustering for LC-MS Data. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_4
DOI: https://doi.org/10.1007/11851561_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)