Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful technique for determining protein structures and has been supported by computers for decades. One important step in this process is the identification of resonances in the data. However, due to noise, overlapping effects and artifacts occurring during the measurements, many algorithms fail to identify resonances correctly. In this paper, we present a novel interpretation of the data as a sample drawn from a mixture of bivariate Gaussian distributions. The identification of resonances can therefore be reduced to a Gaussian mixture decomposition problem, which is solved with the Expectation-Maximization algorithm. A Java program that uses an implementation of this algorithm is described and tested on experimental data. Our results indicate that this approach offers valuable information, such as an objective measure of the likelihood of the identified resonances.
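To make the idea of a Gaussian mixture decomposition concrete, the following is a minimal sketch in Python, not the Java program described in the paper. It assumes the data are already available as a list of two-dimensional points, that the number of components and rough starting centers are known, and that a fixed number of iterations is sufficient. The names `bivariate_pdf`, `em_fit` and `sample_two_peaks`, the unit starting covariance and the regularization constant `reg` are all illustrative assumptions.

```python
import math
import random


def bivariate_pdf(x, y, mean, cov):
    """Density of a bivariate Gaussian; cov is given as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form via the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def em_fit(points, init_means, iterations=50, reg=1e-6):
    """Fit a mixture of bivariate Gaussians with a fixed number of EM steps."""
    k, n = len(init_means), len(points)
    weights = [1.0 / k] * k
    means = [tuple(m) for m in init_means]
    covs = [(1.0, 0.0, 1.0)] * k  # assumed start: unit covariance
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x, y in points:
            dens = [w * bivariate_pdf(x, y, m, c)
                    for w, m, c in zip(weights, means, covs)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and covariance per component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2
                      for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - my) ** 2
                      for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            weights[j] = nj / n
            means[j] = (mx, my)
            covs[j] = (sxx, sxy, syy)
    # resp holds the posteriors from the last E-step.
    return weights, means, covs, resp


def sample_two_peaks(seed=0, n=200):
    """Synthetic stand-in for measured data: two well-separated peaks."""
    rng = random.Random(seed)
    first = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(n)]
    second = [(rng.gauss(5.0, 0.8), rng.gauss(5.0, 0.8)) for _ in range(n)]
    return first + second


points = sample_two_peaks()
weights, means, covs, resp = em_fit(points, [(1.0, 1.0), (4.0, 4.0)])
for w, m in zip(weights, means):
    print(f"weight={w:.2f} center=({m[0]:.2f}, {m[1]:.2f})")
```

Each row of `resp` gives the posterior probability that a point belongs to each component, which is one kind of quantitative information a mixture fit provides. Choosing the number of components and handling real spectra, where intensities are recorded on a grid, are outside the scope of this sketch.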
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krone, M., Klawonn, F., Lührs, T., Ritter, C. (2011). Identification of Nuclear Magnetic Resonance Signals via Gaussian Mixture Decomposition. In: Gama, J., Bradley, E., Hollmén, J. (eds) Advances in Intelligent Data Analysis X. IDA 2011. Lecture Notes in Computer Science, vol 7014. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24800-9_23
DOI: https://doi.org/10.1007/978-3-642-24800-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24799-6
Online ISBN: 978-3-642-24800-9
eBook Packages: Computer Science, Computer Science (R0)