Abstract
We present a method for simultaneous dimension reduction and metastability analysis of high-dimensional time series. The approach combines hidden Markov models (HMMs) with principal component analysis (PCA). We derive optimal estimators for the log-likelihood functional and employ the expectation-maximization (EM) algorithm for its numerical optimization. We demonstrate the performance of the method on a generic 102-dimensional example, apply the new HMM-PCA algorithm to a molecular dynamics simulation of 12-alanine in water, and interpret the results.
Supported in part by the DFG Research Center MATHEON, Berlin, and Microsoft Research Ltd., Cambridge, UK (Contract No. 2005-042).
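The abstract does not spell out the estimators, so the following is only a rough sketch of what "localizing" PCA by hidden state can look like, not the paper's algorithm. It assumes that per-time-step state responsibilities are already available (for instance from the E-step of an EM iteration) and computes a weighted mean, weighted covariance, and leading eigenvectors for each state. The names `local_pca`, `gamma`, and `m`, as well as the toy data, are hypothetical and chosen for illustration.

```python
import numpy as np

def local_pca(X, gamma, m):
    """Weighted PCA per hidden state (illustrative sketch, not the paper's estimator).

    X     : (T, d) array of observations
    gamma : (T, K) array of state responsibilities, assumed given
    m     : number of principal directions kept per state
    Returns a list of (mean, directions, variances) tuples, one per state.
    """
    results = []
    for k in range(gamma.shape[1]):
        w = gamma[:, k]
        total = w.sum()
        # Responsibility-weighted mean of the observations in state k.
        mu = (w[:, None] * X).sum(axis=0) / total
        Xc = X - mu
        # Responsibility-weighted covariance matrix of state k.
        cov = (w[:, None] * Xc).T @ Xc / total
        # eigh returns eigenvalues in ascending order; keep the m largest.
        evals, evecs = np.linalg.eigh(cov)
        order = np.argsort(evals)[::-1][:m]
        results.append((mu, evecs[:, order], evals[order]))
    return results

# Toy data: two regimes in 3-D whose dominant directions differ.
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 3)) * np.array([3.0, 0.3, 0.3])
B = rng.normal(size=(500, 3)) * np.array([0.3, 3.0, 0.3]) + 10.0
X = np.vstack([A, B])

# Hard assignments stand in for responsibilities an HMM would supply.
gamma = np.zeros((1000, 2))
gamma[:500, 0] = 1.0
gamma[500:, 1] = 1.0

models = local_pca(X, gamma, m=1)
```

In this toy setting a single global PCA would be dominated by the offset between the two regimes, whereas the per-state decomposition recovers a different leading direction for each regime.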
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Horenko, I., Schmidt-Ehrenberg, J., Schütte, C. (2006). Set-Oriented Dimension Reduction: Localizing Principal Component Analysis Via Hidden Markov Models. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_8
DOI: https://doi.org/10.1007/11875741_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)