Abstract
Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure structural similarity, and methods for optimizing RMSD are well established. However, multiple structure alignment with gaps cannot use these methods directly. We extend RMSD to weighted RMSD for multiple structures, which includes gapped alignment as a special case. Using multiplicative weights, we show that weighted RMSD over all pairs is the same as weighted RMSD to an average of the structures. We also show that, for multiple structures, the two tasks of finding the optimal translations and the optimal rotations for minimizing weighted RMSD cannot be separated as they can for pairs; this is an inherent difficulty that previous work has ignored. Nevertheless, we develop an iterative algorithm that converges weighted RMSD to a local minimum, in which each iteration takes linear time and the number of iterations is small. In 10,000 experiments on each of 23 protein families from HOMSTRAD, with each structure starting from a random translation and rotation, the algorithm converges rapidly to the same minimum. Finally, we propose a heuristic method that iteratively removes the effect of outliers and finds the well-aligned positions that determine the structurally conserved region: it models B-factors and deviations from the average positions as weights and iteratively assigns higher weights to better-aligned atoms.
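The claim that weighted RMSD over all pairs equals weighted RMSD to an average can be checked numerically for a single aligned position. The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: each structure i has a weight w_i at the position, the pair (i, j) gets the multiplicative weight w_i * w_j, a gap is represented by weight 0, and the comparison uses sums of weighted squared deviations rather than the paper's normalization. The function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    """Weighted average position; a weight of 0 (assumed gap) has no effect."""
    total = sum(weights)
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(3)
    )


def pairwise_sum(points, weights):
    """Sum over pairs i < j of w_i * w_j * |p_i - p_j|^2."""
    n = len(points)
    return sum(
        weights[i] * weights[j] * sq_dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def to_average_sum(points, weights):
    """(sum of weights) * sum over i of w_i * |p_i - mean|^2."""
    mean = weighted_mean(points, weights)
    return sum(weights) * sum(
        w * sq_dist(p, mean) for p, w in zip(points, weights)
    )


# One aligned position across four structures; the third is an assumed gap.
random.seed(0)
column = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(4)]
weights = [1.0, 0.5, 0.0, 2.0]

lhs = pairwise_sum(column, weights)
rhs = to_average_sum(column, weights)
print(abs(lhs - rhs) < 1e-9)
```

Under these assumptions the two quantities agree up to floating-point error, and a structure with weight 0 contributes to neither side, which is one way a gapped position can be treated as a special case of the weighted measure.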
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Snoeyink, J. (2007). Defining and Computing Optimum RMSD for Gapped Multiple Structure Alignment. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_19
DOI: https://doi.org/10.1007/978-3-540-74126-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)