Abstract
A fundamental problem in molecular biology is the comparison of three-dimensional protein folds in order to develop similarity measures and exploit them for protein clustering, database searches, and drug design. Contact map overlap (CMO) is one of the most reliable and robust measures of protein structure similarity. Fold comparison can be done by aligning the amino acid residues of two proteins in a way that maximizes the number of common residue contacts. CMO maximization is gaining increasing attention because it yields protein clusterings in good agreement with classification by experts. However, CMO maximization is an NP-hard problem, and few exact algorithms exist for solving it to global optimality.
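To make the CMO objective concrete, the following Python sketch scores a given alignment and finds the maximum overlap by exhaustive enumeration. It assumes a representation that the abstract does not specify. Residues are numbered from 0. Each contact map is a list of residue index pairs. An alignment is a list of (i, k) pairs that must be one-to-one and order-preserving (non-crossing). The function names are hypothetical. Because the enumeration is exponential, it is usable only on toy instances, and it is not the algorithm proposed in this paper.

```python
from itertools import combinations

def is_valid_alignment(alignment):
    """True if the (i, k) pairs are one-to-one and order-preserving."""
    pairs = sorted(alignment)
    # After sorting by i, both coordinates must increase strictly.
    return all(i < j and k < l for (i, k), (j, l) in zip(pairs, pairs[1:]))

def overlap(contacts1, contacts2, alignment):
    """Count contacts of protein 1 whose aligned images are contacts of protein 2."""
    if not is_valid_alignment(alignment):
        raise ValueError("alignment must be one-to-one and order-preserving")
    mapping = dict(alignment)
    e2 = {tuple(sorted(c)) for c in contacts2}
    return sum(
        1
        for i, j in contacts1
        if i in mapping and j in mapping
        and tuple(sorted((mapping[i], mapping[j]))) in e2
    )

def brute_force_cmo(n1, n2, contacts1, contacts2):
    """Try every order-preserving alignment (exponential: toy sizes only)."""
    best_score, best_alignment = 0, []
    for m in range(1, min(n1, n2) + 1):
        # Choose m residues from each protein and pair them in sequence order.
        for rows in combinations(range(n1), m):
            for cols in combinations(range(n2), m):
                alignment = list(zip(rows, cols))
                score = overlap(contacts1, contacts2, alignment)
                if score > best_score:
                    best_score, best_alignment = score, alignment
    return best_score, best_alignment
```

For example, two four-residue chains with crossing contacts [(0, 2), (1, 3)] and nested contacts [(0, 3), (1, 2)] can share at most one contact. Realizing both contacts of the first chain would require aligning all four residues, and an order-preserving alignment of all four forces the identity mapping, under which neither contact matches.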
In this paper, we propose a branch-and-reduce exact algorithm for the CMO problem. Unlike previous approaches, we do not transform CMO into another combinatorial optimization problem before solving it. Instead, we address the problem directly in its natural form. By exploiting the problem's mathematical structure, we develop bounding and reduction procedures that lead to a very efficient algorithm. We present extensive computational results for over 36,000 test problems from the literature. These results demonstrate that our algorithm is significantly faster than the best previous algorithms for CMO and solves many more of the challenging test sets. Furthermore, the algorithm yields protein clusters that are in excellent agreement with the SCOP database.
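Branch-and-reduce methods rely on discarding parts of the search space that provably cannot beat the best solution found so far. As a rough illustration only, the sketch below is a plain branch-and-bound over order-preserving alignments. It assumes residues are numbered from 0 and that each contact map is a list of residue index pairs. The function name is hypothetical, and its two counting bounds are simple assumptions of this sketch. They are not the bounding or reduction procedures developed in the paper, which the abstract does not describe.

```python
def branch_and_bound_cmo(n1, n2, contacts1, contacts2):
    """Return the maximum contact map overlap of two small proteins.

    Residues of protein 1 are visited in order; each is either aligned to a
    residue of protein 2 beyond the last one used, or left unaligned.
    """
    # Store every contact as (smaller index, larger index).
    e1 = {tuple(sorted(c)) for c in contacts1}
    e2 = {tuple(sorted(c)) for c in contacts2}
    # back[j] lists the residues h < j that touch residue j in protein 1.
    back = [[h for (h, j) in e1 if j == jj] for jj in range(n1)]
    mapping = {}  # residue of protein 1 -> aligned residue of protein 2
    best = 0      # best overlap found so far

    def upper_bound(i, last, score):
        # Bound 1: protein-1 contacts not yet scored whose smaller endpoint
        # can still be aligned (already aligned, or not yet visited).
        b1 = sum(1 for (h, j) in e1 if j >= i and (h >= i or h in mapping))
        # Bound 2: protein-2 contacts whose larger endpoint lies beyond the
        # last residue used; only those can still be matched.
        b2 = sum(1 for (k, l) in e2 if l > last)
        return score + min(b1, b2)

    def search(i, last, score):
        nonlocal best
        best = max(best, score)
        # Stop at the end of protein 1 or when this branch cannot improve.
        if i == n1 or upper_bound(i, last, score) <= best:
            return
        # Branch 1: align residue i to each admissible residue k of protein 2.
        for k in range(last + 1, n2):
            # New common contacts (h, i) -> (mapping[h], k), with mapping[h] < k.
            gain = sum(1 for h in back[i] if h in mapping and (mapping[h], k) in e2)
            mapping[i] = k
            search(i + 1, k, score + gain)
            del mapping[i]
        # Branch 2: leave residue i unaligned.
        search(i + 1, last, score)

    search(0, -1, 0)
    return best
```

Calling `branch_and_bound_cmo(4, 4, [(0, 2), (1, 3)], [(0, 3), (1, 2)])` returns 1. Because the recursion depth grows with the length of the first protein, this sketch is limited to small chains; a practical solver would need an explicit stack and much stronger bounds.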
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, W., Sahinidis, N.V. (2006). A Branch-and-Reduce Algorithm for the Contact Map Overlap Problem. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_43
DOI: https://doi.org/10.1007/11732990_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)