An Optimal Mesh Algorithm for Remote Protein Homology Detection

Abdullah, Firdaus M.; Othman, Razib M.; Kasim, Shahreen; Hashim, Rathiah

doi:10.1007/978-3-642-20998-7_57

Part of the book series: Communications in Computer and Information Science (CCIS, volume 151)

Included in the following conference series:

International Conference on Ubiquitous Computing and Multimedia Applications

Abstract

Remote protein homology detection (RPHD) is the problem of detecting evolutionary relationships between proteins at low levels of sequence similarity. Two open problems in RPHD are determining which combination of multiple alignment and classification techniques performs best, and the misalignment of protein sequences during the alignment process. This paper therefore addresses RPHD by assessing the impact of using structural information, rather than sequence information, in protein multiple alignments. It further identifies the best combinations of multiple alignment and classification programs, and it improves the quality of the multiple alignments by integrating a refinement algorithm. The framework began with dataset preparation using SCOP version 1.73, followed by multiple alignment of the protein sequences using CLUSTALW, MAFFT, ProbCons and T-Coffee for sequence-based multiple alignments, and 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for structure-based multiple alignments. Next, a refinement algorithm was applied to the protein sequences to reduce misalignments. Lastly, the aligned protein sequences were classified using generative profile hidden Markov model (pHMM) classifiers such as HMMER and SAM, and discriminative support vector machine (SVM) classifiers such as SVM-Fold and SVM-Struct. The performance of the assessed programs was evaluated using ROC, precision and recall tests. The results show that the combination of refined SVM-Struct and PROMALS3D outperforms the other combinations, which suggests that it is the best of those assessed for RPHD. The results also show that the refinement algorithm increases the performance of the multiple alignment programs by at least 4%.
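The abstract names ROC, precision and recall as the evaluation tests but does not say how they were computed. The following Python sketch shows one common way to obtain these measures from classifier scores. The function names, the toy scores and the 0.5 threshold are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked in the correct order.
    return wins / (len(pos) * len(neg))


def precision_recall(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are called homologs."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical classifier scores; label 1 = true remote homolog, 0 = non-homolog.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 0]
    print("ROC AUC:", roc_auc(toy_scores, toy_labels))
    print("precision, recall:", precision_recall(toy_scores, toy_labels, 0.5))
```

The pairwise formulation is quadratic in the number of examples, so it is only suitable for small test sets. A sort-based computation would be preferable at the scale of a full SCOP benchmark.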

Author information

Authors and Affiliations

Laboratory of Computational Intelligence and Biotechnology, Universiti Teknologi Malaysia, 81310, UTM Skudai, Malaysia
Firdaus M. Abdullah & Razib M. Othman
Department of Web Technology, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400, Parit Raja, Batu Pahat, Malaysia
Shahreen Kasim & Rathiah Hashim

Editor information

Editors and Affiliations

Multimedia Engineering Department, Hannam University, 133 Ojeong-dong, Daeduk-gu, Daejeon, Korea
Tai-hoon Kim, Rosslin John Robles & Maricel Balitanas
The Ohio State University, 470 Hitchcock Hall, 2070 Neil Avenue, 43210-1275, Columbus, OH, USA
Hojjat Adeli

About this paper

Cite this paper

Abdullah, F.M., Othman, R.M., Kasim, S., Hashim, R. (2011). An Optimal Mesh Algorithm for Remote Protein Homology Detection. In: Kim, Th., Adeli, H., Robles, R.J., Balitanas, M. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2011. Communications in Computer and Information Science, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20998-7_57

DOI: https://doi.org/10.1007/978-3-642-20998-7_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20997-0
Online ISBN: 978-3-642-20998-7
eBook Packages: Computer Science (R0)