An Optimal Mesh Algorithm for Remote Protein Homology Detection

  • Conference paper
Ubiquitous Computing and Multimedia Applications (UCMA 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 151)

Abstract

Remote protein homology detection (RPHD) is the problem of detecting evolutionary relationships between proteins at low levels of sequence similarity. Open problems in this area include determining which combination of multiple alignment and classification techniques performs best, and handling the misalignment of protein sequences during the alignment process. This paper therefore addresses remote protein homology detection by assessing the impact of using structural information, rather than sequence information, in protein multiple alignments. It further identifies the best combinations of multiple alignment and classification programs, and improves the quality of the multiple alignments by integrating a refinement algorithm. The framework began with dataset preparation from SCOP version 1.73, followed by multiple alignment of the protein sequences using CLUSTALW, MAFFT, ProbCons and T-Coffee for sequence-based multiple alignments and 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for structure-based multiple alignments. Next, a refinement algorithm was applied to the protein sequences to reduce misalignments. Lastly, the aligned protein sequences were classified using generative profile hidden Markov model (pHMM) classifiers such as HMMER and SAM, and discriminative support vector machine (SVM) classifiers such as SVM-Fold and SVM-Struct. The performance of the assessed programs was evaluated using ROC, precision and recall tests. The results show that the combination of SVM-Struct and PROMALS3D with refinement performs best among the assessed programs, which suggests that this combination is the best for RPHD. The results also show that the refinement algorithm increases the performance of the multiple alignment programs by at least 4%.
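The abstract names ROC, precision and recall as the evaluation tests but gives no detail on how they were computed. As a minimal sketch only, assuming each classifier emits one real-valued score per test sequence and each sequence carries a binary homolog/non-homolog label, the three measures could be computed as follows. The function names, the threshold and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def precision_recall(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are called homologs."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the
    positive is scored higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = true homolog, 0 = non-homolog.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(precision_recall(toy_labels, toy_scores, threshold=0.5))
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise form of the AUC is quadratic in the number of sequences, which is acceptable for a sketch. A sort-based version would be preferable for large test sets.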

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abdullah, F.M., Othman, R.M., Kasim, S., Hashim, R. (2011). An Optimal Mesh Algorithm for Remote Protein Homology Detection. In: Kim, Th., Adeli, H., Robles, R.J., Balitanas, M. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2011. Communications in Computer and Information Science, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20998-7_57

  • DOI: https://doi.org/10.1007/978-3-642-20998-7_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20997-0

  • Online ISBN: 978-3-642-20998-7

  • eBook Packages: Computer Science, Computer Science (R0)
