Discriminative Remote Homology Detection Using Maximal Unique Sequence Matches

  • Conference paper
Advances in Intelligent Data Analysis VI (IDA 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the lengths of all non-overlapping MUMs are used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines are employed for binary classification, as in recent work. The new method is shown to be more accurate than recent methods including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method performs much better than SVM-Pairwise.
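
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_len` threshold, the brute-force quadratic search (rather than a suffix tree), the greedy longest-first choice of non-overlapping MUMs, and the use of a plain sum of lengths as the score are all assumptions made here for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=2):
    """Return (start_a, start_b, length) for every common substring that
    occurs exactly once in each sequence and cannot be extended."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could still be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:  # hypothetical threshold, not from the paper
                continue
            word = a[i:i + k]
            if count_occurrences(a, word) == 1 and count_occurrences(b, word) == 1:
                mums.append((i, j, k))
    return mums


def mum_similarity(a, b, min_len=2):
    """Assumed score: total length of greedily chosen non-overlapping MUMs."""
    used_a, used_b, total = set(), set(), 0
    ranked = sorted(maximal_unique_matches(a, b, min_len), key=lambda m: -m[2])
    for i, j, k in ranked:
        span_a, span_b = set(range(i, i + k)), set(range(j, j + k))
        if used_a & span_a or used_b & span_b:
            continue  # overlaps a MUM that was already selected
        used_a |= span_a
        used_b |= span_b
        total += k
    return total


def feature_vector(query, training_set, min_len=2):
    """One similarity score per training protein, in training-set order."""
    return [mum_similarity(query, t, min_len) for t in training_set]
```

Under these assumptions, `feature_vector("MKVLAAGIV", ["QKVLTTGIV", "MKVLAAGIV", "WWWW"])` yields `[6, 9, 0]`: the first pair shares the MUMs `KVL` and `GIV`, the second is identical to the query, and the third shares nothing. Vectors of this kind, one per protein, would then be passed to an SVM trained to separate family members from non-members.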

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oğul, H., Mumcuoğlu, Ü.E. (2005). Discriminative Remote Homology Detection Using Maximal Unique Sequence Matches. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_26

  • DOI: https://doi.org/10.1007/11552253_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28795-7

  • Online ISBN: 978-3-540-31926-9

  • eBook Packages: Computer Science, Computer Science (R0)
