Abstract
We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the lengths of all non-overlapping MUMs are used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. As in recent work, support vector machines are employed for the binary classification. The new method is shown to be more accurate than recent methods, including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, it performs much better than SVM-Pairwise.
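The following minimal Python sketch illustrates the idea described above; it is not the implementation used in the paper. It assumes a brute-force search for MUMs (a practical implementation would more likely use a suffix tree), a hypothetical minimum match length `min_len`, and a greedy longest-first rule for discarding overlapping MUMs, since the abstract does not say how overlaps are resolved. The function names are illustrative only.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=2):
    """Brute-force search for maximal unique matches of length >= min_len.

    Returns (start_in_a, start_in_b, length) tuples. min_len is an
    assumed parameter, not a value taken from the paper.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that can be extended to the left (not left-maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the residues agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Keep the match only if it occurs exactly once in each sequence.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


def mum_similarity(a, b, min_len=2):
    """Sum the lengths of non-overlapping MUMs (greedy, longest first).

    The greedy overlap rule is an assumption made for this sketch.
    """
    chosen = []
    for i, j, length in sorted(find_mums(a, b, min_len), key=lambda m: -m[2]):
        overlaps = any(
            (i < ci + cl and ci < i + length) or (j < cj + cl and cj < j + length)
            for ci, cj, cl in chosen
        )
        if not overlaps:
            chosen.append((i, j, length))
    return sum(length for _, _, length in chosen)


def feature_vector(query, training_sequences, min_len=2):
    """Pairwise similarity of the query against every training sequence."""
    return [mum_similarity(query, t, min_len) for t in training_sequences]
```

For example, `mum_similarity("ACDEFGHIK", "WCDEFYHIKW")` finds the MUMs `CDEF` and `HIK` and returns 7. Vectors produced by `feature_vector` would then be passed to a support vector machine for the binary family-membership decision.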
References
Altschul, S., Gish, W., Miller, W., Myers, E.W., Lipman, D.: A basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990)
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997)
Delcher, A., Kasif, S., Fleischmann, R., Peterson, J., White, O., Salzberg, S.: Alignment of whole genomes. Nucleic Acids Research 27, 2369–2376 (1999)
Gribskov, M., Robinson, N.L.: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Computers and Chemistry 20(1), 25–33 (1996)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7, 95–114 (2000)
Karplus, K., Barrett, C., Hughey, R.: Hidden Markov Models for detecting remote protein homologies. Bioinformatics 14, 846–856 (1998)
Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. Journal of Computational Biology 10, 857–868 (2003)
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247, 536–540 (1995)
Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., Chothia, C.: Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology 284, 1201–1210 (1998)
Rost, B.: Twilight zone of protein sequence alignments. Protein Engineering 12, 85–94 (1999)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)
Ukkonen, E.: On-line construction of suffix-trees. Algorithmica 14, 249–260 (1995)
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Oğul, H., Mumcuoğlu, Ü.E. (2005). Discriminative Remote Homology Detection Using Maximal Unique Sequence Matches. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_26
DOI: https://doi.org/10.1007/11552253_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)