DOI: 10.1145/1141277.1141314

Protein classification using transductive learning on phylogenetic profiles

Published: 23 April 2006

Abstract

Phylogenetic profiles of proteins (strings of ones and zeros encoding, respectively, the presence and absence of a protein in a group of genomes) have recently been used to identify homologous proteins and/or proteins that are functionally linked, such as proteins participating in a metabolic pathway. We propose a novel learning method for protein classification based on phylogenetic profiles that takes into account both the phylogenetic tree structure and the likelihood of a protein's presence in each genome. The method extends the profiles with extra bits encoding the phylogenetic tree, whose interior nodes, representing hypothetical ancestral genomes, are scored to reflect their chances of developing divergence in the descendants. The scoring scheme also incorporates the likelihood of a protein's presence in each genome as weighting factors, which are initially collected from the training data and integrated into the kernel of a support vector machine (SVM). In a transductive learning scheme, when the SVM is used to classify test data, the weighting factors are updated iteratively using the predicted results. We tested our method on the proteome of Saccharomyces cerevisiae, using the MIPS classification as a benchmark. The results showed that classification accuracy was greatly increased.
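
The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the four-genome toy tree, the interior-node score (a weighted mean presence over descendant genomes), the Laplace-smoothed weight estimate, and the nearest-centroid classifier that stands in for the SVM so the example needs only the standard library. It shows only the overall shape of the idea: extend each profile with tree-node scores, compare proteins with a kernel on the extended profiles, and re-estimate the weights from predicted labels.

```python
# Toy sketch only: tree, scoring rule, and classifier are illustrative
# assumptions, not the scheme used in the paper.
GENOMES = ["g1", "g2", "g3", "g4"]
# Hypothetical rooted tree: interior node -> children (leaves are genomes).
TREE = {"root": ["A", "B"], "A": ["g1", "g2"], "B": ["g3", "g4"]}

def leaves_under(node):
    """Return the genomes (leaves) below a tree node."""
    if node not in TREE:
        return [node]
    return [leaf for child in TREE[node] for leaf in leaves_under(child)]

def estimate_weights(positive_profiles):
    """Laplace-smoothed presence frequency per genome among positive profiles."""
    n = len(positive_profiles)
    return {g: (sum(p[i] for p in positive_profiles) + 1) / (n + 2)
            for i, g in enumerate(GENOMES)}

def extend_profile(bits, weights):
    """Append one score per interior node: weighted mean presence of its leaves."""
    presence = dict(zip(GENOMES, bits))
    extended = [float(b) for b in bits]
    for node in TREE:  # dict order: root, A, B
        leaves = leaves_under(node)
        total = sum(weights[g] for g in leaves)
        extended.append(sum(weights[g] * presence[g] for g in leaves) / total)
    return extended

def kernel(x, y):
    """Linear kernel on extended profiles."""
    return sum(a * b for a, b in zip(x, y))

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def predict(train, labels, test, weights):
    """Nearest-centroid stand-in for the SVM, with distances computed via the kernel."""
    pos = centroid([extend_profile(p, weights) for p, y in zip(train, labels) if y == 1])
    neg = centroid([extend_profile(p, weights) for p, y in zip(train, labels) if y == -1])
    preds = []
    for t in test:
        x = extend_profile(t, weights)
        d_pos = kernel(x, x) - 2 * kernel(x, pos) + kernel(pos, pos)
        d_neg = kernel(x, x) - 2 * kernel(x, neg) + kernel(neg, neg)
        preds.append(1 if d_pos < d_neg else -1)
    return preds

def transductive_classify(train, labels, test, rounds=3):
    """Re-estimate weights from training positives plus predicted test positives."""
    positives = [p for p, y in zip(train, labels) if y == 1]
    weights = estimate_weights(positives)
    preds = predict(train, labels, test, weights)
    for _ in range(rounds):
        augmented = positives + [t for t, y in zip(test, preds) if y == 1]
        weights = estimate_weights(augmented)
        new_preds = predict(train, labels, test, weights)
        if new_preds == preds:  # predictions are stable, stop iterating
            break
        preds = new_preds
    return preds

# Made-up toy data: +1 = in the family, -1 = not in the family.
TRAIN = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
LABELS = [1, 1, -1, -1]
TEST = [[1, 1, 0, 1], [0, 0, 0, 1]]
print(transductive_classify(TRAIN, LABELS, TEST))
```

In a setup closer to the one the abstract describes, the nearest-centroid step would be replaced by an SVM trained with the same kernel, and the node scores by the paper's own scoring scheme.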

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN: 1595931082
DOI: 10.1145/1141277

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%