Classification Systems for Bacterial Protein-Protein Interaction Document Retrieval

Hongfang Liu, Manabu Torii, Guixian Xu, Johannes Goll
Copyright: © 2010 |Volume: 1 |Issue: 1 |Pages: 11
ISSN: 1947-3133 |EISSN: 1947-3141 |EISBN13: 9781616929725 |DOI: 10.4018/jcmam.2010072003
Cite Article

MLA

Liu, Hongfang, et al. "Classification Systems for Bacterial Protein-Protein Interaction Document Retrieval." IJCMAM vol.1, no.1 2010: pp.34-44. http://doi.org/10.4018/jcmam.2010072003

APA

Liu, H., Torii, M., Xu, G., & Goll, J. (2010). Classification Systems for Bacterial Protein-Protein Interaction Document Retrieval. International Journal of Computational Models and Algorithms in Medicine (IJCMAM), 1(1), 34-44. http://doi.org/10.4018/jcmam.2010072003

Chicago

Liu, Hongfang, et al. "Classification Systems for Bacterial Protein-Protein Interaction Document Retrieval," International Journal of Computational Models and Algorithms in Medicine (IJCMAM) 1, no.1: 34-44. http://doi.org/10.4018/jcmam.2010072003

Abstract

Protein-protein interaction (PPI) networks are essential to understanding the fundamental processes governing cell biology. Studying PPI networks has recently become possible thanks to advances in high-throughput experimental genomics and proteomics technologies. Many interactions from such high-throughput studies, and most interactions from small-scale studies, are reported only in the scientific literature and thus are not accessible in a readily analyzable format. This has led to manual curation initiatives such as the International Molecular Exchange Consortium (IMEx). Manual curation of PPI knowledge can be accelerated by text mining systems that retrieve PPI-relevant articles (article retrieval) and extract PPI-relevant knowledge (information extraction). In this article, the authors focus on article retrieval and define the task as binary classification, where PPI-relevant articles are positives and all others are negatives. Building such a classifier requires an annotated corpus. Obtaining one manually is very expensive, but a noisy and imbalanced annotated corpus can be obtained automatically: a collection of positive documents can be retrieved from existing PPI knowledge bases, and a large number of unlabeled documents (most of them negatives) can be retrieved from PubMed. The authors compare the performance of several machine learning algorithms while varying the ratio of positives to unlabeled documents and the number of features used.
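
The setup described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not name the algorithms or the feature selection method, so a small multinomial naive Bayes classifier and document-frequency ranking stand in for them, and the function names (`build_corpus`, `select_features`, `train_nb`, `predict`) are hypothetical. The sketch shows the two knobs the authors vary: the ratio of unlabeled documents to positives, and the number of features.

```python
import math
import random
from collections import Counter


def build_corpus(positives, unlabeled, ratio, seed=0):
    """Pair every positive with `ratio` sampled unlabeled documents.

    Unlabeled documents are treated as (noisy) negatives, label 0.
    """
    rng = random.Random(seed)
    k = min(len(unlabeled), int(round(ratio * len(positives))))
    sampled = rng.sample(unlabeled, k)
    return [(doc, 1) for doc in positives] + [(doc, 0) for doc in sampled]


def select_features(corpus, n_features):
    """Keep the n_features tokens with the highest document frequency.

    Assumption: document frequency is only a simple stand-in for
    whatever feature selection the article actually uses.
    """
    doc_freq = Counter()
    for text, _ in corpus:
        doc_freq.update(set(text.lower().split()))
    return {token for token, _ in doc_freq.most_common(n_features)}


def train_nb(corpus, vocab):
    """Train multinomial naive Bayes with add-one smoothing."""
    token_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in corpus:
        doc_counts[label] += 1
        token_counts[label].update(
            t for t in text.lower().split() if t in vocab
        )
    total_docs = sum(doc_counts.values())
    model = {}
    for label in (0, 1):
        # Smoothed class prior, so an empty class never yields log(0).
        log_prior = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(token_counts[label].values()) + len(vocab)
        log_lik = {
            t: math.log((token_counts[label][t] + 1) / denom) for t in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict(model, text):
    """Return 1 (PPI-relevant) or 0 for a document; unknown tokens are skipped."""
    tokens = text.lower().split()
    scores = {
        label: log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)
```

Sweeping `ratio` in `build_corpus` and `n_features` in `select_features`, then scoring `predict` on held-out documents, mirrors the kind of comparison the abstract describes. Because some unlabeled documents are in fact PPI-relevant, label 0 is noisy by construction.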
