A Unified Active Learning Framework for Biomedical Relation Extraction

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Supervised machine learning methods have been applied with great success to biomedical relation extraction. However, existing methods are not practical enough, because manually constructing large training sets is very expensive. Active learning is therefore needed to design practical relation extraction methods that require little human effort. In this paper, we describe a unified active learning framework. In particular, the framework systematically addresses several practical issues in the active learning process: a strategy for selecting informative data, a data diversity selection algorithm, an active feature acquisition method, and an informative feature selection algorithm. Together, these components are intended to meet the challenges posed by the immense amount of complex and diverse biomedical text. The framework is evaluated on protein-protein interaction (PPI) extraction and is shown to achieve promising results with a significant reduction in editorial effort and labeling time.
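
To make the first two components concrete, the following is a minimal sketch of one common way to combine informative-data selection with a diversity constraint in pool-based active learning. It is not the authors' algorithm, which the abstract does not specify. The entropy-based uncertainty score, the cosine-similarity threshold `max_sim`, and the function names `uncertainty`, `cosine`, and `select_batch` are all assumptions introduced for illustration.

```python
import math

def uncertainty(p):
    """Binary entropy (in bits) of a predicted positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def cosine(u, v):
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_batch(pool, probs, batch_size, max_sim=0.9):
    """Return up to batch_size pool indices to send to the annotator.

    Candidates are visited from most to least uncertain (assumed
    informativeness criterion); a candidate is skipped when it is too
    similar to an already chosen one (assumed diversity criterion).
    """
    ranked = sorted(range(len(pool)),
                    key=lambda i: uncertainty(probs[i]), reverse=True)
    chosen = []
    for i in ranked:
        if all(cosine(pool[i], pool[j]) <= max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == batch_size:
            break
    return chosen

# Hypothetical toy pool: feature vectors of unlabeled sentences and the
# current classifier's predicted probability that each describes a PPI.
pool = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [1.0, 1.0]]
probs = [0.5, 0.51, 0.9, 0.6]
print(select_batch(pool, probs, batch_size=2))  # [0, 3]
```

In the toy run, item 1 is nearly as uncertain as item 0 but is almost identical to it, so the diversity check skips it and the next most uncertain item is chosen instead. In a full loop, the selected items would be labeled, added to the training set, and the classifier retrained before the next round of selection.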

Author information

Corresponding author

Correspondence to Hong-Tao Zhang.

Additional information

The work is supported by the National Natural Science Foundation of China under Grant No. 60973104 and the National Basic Research 973 Program of China under Grant No. 2012CB316301.

About this article

Cite this article

Zhang, HT., Huang, ML. & Zhu, XY. A Unified Active Learning Framework for Biomedical Relation Extraction. J. Comput. Sci. Technol. 27, 1302–1313 (2012). https://doi.org/10.1007/s11390-012-1306-0

  • DOI: https://doi.org/10.1007/s11390-012-1306-0
