A Self-training with Active Example Selection Criterion for Biomedical Named Entity Recognition

  • Conference paper
Convergence and Hybrid Information Technology (ICHIT 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7425)

Abstract

Biomedical named entity recognition is an essential prerequisite for effective text mining of biomedical literature. Because the volume of biomedical literature has grown rapidly in recent years, exploiting unlabeled text together with a relatively small labeled corpus to build an accurate classification model has become an active and challenging research topic in text mining. In this work, we propose a new semi-supervised learning method based on self-training for biomedical named entity recognition. In this method, a single classifier iteratively labels informative examples queried from the unlabeled data and retrains on the most confident of them, thereby improving its performance. The proposed method outperforms the traditional self-training algorithm both in F-measure and in the number of training iterations needed to build a good classification model.
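The abstract does not specify the query criterion or the base classifier, so the following is only a minimal Python sketch of the loop it describes, under stated assumptions. Informativeness is assumed to be measured by uncertainty sampling (a small margin between the two most probable labels, a standard active-learning criterion); the paper's actual criterion may differ. The hypothetical `NearestCentroid` class stands in for the real NER classifier, and toy 2-D feature vectors replace real token features. The names `self_train_active`, `query_size`, `add_size`, and `max_iter` are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


class NearestCentroid:
    """Hypothetical toy classifier standing in for the real NER model."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict_proba(self, x: Vector) -> Dict[str, float]:
        # Softmax over negative squared distances to each class centroid.
        scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
                  for y, c in self.centroids.items()}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}

    def predict(self, x: Vector) -> str:
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)


def margin(probs: Dict[str, float]) -> float:
    """Gap between the two most probable labels; small means uncertain."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def self_train_active(labeled, unlabeled, query_size=4, add_size=2, max_iter=10):
    """Return (model, rounds) after self-training with an active query step."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NearestCentroid().fit(xs, ys)
    rounds = 0
    while pool and rounds < max_iter:
        # 1) Active query (assumed criterion): the `query_size` pool examples
        #    with the smallest margin, i.e. the ones the model is least sure of.
        queried = sorted(pool, key=lambda x: margin(model.predict_proba(x)))[:query_size]
        # 2) Of those, keep only the `add_size` the model labels most confidently.
        queried.sort(key=lambda x: max(model.predict_proba(x).values()), reverse=True)
        for x in queried[:add_size]:
            xs.append(x)
            ys.append(model.predict(x))  # self-assigned (pseudo) label
            pool.remove(x)
        # 3) Retrain on the enlarged training set.
        model = NearestCentroid().fit(xs, ys)
        rounds += 1
    return model, rounds


if __name__ == "__main__":
    labeled = [((0.0, 0.0), "O"), ((5.0, 5.0), "ENTITY")]
    unlabeled = [(0.5, 0.2), (4.8, 5.1), (2.4, 2.6), (1.0, 1.2), (4.0, 3.9), (2.6, 2.4)]
    model, rounds = self_train_active(labeled, unlabeled)
    print(rounds, model.predict((0.1, 0.1)), model.predict((4.9, 4.9)))
```

Querying uncertain examples first and then keeping only the most confident of them is one plausible reading of the abstract's two-stage selection: plain self-training adds the examples the model is already surest about, which may teach it little, whereas this ordering targets harder examples while still screening out the pseudo-labels most likely to be wrong. With only two labels, as in the toy run above, the confidence ranking in step 2 is simply the reverse of the margin ranking in step 1.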

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shin, E., Munkhdalai, T., Li, M., Paik, I., Ryu, K.H. (2012). A Self-training with Active Example Selection Criterion for Biomedical Named Entity Recognition. In: Lee, G., Howard, D., Kang, J.J., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2012. Lecture Notes in Computer Science, vol 7425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32645-5_61

  • DOI: https://doi.org/10.1007/978-3-642-32645-5_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32644-8

  • Online ISBN: 978-3-642-32645-5

  • eBook Packages: Computer Science, Computer Science (R0)
