A Self-training with Active Example Selection Criterion for Biomedical Named Entity Recognition

Shin, Eonseok; Munkhdalai, Tsendsuren; Li, Meijing; Paik, Incheon; Ryu, Keun Ho

doi:10.1007/978-3-642-32645-5_61

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7425)

Included in the following conference series:

International Conference on Hybrid Information Technology

Abstract

Biomedical named entity recognition is an essential prerequisite for effective text mining of biomedical literature. Because of the recent growth in the amount of biomedical literature, exploiting unlabeled text together with a relatively small labeled corpus to build an accurate classification model has become an active and challenging research topic in text mining. In this work, we propose a new semi-supervised learning method based on self-training for biomedical named entity recognition. In this method, a single classifier iteratively labels informative examples queried from the unlabeled data and then learns from the most confident of them, which improves its performance. The proposed method outperforms the traditional self-training algorithm in terms of both F-measure and the number of training iterations needed to build a good classification model.
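The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which classifier, informativeness measure, or confidence measure the paper uses. The sketch assumes a toy nearest-centroid classifier over numeric feature vectors (rather than a sequence labeler over tokens), treats the least confident predictions as the "informative" ones, and then keeps only the most confident of those queried examples. The names `train_centroids`, `predict_proba`, `self_train_active`, `query_size`, and `keep` are all hypothetical.

```python
import math


def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_proba(model, x):
    """Softmax over negative Euclidean distances to each centroid."""
    scores = {y: -math.dist(x, c) for y, c in model.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def predict(model, x):
    """Return the most probable label for x."""
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)


def self_train_active(labeled, unlabeled, query_size=10, keep=3, max_iters=5):
    """Self-training with an (assumed) uncertainty-based selection step."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iters):
        if not pool:
            break
        model = train_centroids(labeled)
        # Label every pool example and record the classifier's confidence.
        scored = []
        for idx, x in enumerate(pool):
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], idx, label))
        # Assumption: "informative" means least confident under the current model.
        queried = sorted(scored)[:query_size]
        # Of the queried examples, learn only from the most confident ones.
        accepted = sorted(queried, reverse=True)[:keep]
        accepted_idx = {idx for _, idx, _ in accepted}
        labeled.extend((pool[idx], label) for _, idx, label in accepted)
        pool = [x for i, x in enumerate(pool) if i not in accepted_idx]
    return train_centroids(labeled), labeled
```

Traditional self-training would skip the query step and simply add the globally most confident predictions. Under the assumptions above, the query step steers the classifier toward examples near its decision boundary, while the confidence filter limits how many mislabeled examples enter the training set.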

Author information

Authors and Affiliations

Information Technology Planning Department of Korea Army, Korea Army, South Korea
Eonseok Shin
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, South Korea
Tsendsuren Munkhdalai, Meijing Li & Keun Ho Ryu
Computer Industry Laboratory, School of Computer Science and Engineering, The University of Aizu, Japan
Incheon Paik

Editor information

Editors and Affiliations

Dept of Computer Engineering, Hannam University, Korea
Geuk Lee
Computer Science and Information System, University of Limerick, Limerick, Ireland
Daniel Howard
Department of Information and Communication, Dong Seoul University, 423 Bokjeong-Dong, Sujeong-Gu, Seongnam, Gyunggi, Korea
Jeong Jin Kang
Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland
Dominik Ślęzak

About this paper

Cite this paper

Shin, E., Munkhdalai, T., Li, M., Paik, I., Ryu, K.H. (2012). A Self-training with Active Example Selection Criterion for Biomedical Named Entity Recognition. In: Lee, G., Howard, D., Kang, J.J., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2012. Lecture Notes in Computer Science, vol 7425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32645-5_61

DOI: https://doi.org/10.1007/978-3-642-32645-5_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32644-8
Online ISBN: 978-3-642-32645-5
eBook Packages: Computer Science, Computer Science (R0)
