Abstract
Biomedical named entity recognition is an essential prerequisite for effective text mining of biomedical literature. Because the volume of biomedical literature has grown rapidly, exploiting unlabeled text together with a relatively small labeled corpus to build an accurate classification model has become an active and challenging research topic in text mining. In this work, we propose a new semi-supervised learning method based on self-training for biomedical named entity recognition. In this method, a single classifier iteratively labels informative examples queried from the unlabeled data and then learns from the most confident of them, which improves its performance. The proposed method outperforms the traditional self-training algorithm in terms of both F-measure and the number of training iterations needed to build a good classification model.
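The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the query criterion, or any thresholds, so everything below is an assumption made for illustration. The sketch swaps in a toy two-class nearest-centroid classifier, treats the least confident predictions as the "informative" examples, and uses hypothetical names and parameters (`centroids`, `predict`, `self_train`, `query_size`, `keep`).

```python
from math import dist  # Python 3.8+


def centroids(labeled):
    """Fit the toy model: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence) for x under the nearest-centroid model.

    Assumption: confidence is one minus the winning distance's share of
    the total distance, which lies in [0.5, 1] when there are two classes.
    """
    d = {y: dist(x, c) for y, c in model.items()}
    label = min(d, key=d.get)
    total = sum(d.values())
    conf = 1.0 if total == 0 else 1.0 - d[label] / total
    return label, conf


def self_train(labeled, unlabeled, query_size=4, keep=2, max_iter=10):
    """Self-training with an (assumed) active example selection step."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        model = centroids(labeled)
        scored = [(x, *predict(model, x)) for x in pool]
        # Query step (assumed criterion): the least confident predictions
        # are treated as the most informative examples.
        queried = sorted(scored, key=lambda t: t[2])[:query_size]
        # Selection step: among the queried examples, keep only the most
        # confident ones and add them with their predicted labels.
        chosen = sorted(queried, key=lambda t: t[2], reverse=True)[:keep]
        for x, y, _ in chosen:
            labeled.append((x, y))
            pool.remove(x)
    return centroids(labeled), labeled


if __name__ == "__main__":
    seed = [((0.0, 0.0), "O"), ((4.0, 4.0), "PROTEIN")]
    pool = [(0.5, 0.2), (1.0, 1.2), (3.0, 3.1),
            (3.8, 4.1), (1.8, 1.9), (2.3, 2.2)]
    model, grown = self_train(seed, pool)
    print(len(grown), predict(model, (0.1, 0.1)))
```

In this sketch, traditional self-training would correspond to skipping the query step and selecting the most confident examples from the whole pool. A real named entity recognizer would label token sequences rather than isolated points, so the toy feature vectors stand in for whatever representation the actual classifier uses.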
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shin, E., Munkhdalai, T., Li, M., Paik, I., Ryu, K.H. (2012). A Self-training with Active Example Selection Criterion for Biomedical Named Entity Recognition. In: Lee, G., Howard, D., Kang, J.J., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2012. Lecture Notes in Computer Science, vol 7425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32645-5_61
DOI: https://doi.org/10.1007/978-3-642-32645-5_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32644-8
Online ISBN: 978-3-642-32645-5
eBook Packages: Computer Science, Computer Science (R0)