Named Entity Extraction via Automatic Labeling and Tri-training: Comparison of Selection Methods

Chou, Chien-Lung; Chang, Chia-Hui

doi:10.1007/978-3-319-12844-3_21


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

Detecting named entities in documents is one of the most important tasks in knowledge engineering. Previous studies rely on annotated training data, but large annotated training sets are expensive to obtain, which limits the effectiveness of recognition. In this research, we propose a semi-supervised learning approach for named entity recognition (NER) via automatic labeling and tri-training, which makes use of unlabeled data and structured resources containing known named entities. By modifying tri-training for sequence labeling and deriving a proper initialization, we can automatically train an NER model for Web news articles with satisfactory performance. In the task of Chinese person name extraction from 8,672 news articles on the Web (364,685 sentences containing 54,449 person names, 11,856 of them distinct), the approach achieves an F-measure of 90.4%.
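The abstract names two ingredients: automatically labeling unlabeled sentences with known person names, and tri-training, in which each of three models is retrained on data labeled by the other two. The following is a minimal sketch of both under simplifying assumptions, not the authors' procedure. `auto_label` and `select_by_agreement` are hypothetical names. Labeling is assumed to be greedy longest-match against a name list, producing character-level BIO tags. Selection keeps a sentence only when the other two models agree on its entire tag sequence, a simplified form of the agreement rule in Zhou and Li's original tri-training for classification. The paper compares several selection methods that this page does not describe.

```python
from typing import Callable, List, Sequence, Tuple

Tag = str  # assumed BIO scheme: "B-PER", "I-PER", or "O"


def auto_label(sentence: str, known_names: Sequence[str]) -> List[Tag]:
    """Hypothetical automatic labeler: tag each character of `sentence` by
    greedy longest-match against a list of known person names (a stand-in
    for the structured resources mentioned in the abstract)."""
    tags = ["O"] * len(sentence)
    # Try longer names first so a full name wins over a shorter prefix.
    names = sorted({n for n in known_names if n}, key=len, reverse=True)
    i = 0
    while i < len(sentence):
        for name in names:
            if sentence.startswith(name, i):
                tags[i] = "B-PER"
                for j in range(i + 1, i + len(name)):
                    tags[j] = "I-PER"
                i += len(name)  # skip past the matched name
                break
        else:
            i += 1  # no name starts here; move to the next character
    return tags


def select_by_agreement(
    unlabeled: Sequence[str],
    model_j: Callable[[str], List[Tag]],
    model_k: Callable[[str], List[Tag]],
) -> List[Tuple[str, List[Tag]]]:
    """Hypothetical selection step for the third model i: keep sentences on
    which models j and k output identical tag sequences, and use that shared
    sequence as the pseudo-label for retraining model i."""
    selected = []
    for sentence in unlabeled:
        pred_j = model_j(sentence)
        pred_k = model_k(sentence)
        if pred_j == pred_k:  # whole-sequence agreement (assumed criterion)
            selected.append((sentence, pred_j))
    return selected


if __name__ == "__main__":
    # Toy "models" built from name lists, purely for illustration.
    model_a = lambda s: auto_label(s, ["王小明"])
    model_b = lambda s: auto_label(s, ["王小明", "李四"])
    print(select_by_agreement(["王小明說", "李四來"], model_a, model_b))
```

Requiring agreement on the whole sequence is strict. It discards any sentence where the two models differ on a single character, which trades recall of pseudo-labeled data for precision. Token- or span-level agreement would be natural alternatives.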



Author information

Authors and Affiliations

National Central University, Taoyuan, Taiwan
Chien-Lung Chou & Chia-Hui Chang


Editor information

Editors and Affiliations

Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Azizah Jaafar
Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Nazlena Mohamad Ali
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Shahrul Azman Mohd Noah
Insight Centre for Data Analytics, Dublin City University, Glasnevin, Dublin 9, Ireland
Alan F. Smeaton
Information Systems, Queensland University of Technology, 4001, Brisbane, QLD, Australia
Peter Bruza
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
Zainab Abu Bakar & Nursuriati Jamil
Cyber Security Center, Universiti Pertahanan Nasional Malaysia, Kem Sungai Besi, 57000, Kuala Lumpur, Malaysia
Tengku Mohd Tengku Sembok


About this paper

Cite this paper

Chou, CL., Chang, CH. (2014). Named Entity Extraction via Automatic Labeling and Tri-training: Comparison of Selection Methods. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_21


DOI: https://doi.org/10.1007/978-3-319-12844-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science, Computer Science (R0)
