A Named Entity Extraction using Word Information Repeatedly Collected from Unlabeled Data

Iwakura, Tomoya

doi:10.1007/978-3-642-12116-6_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper proposes a method for Named Entity (NE) extraction that uses NE-related labels of words repeatedly collected from unlabeled data. The NE-related labels of a word include its candidate NE classes, the NE classes of the words that co-occur with it, and so on. To collect these labels, we first extract NEs from unlabeled data with an NE extractor and then gather the NE-related labels of each word from the extraction results. We then create a new NE extractor that uses the NE-related labels of each word as new features, and use this new extractor in turn to collect new NE-related labels. Experimental results on the IREX data set for Japanese NE extraction show that our method improves accuracy.
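The loop described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the learner or the feature templates, so the toy vote-counting tagger, the function names (`features`, `train`, `predict`, `collect_word_labels`, `iterate`), the `self:`/`near:` label prefixes, and the English toy sentences are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def features(tokens, i, word_labels):
    """Features for token i: surface form, neighbors, and collected NE-related labels."""
    w = tokens[i]
    feats = [f"w={w}"]
    if i > 0:
        feats.append(f"prev={tokens[i - 1]}")
    if i + 1 < len(tokens):
        feats.append(f"next={tokens[i + 1]}")
    # NE-related labels gathered from unlabeled data (none before the first round).
    feats += [f"cand={c}" for c in sorted(word_labels.get(w, ()))]
    return feats


def train(labeled, word_labels):
    """Toy extractor: per-feature tag counts (a stand-in for a real learner)."""
    model = defaultdict(Counter)
    for tokens, tags in labeled:
        for i, tag in enumerate(tags):
            for f in features(tokens, i, word_labels):
                model[f][tag] += 1
    return model


def predict(model, tokens, word_labels):
    """Tag each token with the class that receives the most feature votes."""
    tags = []
    for i in range(len(tokens)):
        votes = Counter()
        for f in features(tokens, i, word_labels):
            votes.update(model.get(f, {}))
        tags.append(votes.most_common(1)[0][0] if votes else "O")
    return tags


def collect_word_labels(model, unlabeled, word_labels):
    """Run the extractor on unlabeled text and gather NE-related labels per word."""
    collected = defaultdict(set)
    for tokens in unlabeled:
        tags = predict(model, tokens, word_labels)
        for i, (w, t) in enumerate(zip(tokens, tags)):
            if t != "O":
                collected[w].add(f"self:{t}")  # candidate NE class of the word
            for j in (i - 1, i + 1):  # NE classes of adjacent (co-occurring) words
                if 0 <= j < len(tokens) and tags[j] != "O":
                    collected[w].add(f"near:{tags[j]}")
    return dict(collected)


def iterate(labeled, unlabeled, rounds=2):
    """Alternate between collecting labels and retraining with them as features."""
    word_labels = {}
    model = train(labeled, word_labels)
    for _ in range(rounds):
        word_labels = collect_word_labels(model, unlabeled, word_labels)
        model = train(labeled, word_labels)
    return model, word_labels


# Hypothetical toy data, not taken from IREX.
labeled = [(["Mr", "Sato", "visited", "Tokyo"], ["O", "PERSON", "O", "LOCATION"])]
unlabeled = [["Mr", "Suzuki", "visited", "Osaka"], ["Mr", "Sato", "visited", "Tokyo"]]
model, word_labels = iterate(labeled, unlabeled)
```

In this toy setting, words such as "Suzuki" and "Osaka" never appear in the labeled data, but the labels collected for them from the unlabeled sentences let the retrained extractor tag them in contexts it has not seen, for example via `predict(model, ["Osaka", "welcomed", "Suzuki"], word_labels)`.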

Author information

Authors and Affiliations

Fujitsu Laboratories Ltd., 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki, 211-8588, Japan
Tomoya Iwakura

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Iwakura, T. (2010). A Named Entity Extraction using Word Information Repeatedly Collected from Unlabeled Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_18

DOI: https://doi.org/10.1007/978-3-642-12116-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)