A Named Entity Extraction using Word Information Repeatedly Collected from Unlabeled Data

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Abstract

This paper proposes a method for Named Entity (NE) extraction that uses NE-related labels of words collected repeatedly from unlabeled data. The NE-related labels of a word include its candidate NE classes, the NE classes of the words that co-occur with it, and so on. To collect these labels, we first extract NEs from unlabeled data with an NE extractor and then gather the labels from the extraction results. We then create a new NE extractor that uses the NE-related labels of each word as new features, and we use this new extractor to collect new NE-related labels of words. Experimental results on the IREX data set for Japanese NE extraction show that our method improves accuracy.
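The collection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names, the one-word co-occurrence window, the feature strings, and the `train` callback (a stand-in for whatever learner builds an extractor) are all assumptions introduced here.

```python
from collections import defaultdict


def collect_word_labels(tagged_sentences, window=1):
    """Collect NE-related labels per word from extractor output.

    tagged_sentences: list of sentences, each a list of (word, ne_class)
    pairs, with ne_class == "O" for words outside any NE.
    The window size is an assumption made for this sketch.
    """
    candidates = defaultdict(set)  # NE classes each word was tagged with
    neighbours = defaultdict(set)  # NE classes of co-occurring words
    for sentence in tagged_sentences:
        for i, (word, ne_class) in enumerate(sentence):
            if ne_class != "O":
                candidates[word].add(ne_class)
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j][1] != "O":
                    neighbours[word].add(sentence[j][1])
    return candidates, neighbours


def word_features(word, candidates, neighbours):
    """Turn the collected labels of one word into feature strings."""
    feats = [f"w={word}"]
    feats += [f"cand={c}" for c in sorted(candidates.get(word, ()))]
    feats += [f"ctx={c}" for c in sorted(neighbours.get(word, ()))]
    return feats


def repeat_collection(train, labeled, unlabeled, rounds=2):
    """Alternate between extracting NEs and collecting word labels.

    train(labeled, candidates, neighbours) is a hypothetical callback that
    returns an extractor: a function mapping a list of words to a list of
    (word, ne_class) pairs.
    """
    candidates, neighbours = {}, {}
    extractor = train(labeled, candidates, neighbours)  # no extra features yet
    for _ in range(rounds):
        tagged = [extractor(sentence) for sentence in unlabeled]
        candidates, neighbours = collect_word_labels(tagged)
        extractor = train(labeled, candidates, neighbours)
    return extractor, candidates, neighbours
```

Each round replaces the previous labels with those gathered from the newest extractor's output, so the features available to the next extractor reflect the latest extraction results. Whether labels from earlier rounds should instead be accumulated is a design choice the abstract does not settle.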

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Iwakura, T. (2010). A Named Entity Extraction using Word Information Repeatedly Collected from Unlabeled Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_18

  • DOI: https://doi.org/10.1007/978-3-642-12116-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)
