Multi-class Named Entity Recognition Via Bootstrapping with Dependency Tree-Based Patterns

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Abstract

Named Entity Recognition (NER) is a well-known problem with many important applications, such as Question Answering, Relation Extraction and Concept Retrieval. NER based on unsupervised learning via bootstrapping has attracted growing interest because it does not require manually annotated training data. Meanwhile, dependency tree-based patterns have proved effective in Relation Extraction. In this paper, we demonstrate that using dependency trees as extraction patterns within a bootstrapping framework can improve the performance of an NER system, and we suggest a method for computing these tree patterns efficiently. Because unsupervised NER via bootstrapping uses the entities learned in each iteration as seeds for subsequent iterations, the quality of these seeds greatly affects the entire learning process. We introduce the technique of simultaneous bootstrapping of multiple classes, which can dramatically improve the quality of the seeds obtained at each iteration and hence the quality of the final learning results. Our experiments show that the approach is beneficial.
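
The idea of bootstrapping several classes at once can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give a scoring rule, so the precision threshold, the function name `bootstrap`, and the toy data below are all assumptions, and flat context strings stand in for dependency tree patterns. The sketch only shows how seeds from competing classes can veto an ambiguous pattern before it contaminates the next iteration.

```python
from collections import defaultdict


def bootstrap(occurrences, seeds, iterations=2, min_precision=0.6):
    """Toy multi-class bootstrapping (hypothetical scoring, not from the paper).

    occurrences: list of (pattern, entity) pairs, where a pattern is a flat
                 context string standing in for a dependency tree pattern.
    seeds:       dict mapping a class label to a set of seed entities.
    """
    entities = {label: set(names) for label, names in seeds.items()}
    by_pattern = defaultdict(set)
    for pattern, entity in occurrences:
        by_pattern[pattern].add(entity)

    for _ in range(iterations):
        # Snapshot of every entity whose class is already known.
        owner = {e: label for label, names in entities.items() for e in names}
        proposals = defaultdict(set)  # new entity -> classes proposing it
        for pattern, matched in by_pattern.items():
            labelled = [owner[e] for e in matched if e in owner]
            if not labelled:
                continue  # pattern matches no known entity yet
            best = max(sorted(set(labelled)), key=labelled.count)
            precision = labelled.count(best) / len(labelled)
            if precision < min_precision:
                continue  # pattern is shared across classes: reject it
            for e in matched:
                if e not in owner:
                    proposals[e].add(best)
        # Accept an entity only if exactly one class claims it.
        for e, labels in proposals.items():
            if len(labels) == 1:
                entities[next(iter(labels))].add(e)
    return entities


# Hypothetical toy corpus of (pattern, entity) occurrences.
occurrences = [
    ("mayor of X", "Paris"), ("mayor of X", "Tokyo"), ("mayor of X", "Hanoi"),
    ("X said", "Smith"), ("X said", "Tanaka"),
    ("visited X", "Paris"), ("visited X", "Smith"), ("visited X", "Kyoto"),
    ("born in X", "Hanoi"), ("born in X", "Osaka"),
]

# Two classes bootstrapped together: "visited X" matches seeds of both
# classes, so it is rejected and never proposes anything.
multi = bootstrap(occurrences, {"LOCATION": {"Paris", "Tokyo"}, "PERSON": {"Smith"}})

# One class bootstrapped alone: nothing contradicts "visited X", so person
# names drift into the LOCATION set.
single = bootstrap(occurrences, {"LOCATION": {"Paris", "Tokyo"}})

print(sorted(multi["LOCATION"]), sorted(multi["PERSON"]))
print(sorted(single["LOCATION"]))
```

In this toy run, the single-class variant accepts "Smith" as a location in the first iteration and then uses it as a seed, which is the kind of seed degradation the abstract describes. The multi-class variant leaves "Kyoto" unlabelled, trading some recall for cleaner seeds.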


Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, V.B., Aizawa, A. (2008). Multi-class Named Entity Recognition Via Bootstrapping with Dependency Tree-Based Patterns. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_9

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer Science, Computer Science (R0)
