Abstract
Named Entity Recognition (NER) is a well-studied problem with many important applications, such as Question Answering, Relation Extraction and Concept Retrieval. Unsupervised NER via bootstrapping is attracting growing interest because it does not require manually annotated training data. Meanwhile, dependency tree-based patterns have proved effective in Relation Extraction. In this paper, we demonstrate that using dependency trees as extraction patterns within a bootstrapping framework can improve the performance of an NER system, and we propose a method for computing these tree patterns efficiently. Because unsupervised NER via bootstrapping uses the entities learned in each iteration as seeds for the next, the quality of these seeds greatly affects the entire learning process. We introduce simultaneous bootstrapping of multiple classes, a technique that can dramatically improve the quality of the seeds obtained at each iteration and hence the quality of the final learning results. Our experiments show beneficial results.
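The seed-propagation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: flat string contexts stand in for dependency tree patterns, and the conflict rule (discard any pattern or candidate claimed by more than one class) is an assumption about how bootstrapping several classes at once might protect seed quality. The function name `bootstrap` and the toy data are hypothetical.

```python
from collections import defaultdict


def bootstrap(corpus, seeds, iterations=2):
    """Toy multi-class bootstrapping (illustrative assumption, not the paper's method).

    corpus: list of (pattern, entity) pairs; a pattern is a plain string here,
            standing in for a dependency tree pattern.
    seeds:  dict mapping class name -> set of seed entities.
    """
    # Copy the seeds so the caller's sets are not mutated.
    known = {cls: set(ents) for cls, ents in seeds.items()}

    for _ in range(iterations):
        # 1. Count, per pattern, how many known entities of each class it matches.
        pattern_hits = defaultdict(lambda: defaultdict(int))
        for pattern, entity in corpus:
            for cls, ents in known.items():
                if entity in ents:
                    pattern_hits[pattern][cls] += 1

        # 2. Keep only patterns that match entities of exactly one class.
        #    (Assumed conflict rule: ambiguous patterns are discarded.)
        pattern_class = {
            p: next(iter(hits))
            for p, hits in pattern_hits.items()
            if len(hits) == 1
        }

        # 3. Let the surviving patterns vote on entities not yet labelled.
        labelled = set().union(*known.values())
        votes = defaultdict(lambda: defaultdict(int))
        for pattern, entity in corpus:
            cls = pattern_class.get(pattern)
            if cls is not None and entity not in labelled:
                votes[entity][cls] += 1

        # 4. Accept a candidate only if every vote agrees on one class;
        #    accepted entities become seeds for the next iteration.
        for entity, by_cls in votes.items():
            if len(by_cls) == 1:
                known[next(iter(by_cls))].add(entity)

    return known


# Hypothetical toy data for illustration only.
corpus = [
    ("mayor of X", "Paris"),
    ("mayor of X", "Tokyo"),
    ("flew to X", "Paris"),
    ("flew to X", "Hanoi"),
    ("X said", "Obama"),
    ("X said", "Merkel"),
    ("visited X", "Hanoi"),
    ("visited X", "Merkel"),
    ("visited X", "Kyoto"),
]
seeds = {"LOCATION": {"Paris"}, "PERSON": {"Obama"}}

result = bootstrap(corpus, seeds, iterations=2)
```

On this toy corpus, the pattern `visited X` matches both a known location and a known person by the second iteration, so it is discarded and `Kyoto` stays unlabelled. Run with only the `LOCATION` seeds, the same loop would accept that pattern and label `Merkel` as a location, which is the kind of seed drift that competing classes are meant to limit.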
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, V.B., Aizawa, A. (2008). Multi-class Named Entity Recognition Via Bootstrapping with Dependency Tree-Based Patterns. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_9
DOI: https://doi.org/10.1007/978-3-540-68125-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)