Abstract
Bootstrapping is a weakly supervised technique that has attracted attention in many areas of Information Extraction (IE) and Natural Language Processing (NLP), especially for learning semantic lexicons. In this paper, we propose a new bootstrapping algorithm, the Mutual Screening Graph Algorithm (MSGA), for learning semantic lexicons. The approach uses only an unannotated corpus and a few seed words to learn new words for each semantic category. MSGA improves on earlier bootstrapping algorithms by changing the format of the extracted patterns and the method for scoring patterns and words. We also compare the semantic lexicons produced by MSGA with those produced by two previous bootstrapping algorithms, Basilisk [1] and GMR (Graph Mutual Reinforcement based Bootstrapping) [4]. Experiments show that MSGA can outperform both approaches.
References
Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the ACL 2002 conference on Empirical methods in natural language processing, Philadelphia, USA, vol. 10, pp. 214–221 (2002)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on Computational learning theory, Madison, Wisconsin, United States, pp. 92–100 (1998)
Phillips, W., Riloff, E.: Exploiting Role-Identifying Nouns and Expressions for Information Extraction. In: 2007 Proceedings of Recent Advances in Natural Language Processing, RANLP 2007 (2007)
Hassan, H., Hassan, A., Emam, O.: Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pp. 501–508 (2006)
Patwardhan, S., Riloff, E.: Learning Domain-Specific Information Extraction Patterns from the Web. In: Proceedings of the Workshop on Information Extraction Beyond The Document, pp. 66–73 (2006)
Florian, R., Hassan, H., Ittycheriah, A., Jing, H., Kambhatla, N., Luo, X., Nicolov, N., Roukos, S.: A statistical model for multilingual entity detection and tracking. In: HLT-NAACL 2004: Main Proceedings, pp. 1–8 (2004)
Kambhatla, N.: Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In: The Companion Volume to the Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 178–181 (2004)
Collins, M., Singer, Y.: Unsupervised models for named entity classification. In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. University of Maryland, MD (1999)
Etzioni, O., Cafarella, M., Downey, D., Popescu, A., Shaked, S.T.: Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence 165(1), 91–134 (2005)
Riloff, E., Wiebe, J., Wilson, T.: Learning subjective nouns using extraction pattern bootstrapping. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, Edmonton, Canada, vol. 4, pp. 25–32 (2003)
Riloff, E.: Automatically generating extraction patterns from untagged text. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, pp. 1044–1049 (1996)
Riloff, E., Phillips, W.: An Introduction to the Sundance and AutoSlog Systems (2004)
COAE Proceedings: COAE proceedings. In: Proceedings of Chinese Opinion Analysis Evaluation 2008, COAE 2008 (2008)
Riloff, E., Jones, R.: Learning dictionaries for information extraction by multi-level bootstrapping. In: Proceedings of the 16th National Conference on Artificial Intelligence, Orlando, USA, pp. 474–479 (1999)
Hirschman, L., Light, M., Breck, E., Burger, J.D.: Deep read: A reading comprehension system. University of Maryland, United States (1999)
Moldovan, D., Harabagiu, S., Pasca, M., Mihalcea, R., Goodrum, R., Girju, R., Rus, V.: Lasso: A tool for surfing the answer net. In: Proceedings of the Eighth Text REtrieval Conference, TREC-8 (1999)
Riloff, E., Schmelzenbach, M.: An empirical approach to conceptual case frame acquisition. In: Proceedings of the Sixth Workshop on Very large Corpora, Montreal, Canada (August 1998)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Zhou, Y. (2009). Mutual Screening Graph Algorithm: A New Bootstrapping Algorithm for Lexical Acquisition. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_25
DOI: https://doi.org/10.1007/978-3-642-04769-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04768-8
Online ISBN: 978-3-642-04769-5
eBook Packages: Computer Science, Computer Science (R0)