Abstract
Pronominalization is an important component of generating coherent text. In this paper, we identify features that influence pronominalization and construct a pronoun generation model using various machine learning techniques. The old entities, which are the targets of pronominalization, are categorized into three types according to their behavior in the attentional state: the Cb (backward-looking center) and the old-Cp, both derived from the Centering model, and the remaining old entities. We construct a separate pronoun generation model for each type. Eighty-seven texts were gathered from three genres for training and testing. Using this corpus, we verify that our proposed features are well defined for explaining pronominalization in Korean, and we show that our model significantly outperforms previous models at the 99% confidence level (t-test). We also identify central features that strongly influence pronominalization across genres.
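The three-way split of old entities can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard Centering definitions: Cb(Un) is the highest-ranked element of Cf(U(n-1)) that is realized in Un, and Cp(Un) is the highest-ranked element of Cf(Un). It also assumes that Cf lists arrive already ranked, that an entity counts as "old" if it was mentioned in any earlier utterance, and that an entity that is both Cb and Cp is labeled Cb. The names `EntityType`, `backward_center`, and `categorize_old_entities` are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class EntityType(Enum):
    CB = "Cb"
    OLD_CP = "old-Cp"
    OTHER_OLD = "other-old"


def backward_center(prev_cf: List[str], cur_entities: Set[str]) -> Optional[str]:
    """Cb(Un): highest-ranked element of Cf(U(n-1)) that is realized in Un."""
    for entity in prev_cf:  # prev_cf is assumed to be ranked, highest first
        if entity in cur_entities:
            return entity
    return None


def categorize_old_entities(history: List[List[str]],
                            cur_cf: List[str]) -> Dict[str, EntityType]:
    """Assign each old entity of the current utterance to one of three types.

    history: ranked Cf lists of the earlier utterances, oldest first.
    cur_cf:  ranked Cf list of the current utterance Un.
    Assumption: "old" means mentioned in any earlier utterance.
    """
    mentioned_before: Set[str] = {e for cf in history for e in cf}
    cb = backward_center(history[-1], set(cur_cf)) if history else None
    cp = cur_cf[0] if cur_cf else None

    result: Dict[str, EntityType] = {}
    for entity in cur_cf:
        if entity not in mentioned_before:
            continue  # new entities are not candidates for pronominalization
        if entity == cb:
            result[entity] = EntityType.CB  # Cb takes precedence if also Cp
        elif entity == cp:
            result[entity] = EntityType.OLD_CP
        else:
            result[entity] = EntityType.OTHER_OLD
    return result


if __name__ == "__main__":
    history = [["John", "book"], ["Mary", "John"]]
    print(categorize_old_entities(history, ["John", "Mary", "store"]))
```

In this example, Mary is the Cb (the highest-ranked entity of the previous utterance that reappears), John is the old Cp, and "store" is new and therefore skipped. Under the one-model-per-type design described above, each group would then be passed to its own pronominalization classifier.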
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Roh, JE., Lee, JH. (2005). Building a Pronominalization Model by Feature Selection and Machine Learning. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_60
DOI: https://doi.org/10.1007/978-3-540-30211-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)