Abstract
Pronoun resolution is one of the challenges of natural language processing (NLP). Proposed solutions range from heuristic rule-based methods to data-driven machine learning approaches. In this article, we build on a previous machine learning approach to Persian pronoun anaphora resolution. The primary goal of this paper is to improve on its results, mainly by extracting more balanced data through heuristic rules in instance sampling and by using more relevant features in classification. Working with the PCAC2008 dataset, we use noun phrase structure to extract more suitable training data. The incorporated features include syntactic and semantic information. Finally, we train and test several classifiers and compare their results. The best result is achieved with the C4.5 decision tree classifier. The results show a significant improvement over the previous work, with an F-measure of 75% compared to 45%. The extracted features and their contributions are also analyzed.
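The abstract does not spell out the sampling rules or the evaluation code, so the following is only a minimal sketch of the general idea: pair each pronoun with nearby preceding noun phrases, drop candidates that a simple heuristic rules out (which reduces the surplus of negative instances), and score predictions with the F-measure. The `Mention` fields, the `window` size, and the number-agreement filter are illustrative assumptions, not the rules used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A noun phrase or pronoun; all fields are hypothetical, for illustration."""
    idx: int          # position of the mention in the document
    text: str
    is_pronoun: bool
    number: str       # "sg" or "pl" (assumed agreement feature)
    entity: int       # gold entity id, used only to label training pairs


def sample_pairs(mentions, window=3):
    """Build (candidate, pronoun, label) training instances.

    Assumed heuristic: only the `window` mentions immediately preceding
    a pronoun are considered, pronouns are not used as candidates, and
    candidates that disagree in number with the pronoun are discarded.
    """
    pairs = []
    for i, pronoun in enumerate(mentions):
        if not pronoun.is_pronoun:
            continue
        for cand in mentions[max(0, i - window):i]:
            if cand.is_pronoun or cand.number != pronoun.number:
                continue  # filtered out by the heuristic
            pairs.append((cand, pronoun, cand.entity == pronoun.entity))
    return pairs


def f_measure(tp, fp, fn):
    """Balanced F-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = [
        Mention(0, "Ali", False, "sg", 1),
        Mention(1, "books", False, "pl", 2),
        Mention(2, "he", True, "sg", 1),
    ]
    for cand, pronoun, label in sample_pairs(doc):
        print(cand.text, pronoun.text, label)
    print(f_measure(tp=3, fp=1, fn=1))
```

In this toy document the plural candidate is filtered out before training, so the pronoun contributes one positive instance and no negatives. The resulting pairs could then be turned into feature vectors for a decision tree or any other classifier.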
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nourbakhsh, A., Bahrani, M. (2017). Persian Pronoun Resolution Using Data Driven Approaches. In: Damaševičius, R., Mikašytė, V. (eds) Information and Software Technologies. ICIST 2017. Communications in Computer and Information Science, vol 756. Springer, Cham. https://doi.org/10.1007/978-3-319-67642-5_48
DOI: https://doi.org/10.1007/978-3-319-67642-5_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67641-8
Online ISBN: 978-3-319-67642-5
eBook Packages: Computer Science, Computer Science (R0)