Abstract
Most current coreference resolution systems rely on annotated corpora to train classifiers for this task. However, this approach is feasible only for languages for which such corpora are available. This paper presents a system that automatically extracts coreference chains from Portuguese texts without requiring Portuguese corpora manually annotated with coreferential information. To achieve this, an English coreference resolver is run on the English side of an English-Portuguese parallel corpus. The coreference pairs identified by the resolver are projected onto the Portuguese side of the corpus using automatic word alignment. These projected pairs are then used to train a coreference resolver for Portuguese. Evaluation shows that the system does not outperform a head match baseline. The reason is that most of the projected pairs share the same head, and this is the regularity the Portuguese classifier learns. This suggests that a more accurate English coreference resolver is necessary. A better projection algorithm is also likely to improve the performance of the system.
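The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the span representation, the function names `project_span` and `project_pairs`, the "smallest covering span" heuristic, and the toy sentence pair with its alignment are all assumptions made for illustration. A real pipeline would take the alignment from an automatic word aligner and the pairs from an English coreference resolver.

```python
from typing import Dict, List, Optional, Set, Tuple

# A mention is an inclusive (start, end) token span within one sentence.
Span = Tuple[int, int]
Pair = Tuple[Span, Span]  # (antecedent, anaphor)


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source span to the smallest target span that covers every
    target token aligned to a source token in the span (assumed heuristic).
    Returns None when no token in the span is aligned."""
    targets: Set[int] = set()
    for i in range(span[0], span[1] + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None
    return (min(targets), max(targets))


def project_pairs(pairs: List[Pair], alignment: Dict[int, Set[int]]) -> List[Pair]:
    """Project source-side coreference pairs to the target side, dropping
    pairs where either mention is unaligned or both collapse to one span."""
    projected: List[Pair] = []
    for antecedent, anaphor in pairs:
        a = project_span(antecedent, alignment)
        b = project_span(anaphor, alignment)
        if a is not None and b is not None and a != b:
            projected.append((a, b))
    return projected


# Toy example (hypothetical data):
# EN: The(0) president(1) said(2) he(3) would(4) resign(5)
# PT: O(0) presidente(1) disse(2) que(3) ele(4) renunciaria(5)
alignment = {0: {0}, 1: {1}, 2: {2}, 3: {4}, 4: {5}, 5: {5}}
english_pairs = [((0, 1), (3, 3))]  # "The president" <- "he"
portuguese_pairs = project_pairs(english_pairs, alignment)
print(portuguese_pairs)
```

In this toy case the pair "The president" / "he" maps to "O presidente" / "ele". The projected pairs would then serve as training instances for the Portuguese classifier; noisy alignments or resolver errors propagate directly into that training data, which is consistent with the abstract's remark that a better resolver and projection algorithm are likely to help.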
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Souza, J.G.C., Orăsan, C. (2011). Can Projected Chains in Parallel Corpora Help Coreference Resolution? In: Hendrickx, I., Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2011. Lecture Notes in Computer Science, vol 7099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25917-3_6
DOI: https://doi.org/10.1007/978-3-642-25917-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25916-6
Online ISBN: 978-3-642-25917-3
eBook Packages: Computer Science (R0)