Can Projected Chains in Parallel Corpora Help Coreference Resolution?

de Souza, José Guilherme Camargo; Orăsan, Constantin

doi:10.1007/978-3-642-25917-3_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7099)

Abstract

The majority of current coreference resolution systems rely on annotated corpora to train classifiers for this task. However, this is possible only for languages for which annotated corpora are available. This paper presents a system that automatically extracts coreference chains from Portuguese texts without requiring Portuguese corpora manually annotated with coreferential information. To achieve this, an English coreference resolver is run on the English side of an English-Portuguese parallel corpus. The coreference pairs identified by the resolver are projected onto the Portuguese side of the corpus using automatic word alignment. These projected pairs are then used to train a coreference resolver for Portuguese. Evaluation shows that the system does not outperform a head-match baseline, because most of the projected pairs share the same head and this is what the Portuguese classifier learns. This suggests that a more accurate English coreference resolver is necessary. A better projection algorithm is also likely to improve the performance of the system.
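
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the span representation, the alignment format (a mapping from English token indices to sets of Portuguese token indices), and the function names `project_span` and `project_pairs` are all assumptions made for illustration. The sketch maps each English mention to the smallest Portuguese span covering its aligned tokens and drops any pair in which a mention has no aligned token.

```python
from typing import Dict, List, Optional, Set, Tuple

# A mention is an inclusive (start, end) range of token indices (assumed format).
Span = Tuple[int, int]
Pair = Tuple[Span, Span]  # (antecedent, anaphor)


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map an English span to the smallest Portuguese span covering its aligned tokens."""
    targets: Set[int] = set()
    for i in range(span[0], span[1] + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no token of the mention is aligned
    return (min(targets), max(targets))


def project_pairs(pairs: List[Pair], alignment: Dict[int, Set[int]]) -> List[Pair]:
    """Project English coreference pairs; skip pairs that cannot be fully projected."""
    projected: List[Pair] = []
    for antecedent, anaphor in pairs:
        a = project_span(antecedent, alignment)
        b = project_span(anaphor, alignment)
        # Keep the pair only if both mentions project to distinct spans.
        if a is not None and b is not None and a != b:
            projected.append((a, b))
    return projected


# Hypothetical example: "the white house ... it" vs. "a casa branca ... ela".
# Adjective and noun swap order, so tokens 1 and 2 are cross-aligned.
alignment = {0: {0}, 1: {2}, 2: {1}, 5: {4}}
english_pairs = [((0, 2), (5, 5)), ((0, 2), (7, 7))]  # token 7 is unaligned
portuguese_pairs = project_pairs(english_pairs, alignment)
```

In this sketch the second pair is discarded because its anaphor has no aligned Portuguese token. A real system would also have to handle noisy alignments, for example by snapping projected spans to noun phrase boundaries on the Portuguese side.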

Author information

Authors and Affiliations

Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK
José Guilherme Camargo de Souza & Constantin Orăsan

Editor information

Editors and Affiliations

Centro de Linguística da Universidade de Lisboa, Complexo Interdisciplinar da Universidade de Lisboa, Av. Prof. Gama Pinto, 2, 1649-003, Lisboa, Portugal
Iris Hendrickx
K. B. Chandrasekhar Research Centre, Anna University, MIT Campus of Anna University, Chromepet, 600044, Chennai, India
Sobha Lalitha Devi
Faculdade de Ciências, Departamento de Informática, Cidade Universitária, Universidade de Lisboa, 1749-016, Lisboa, Portugal
António Branco
School of Humanities, Languages and Social Studies, Research Group in Computational Linguistics, University of Wolverhampton, WV1 1SB, Wolverhampton, UK
Ruslan Mitkov

About this paper

Cite this paper

de Souza, J.G.C., Orăsan, C. (2011). Can Projected Chains in Parallel Corpora Help Coreference Resolution? In: Hendrickx, I., Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2011. Lecture Notes in Computer Science, vol 7099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25917-3_6

DOI: https://doi.org/10.1007/978-3-642-25917-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25916-6
Online ISBN: 978-3-642-25917-3
eBook Packages: Computer Science, Computer Science (R0)
