Abstract
We study automatic coreference resolution for blogs and for news articles with user comments, as part of a project on opinion mining. We aim to measure the effect of the genre shift from edited, structured newspaper text to unedited, unstructured blog data. We evaluate our coreference resolution system on three data sets: newspaper articles, newspaper articles mixed with reader comments, and blog data. As can be expected, the performance of the system drops drastically when it is tested on unedited text. We describe the characteristics of the different data sets and examine the typical errors made by the resolution system.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hendrickx, I., Hoste, V. (2009). Coreference Resolution on Blogs and Commented News. In: Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2009. Lecture Notes in Computer Science, vol 5847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04975-0_4
DOI: https://doi.org/10.1007/978-3-642-04975-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04974-3
Online ISBN: 978-3-642-04975-0
eBook Packages: Computer Science (R0)