Coreference Resolution on Blogs and Commented News

  • Conference paper
Anaphora Processing and Applications (DAARC 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5847)

Abstract

We focus on automatic coreference resolution for blogs and news articles with user comments as part of a project on opinion mining. We aim to study the effect of the genre shift from edited, structured newspaper text to unedited, unstructured blog data. We evaluate our coreference resolution system on three data sets: newspaper articles, a mix of newspaper articles and reader comments, and blog data. As expected, the performance of the system drops drastically when it is tested on unedited text. We describe the characteristics of the different data sets and examine the typical errors made by the resolution system.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hendrickx, I., Hoste, V. (2009). Coreference Resolution on Blogs and Commented News. In: Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2009. Lecture Notes in Computer Science, vol 5847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04975-0_4

  • DOI: https://doi.org/10.1007/978-3-642-04975-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04974-3

  • Online ISBN: 978-3-642-04975-0

  • eBook Packages: Computer Science, Computer Science (R0)
