
Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences

  • Chapter in: Multi-source, Multilingual Information Extraction and Summarization

Abstract

This chapter presents a summarization system dedicated to update summarization. We first present CBSEAS, the method on which the system is based, and its adaptation to the update summarization task. Generating update summaries is considerably harder than generating "standard" summaries, because the summary must convey what is new in a set of documents while avoiding information the reader has already seen in earlier documents. We then describe the TAC 2009 Update Task, which was used to evaluate the system. This international evaluation campaign allowed us to compare our system with other automatic summarization systems. The results were mixed: our system ranked in the top quarter for informational content, but only above average for linguistic quality.
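The abstract does not spell out how CBSEAS adapts sentence clustering to the update setting, so the sketch below illustrates only the core constraint of the task, under stated assumptions: select representative sentences from the newer document set while discarding any that largely repeat the earlier set. It is not CBSEAS and performs no clustering; it swaps in a plain greedy, threshold-based redundancy filter over bag-of-words cosine similarity. The function names, the centrality score, and the `novelty_threshold` default are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased token counts: a crude stand-in for real preprocessing."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def update_summary(old_sentences, new_sentences, max_sentences=2, novelty_threshold=0.5):
    """Hypothetical helper: pick central new sentences that do not repeat old ones.

    Centrality is the mean cosine similarity of a new sentence to the other new
    sentences. A candidate is skipped when its similarity to any old sentence,
    or to a sentence already selected, exceeds novelty_threshold.
    """
    old_vecs = [bag_of_words(s) for s in old_sentences]
    new_vecs = [bag_of_words(s) for s in new_sentences]
    n = len(new_vecs)

    def centrality(i):
        if n < 2:
            return 0.0
        return sum(cosine(new_vecs[i], new_vecs[j]) for j in range(n) if j != i) / (n - 1)

    chosen = []
    # Greedy pass over the new sentences, most central first.
    for i in sorted(range(n), key=centrality, reverse=True):
        if len(chosen) >= max_sentences:
            break
        already_known = old_vecs + [new_vecs[j] for j in chosen]
        if any(cosine(new_vecs[i], v) > novelty_threshold for v in already_known):
            continue  # mostly old or redundant information
        chosen.append(i)
    # Restore original document order for readability.
    return [new_sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    old = ["The storm hit the coast on Monday."]
    new = [
        "The storm hit the coast on Monday.",
        "Rescue teams reached the flooded town on Tuesday.",
        "Officials said rescue teams evacuated the flooded town.",
    ]
    print(update_summary(old, new))
    # prints ['Rescue teams reached the flooded town on Tuesday.']
```

In this toy run, the first new sentence is dropped because it repeats the earlier set, and the third is dropped as redundant with the second. A fixed similarity threshold is the weakest design choice here; a real system would also need proper sentence segmentation, lemmatization, and a length budget rather than a sentence count.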


Notes

  1. Document Understanding Conference: http://www-nlpir.nist.gov/projects/duc/index.html
  2. Text Analysis Conference: http://www.nist.gov/tac
  3. TreeTagger webpage: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
  4. The AQUAINT-2 collection is a subset of the LDC English Gigaword Third Edition, composed of news articles from different press agencies.
  5. NIST: National Institute of Standards and Technology
  6. ROUGE: Recall-Oriented Understudy for Gisting Evaluation
  7. Differential summarization is a variant of update summarization. Its goal is to summarize the differences between two sets of documents, not what is new in one set compared to an earlier set.
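Note 6 expands the acronym ROUGE, an automatic n-gram overlap measure used in the DUC and TAC evaluations. As a worked illustration, assuming the standard ROUGE-N recall definition (reference n-grams matched by the candidate, with counts clipped, divided by the total number of reference n-grams), a minimal sketch might look like this. It is deliberately simplified: it omits the stemming, stopword removal, and multi-reference jackknifing options of the official ROUGE toolkit, and the function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of a lower-cased, alphanumeric tokenization of text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Simplified ROUGE-N: clipped n-gram overlap divided by reference n-gram count."""
    cand_counts = ngrams(candidate, n)
    matched = 0
    total = 0
    for reference in references:
        ref_counts = ngrams(reference, n)
        # A reference n-gram is matched at most as often as it occurs in the candidate.
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Bigrams shared with the reference: "the cat", "on the", "the mat" -> 3 of 5.
    print(rouge_n_recall("the cat sat on the mat", ["the cat lay on the mat"], n=2))  # 0.6
```

Because the denominator counts reference n-grams, the score rewards covering the human summaries' content, which is why it is recall-oriented; it says nothing about fluency, which TAC assessed separately as linguistic quality.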


Author information

Correspondence to Aurélien Bossard.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bossard, A. (2013). Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_10


  • DOI: https://doi.org/10.1007/978-3-642-28569-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28568-4

  • Online ISBN: 978-3-642-28569-1

  • eBook Packages: Computer Science, Computer Science (R0)
