Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences

Bossard, Aurélien

doi:10.1007/978-3-642-28569-1_10

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

This article presents a system dedicated to update summarization. We first present CBSEAS, the method on which the system is based, and its adaptation to the update summarization task. Generating update summaries is a far more complicated task than generating “standard” summaries. We then describe the TAC 2009 “Update Task”, which we used to evaluate the system. This international evaluation campaign allowed us to compare our system with other automatic summarization systems. The results were mixed: our system ranked in the top quarter for informational content, but only above average for linguistic quality.

Notes

1. Document Understanding Conference: http://www-nlpir.nist.gov/projects/duc/index.html
2. Text Analysis Conference: http://www.nist.gov/tac
3. Tree-tagger webpage: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
4. The AQUAINT-2 collection is a subset of the LDC English Gigaword Third Edition composed of news articles from different press agencies.
5. NIST: National Institute of Standards and Technology
6. ROUGE: Recall-Oriented Understudy for Gisting Evaluation
7. Differential summarization is a variant of update summarization. Its goal is to summarize the differences between two sets of documents, not what is new in a set compared to an earlier set.

Author information

Authors and Affiliations

Laboratoire d’Informatique de Paris-Nord (UMR 7030, CNRS et U. Paris 13), 99, av. J.-B. Clément, 93430, Villetaneuse, France
Aurélien Bossard

Corresponding author

Correspondence to Aurélien Bossard.

Editor information

Editors and Affiliations

Universite Sorbonne Nouvelle, LATTICE-CNRS, Ecole Normale Superieure, rue d'Ulm 45, Paris, 75005, France
Thierry Poibeau
Information & Communication Technologies, Universitat Pompeu Fabra, C/ Tanger 122-140, Barcelona, 08018, Spain
Horacio Saggion
Institute for Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw, 01-248, Poland
Jakub Piskorski
Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2, Helsinki, 00014, Finland
Roman Yangarber

About this chapter

Cite this chapter

Bossard, A. (2013). Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_10

DOI: https://doi.org/10.1007/978-3-642-28569-1_10
Published: 12 July 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science (R0)
