Abstract
This article presents a system dedicated to update summarization. We first present CBSEAS, the method on which the system is based, and its adaptation to the update summarization task. Generating update summaries is considerably harder than generating “standard” summaries. We then describe the TAC 2009 “Update Task”, which was used to evaluate the system. This international evaluation campaign allowed us to compare our system with other automatic summarization systems. The results were mixed: our system ranked in the top quarter for informational content, but only above average for linguistic quality.
Notes
1. Document Understanding Conference: http://www-nlpir.nist.gov/projects/duc/index.html
2. Text Analysis Conference: http://www.nist.gov/tac
3. Tree-tagger webpage: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
4. The AQUAINT-2 collection is a subset of the LDC English Gigaword Third Edition composed of news articles from different press agencies.
5. NIST: National Institute of Standards and Technology
6. ROUGE: Recall-Oriented Understudy for Gisting Evaluation
7. Differential summarization is a variant of update summarization. Its goal is to summarize the differences between two sets of documents, not what is new in a set compared to an earlier set.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bossard, A. (2013). Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_10
DOI: https://doi.org/10.1007/978-3-642-28569-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science (R0)