Abstract
With data on the Internet growing at an accelerating rate, automatic multi-document summarization has become an important task. In this paper, we propose a link analysis method that incorporates rhetorical relations between sentences to perform extractive summarization of multiple documents. We use the documents' headlines to extract sentences containing salient terms from the document set using a statistical model. We then assign rhetorical relations, learned by SVMs, to determine the connectivity between the sentences that contain the salient terms. Finally, we rank these sentences by measuring their relative importance within the document set using the link analysis method PageRank. The rhetorical relations are also used to evaluate the complementarity and redundancy of the ranked sentences. Our evaluation results show that combining PageRank with rhetorical relations among sentences improves the quality of extractive summarization.
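The ranking step can be illustrated with a minimal sketch of PageRank computed by power iteration over a small sentence graph. The abstract does not specify how the graph is built or how rhetorical relations set the link weights, so the function name `pagerank`, the damping factor of 0.85, the uniform handling of sentences without outgoing links, and the toy weight matrix are all assumptions made for illustration, not the authors' implementation.

```python
def pagerank(weights, damping=0.85, tol=1e-9, max_iter=200):
    """Power iteration on a weighted, directed sentence graph.

    weights[i][j] is a hypothetical link strength from sentence i to
    sentence j (for example, derived from a rhetorical relation);
    0.0 means the two sentences are not linked.
    """
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by sentences with no outgoing links is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0.0)
        new_scores = []
        for j in range(n):
            # Each linking sentence passes on a share proportional to the link weight.
            inflow = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_scores.append(
                (1.0 - damping) / n + damping * (inflow + dangling / n)
            )
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy graph of four sentences; the weights are invented for illustration.
links = [
    [0.0, 1.0, 1.0, 0.0],  # sentence 0 links to sentences 1 and 2
    [0.0, 0.0, 1.0, 0.0],  # sentence 1 links to sentence 2
    [1.0, 0.0, 0.0, 0.0],  # sentence 2 links to sentence 0
    [0.0, 0.0, 1.0, 0.0],  # sentence 3 links to sentence 2
]
scores = pagerank(links)
# Sentence indices ordered from most to least important.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy graph, sentence 2 receives links from three other sentences and therefore ranks first, while sentence 3, which nothing links to, ranks last. An extractive summarizer would then select the top-ranked sentences, subject to whatever redundancy check is applied.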
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zahri, N.A.H.B., Fukumoto, F. (2011). Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_27
DOI: https://doi.org/10.1007/978-3-642-19437-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)