Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences

Zahri, Nik Adilah Hanin Binti; Fukumoto, Fumiyo

doi:10.1007/978-3-642-19437-5_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

With data on the Internet growing at an accelerating rate, automatic multi-document summarization has become an important task. In this paper, we propose a link analysis method that incorporates rhetorical relations between sentences to perform extractive summarization of multiple documents. We use the documents' headlines to extract sentences containing salient terms from the document set using a statistical model. We then assign rhetorical relations, learned by SVMs, to determine the connectivity between the sentences that include the salient terms. Finally, we rank these sentences by measuring their relative importance within the document set using the link analysis method PageRank. The rhetorical relations are used to evaluate the complementarity and redundancy of the ranked sentences. Our evaluation results show that combining PageRank with rhetorical relations among sentences improves the quality of extractive summarization.
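
The ranking step described in the abstract can be illustrated with a minimal sketch of weighted PageRank over a sentence graph. This is not the authors' implementation: the abstract does not say how rhetorical relations are turned into edge weights, so the edge list, the weights, and the function name `pagerank` below are hypothetical, chosen only to show how connectivity between sentences could drive the ranking.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source sentence, target sentence, weight)


def pagerank(n_nodes: int, edges: List[Edge], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Weighted PageRank by power iteration (pure Python, no dependencies)."""
    # Ignore non-positive weights so the normalisation below is well defined.
    edges = [(s, d, w) for s, d, w in edges if w > 0.0]

    # Total outgoing weight per node, used to normalise each node's votes.
    out_weight = [0.0] * n_nodes
    for s, _, w in edges:
        out_weight[s] += w

    scores = [1.0 / n_nodes] * n_nodes
    for _ in range(max_iter):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(scores[i] for i in range(n_nodes) if out_weight[i] == 0.0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_scores = [base] * n_nodes
        for s, d, w in edges:
            new_scores[d] += damping * scores[s] * w / out_weight[s]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical sentence graph: an edge s -> d means that a classifier assigned
# some rhetorical relation linking sentence s to sentence d. The weights are
# made-up illustrations of relation strength, not values from the paper.
edges = [
    (0, 2, 1.0),
    (1, 2, 1.0),
    (3, 2, 0.5),
    (2, 0, 1.0),
]
scores = pagerank(4, edges)

# Sentence indices ordered from most to least important.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking[0], [round(x, 4) for x in scores])
```

In this toy graph, sentence 2 is linked to by every other sentence and therefore ranks first. A summarizer along the lines of the abstract would then pick top-ranked sentences while using the relation types to skip redundant ones; that selection step is left out here because the abstract gives no detail on it.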

Author information

Authors and Affiliations

Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi, Japan
Nik Adilah Hanin Binti Zahri & Fumiyo Fukumoto

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Zahri, N.A.H.B., Fukumoto, F. (2011). Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_27

DOI: https://doi.org/10.1007/978-3-642-19437-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
