Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Abstract

With the accelerating rate of data growth on the Internet, automatic multi-document summarization has become an important task. In this paper, we propose a link analysis method that incorporates rhetorical relations between sentences to perform extractive summarization of multiple documents. We use the documents' headlines to extract sentences containing salient terms from the document set using a statistical model. We then assign rhetorical relations, learned by SVMs, to determine the connectivity between the sentences that contain the salient terms. Finally, we rank these sentences by measuring their relative importance within the document set using a link analysis method, PageRank. The rhetorical relations are also used to evaluate the complementarity and redundancy of the ranked sentences. Our evaluation results show that combining PageRank with rhetorical relations among sentences improves the quality of extractive summarization.
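The ranking step can be illustrated with a minimal sketch of weighted PageRank over a sentence graph. This is not the authors' implementation: the abstract does not specify how rhetorical relations are turned into edges, so the edge list, the weights, and the function name `pagerank` below are all hypothetical, chosen only to show how link analysis assigns relative importance to sentences.

```python
from typing import List, Tuple

# An edge (src, dst, weight) means sentence `src` links to sentence `dst`.
# In this sketch a link stands in for a rhetorical relation between two
# sentences; the weights are made up for illustration.
Edge = Tuple[int, int, float]


def pagerank(n: int, edges: List[Edge], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Weighted PageRank by power iteration over n nodes."""
    # Total outgoing weight per node, used to normalise each link.
    out_weight = [0.0] * n
    for src, _dst, w in edges:
        out_weight[src] += w

    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes with no outgoing links spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0.0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for src, dst, w in edges:
            new[dst] += damping * scores[src] * w / out_weight[src]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Four hypothetical sentences; sentences 0, 1 and 3 all link to sentence 2.
edges: List[Edge] = [(0, 2, 1.0), (1, 2, 1.0), (3, 2, 0.5), (2, 0, 1.0)]
scores = pagerank(4, edges)
# Sentence indices ordered from most to least important.
ranking = sorted(range(4), key=lambda i: -scores[i])
```

In this toy graph the sentence that receives the most links ranks first, which is the intuition behind using link analysis to pick summary sentences. A full system would still need the relation classifier and the redundancy check described in the abstract.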

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zahri, N.A.H.B., Fukumoto, F. (2011). Multi-document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_27

  • DOI: https://doi.org/10.1007/978-3-642-19437-5_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer Science, Computer Science (R0)
