A document-sensitive graph model for multi-document summarization

  • Regular Paper
Knowledge and Information Systems

Abstract

In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g., sentence similarity) but neglect the differences among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
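
The abstract does not give the DsR update rule, so the following is only a minimal sketch of the general idea it describes: a PageRank-style iterative ranking over a sentence graph in which edges between sentences of different documents are weighted differently from edges within one document. It is not the published DsR algorithm. The function name `rank_sentences`, the `inter_weight` scaling factor, the damping value, and the toy similarity matrix are all assumptions made for illustration.

```python
def rank_sentences(sim, doc_ids, inter_weight=1.5, damping=0.85,
                   tol=1e-10, max_iter=500):
    """Rank sentences by a damped random walk on a reweighted similarity graph.

    sim          -- square list-of-lists of non-negative sentence similarities
    doc_ids      -- doc_ids[i] is the document that sentence i belongs to
    inter_weight -- hypothetical factor applied to inter-document edges
    """
    n = len(sim)

    # Keep intra-document similarities as they are, scale inter-document
    # similarities by inter_weight, and drop self-loops.
    weights = [
        [
            0.0 if i == j
            else sim[i][j] * (1.0 if doc_ids[i] == doc_ids[j] else inter_weight)
            for j in range(n)
        ]
        for i in range(n)
    ]

    # Row-normalize into transition probabilities; a sentence with no
    # outgoing weight jumps uniformly to any sentence.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    # Power iteration with uniform teleportation, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [
            (1.0 - damping) / n
            + damping * sum(transition[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


# Toy input: four sentences, the first two from document 0, the last two
# from document 1. The similarity values are made up.
sim = [
    [1.0, 0.6, 0.3, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
doc_ids = [0, 0, 1, 1]
scores = rank_sentences(sim, doc_ids)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Setting `inter_weight` to 1.0 reduces the sketch to a plain similarity-graph ranking that treats the document set as one combined document, which is the baseline behaviour the abstract argues against.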

Author information

Corresponding author

Correspondence to Wenjie Li.

About this article

Cite this article

Wei, F., Li, W., Lu, Q. et al. A document-sensitive graph model for multi-document summarization. Knowl Inf Syst 22, 245–259 (2010). https://doi.org/10.1007/s10115-009-0194-2

  • DOI: https://doi.org/10.1007/s10115-009-0194-2
