A document-sensitive graph model for multi-document summarization

Wei, Furu; Li, Wenjie; Lu, Qin; He, Yanxiang

doi:10.1007/s10115-009-0194-2

Regular Paper
Published: 03 March 2009

Volume 22, pages 245–259 (2010)
Knowledge and Information Systems

Furu Wei¹,², Wenjie Li¹, Qin Lu¹ & Yanxiang He²

Abstract

In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the differences among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
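
The abstract does not give the DsR formulas, so the following is only a minimal sketch of the general idea it describes: a PageRank-style iteration over a sentence similarity graph in which edges between sentences of the same document are scaled differently from edges across documents. The function names, the bag-of-words cosine similarity, and the `intra_weight`, `inter_weight` and `damping` parameters are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, intra_weight=0.5, inter_weight=1.0,
                   damping=0.85, tol=1e-8, max_iter=200):
    """Score (doc_id, text) pairs with a PageRank-style power iteration.

    intra_weight and inter_weight are hypothetical knobs that scale
    same-document and cross-document edges; they are NOT taken from the paper.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(text.lower().split()) for _, text in sentences]

    # Edge weights: similarity scaled by whether the two sentences share a document.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_doc = sentences[i][0] == sentences[j][0]
            scale = intra_weight if same_doc else inter_weight
            weights[i][j] = scale * cosine(bags[i], bags[j])

    # Row-normalise into transition probabilities; a sentence with no
    # outgoing weight jumps uniformly to every sentence.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    # Power iteration with a uniform teleport term until scores stabilise.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    demo = [
        ("d1", "the flood damaged the town"),
        ("d2", "flood waters damaged many homes"),
        ("d1", "officials visited the town"),
    ]
    for (doc, text), score in zip(demo, rank_sentences(demo)):
        print(f"{score:.3f}  [{doc}] {text}")
```

Setting `intra_weight` equal to `inter_weight` reduces the sketch to an ordinary similarity-graph ranking that treats the document set as one combined document, which is the behaviour the abstract argues against.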

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Furu Wei, Wenjie Li & Qin Lu
Department of Computer Science and Technology, Wuhan University, Wuhan, China
Furu Wei & Yanxiang He

Corresponding author

Correspondence to Wenjie Li.

About this article

Cite this article

Wei, F., Li, W., Lu, Q. et al. A document-sensitive graph model for multi-document summarization. Knowl Inf Syst 22, 245–259 (2010). https://doi.org/10.1007/s10115-009-0194-2

Received: 14 November 2007
Revised: 24 November 2008
Accepted: 28 December 2008
Published: 03 March 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s10115-009-0194-2
