Abstract
Graph-based methods for multi-document summarization, developed in recent years, use the relationships between sentences in a graph-based ranking algorithm to extract salient sentences. This paper proposes differentiating the cross-document relationships and the within-document relationships between sentences, on the premise that the two kinds of relationships contribute unequally to the graph-based ranking algorithm. We apply the graph-based ranking algorithm to each kind of sentence relationship and explore their relative importance for multi-document summarization. Experimental results on the DUC 2002 and DUC 2004 data demonstrate the great importance of the cross-document relationships between sentences. Even the system based only on the cross-document relationships performs better than, or at least as well as, the systems based on both kinds of relationships between sentences.
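The idea of weighting the two kinds of sentence relationships differently can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the ranking formula, or any parameter values, so everything below is an assumption made for illustration: the function name `rank_sentences`, the bag-of-words cosine similarity, the PageRank-style iteration, and the parameters `w_intra`, `w_inter`, and `damping` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, doc_ids, w_intra=0.0, w_inter=1.0,
                   damping=0.85, iters=50):
    """Score sentences with a PageRank-style iteration (illustrative only).

    Edges between sentences of the same document are scaled by w_intra,
    edges between sentences of different documents by w_inter. Setting
    w_intra=0 uses only the cross-document relationships.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]

    # Weighted affinity matrix with no self-links.
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            weight = w_intra if doc_ids[i] == doc_ids[j] else w_inter
            affinity[i][j] = weight * cosine(bags[i], bags[j])

    # Normalize each row to sum to 1; rows with no edges stay all zero.
    for i in range(n):
        total = sum(affinity[i])
        if total > 0:
            affinity[i] = [v / total for v in affinity[i]]

    # Fixed number of power-iteration steps.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * affinity[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores
```

Calling `rank_sentences(sentences, doc_ids, w_intra=0.0, w_inter=1.0)` corresponds to ranking with cross-document relationships only, while equal weights treat both kinds of relationships alike. Sentences with no outgoing edges simply keep the baseline score in this sketch; a fuller implementation would redistribute their weight.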
Keywords
- Longest Common Subsequence
- Text Summarization
- Summarization Method
- Document Summarization
- Diversity Penalty
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, X., Yang, J., Xiao, J. (2006). The Great Importance of Cross-Document Relationships for Multi-document Summarization. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_13
DOI: https://doi.org/10.1007/11940098_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)