A New Approach for Multi-Document Update Summarization

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary is the one with the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC/TAC 2007 to 2009 datasets (http://duc.nist.gov/, http://www.nist.gov/tac/) show that the summaries produced by our method correlate closely with human summaries and that the method outperforms other systems such as LexRank in many categories under the ROUGE evaluation criterion.
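
The two definitions above can be illustrated with a rough sketch. The abstract does not say how the distances are computed in practice, so everything below is an assumption: Kolmogorov complexity is approximated by zlib compressed length, the conditional complexity K(x | y) by C(yx) − C(y), the information distance by max{K(x | y), K(y | x)}, and the summary is built by a simple greedy sentence selection. The function names (`c`, `cond`, `distance`, `greedy_summary`) are hypothetical and are not taken from the paper.

```python
import zlib


def c(text: str) -> int:
    """Compressed length in bytes: a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cond(x: str, y: str) -> int:
    """Approximate K(x | y) as C(y + x) - C(y), clipped at zero."""
    return max(c(y + " " + x) - c(y), 0)


def distance(x: str, y: str, prior: str = "") -> int:
    """Approximate max{K(x | y, prior), K(y | x, prior)}.

    With an empty prior this is the plain information distance; with a
    non-empty prior it plays the role of the conditional distance used
    for update summaries (assumption: the prior is simply prepended).
    """
    return max(cond(x, prior + " " + y), cond(y, prior + " " + x))


def greedy_summary(sentences, documents, prior="", max_words=100):
    """Greedily add the sentence that gives the smallest distance to the
    document set, until no remaining sentence fits in the word budget."""
    chosen, remaining, used = [], list(sentences), 0
    while remaining:
        best, best_d = None, None
        for s in remaining:
            if used + len(s.split()) > max_words:
                continue  # sentence would exceed the length limit
            d = distance(" ".join(chosen + [s]), documents, prior)
            if best_d is None or d < best_d:
                best, best_d = s, d
        if best is None:
            break  # nothing else fits
        chosen.append(best)
        remaining.remove(best)
        used += len(best.split())
    return chosen


if __name__ == "__main__":
    old = "The river flooded the town on Monday. Residents were evacuated."
    new = [
        "The river flooded the town on Monday.",
        "Relief supplies arrived on Wednesday.",
        "Officials said the water level is now falling.",
    ]
    # Update summary: distance to the new cluster, given the old one.
    print(greedy_summary(new, " ".join(new), prior=old, max_words=12))
```

Compression on very short strings is noisy, so the sketch is only meaningful on inputs of realistic length, and a greedy search is not guaranteed to find the summary with the true minimum distance.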

References

  1. Luhn H P. The automatic creation of literature abstracts. IBM Journal of Research and Development, 1958, 2(2): 159-165.

  2. Wan X, Yang J, Xiao J. Manifold-ranking based topic-focused multi-document summarization. In Proc. IJCAI, Hyderabad, India, Jan. 6-12, 2007, pp.2903-2908.

  3. Li M, Vitányi P M. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, 1997.

  4. Carbonell J, Goldstein J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. SIGIR, Melbourne, Australia, Aug. 24-28, 1998, pp.335-336.

  5. Radev D R, Jing H, Stys M, Tam D. Centroid-based summarization of multiple documents. Information Processing and Management, 2004, 40(6): 919-938.

  6. Kupiec J, Pedersen J, Chen F. A trainable document summarizer. In Proc. SIGIR, Seattle, USA, Jul. 9-13, 1995, pp.68-73.

  7. Leskovec J, Milic-Frayling N, Grobelnik M. Impact of linguistic analysis on the semantic graph coverage and learning of document extracts. In Proc. AAAI, Pittsburgh, USA, Jul. 9-13, 2005, pp.1069-1074.

  8. Shen D, Sun J T, Li H, Yang Q, Chen Z. Document summarization using conditional random fields. In Proc. IJCAI, Hyderabad, India, Jan. 6-12, 2007, pp.2862-2867.

  9. Zhang J, Cheng X, Wu G, Xu H. Adasum: An adaptive model for summarization. In Proc. CIKM, Napa Valley, USA, Oct. 26-30, 2008, pp.901-909.

  10. Erkan G, Radev D R. Lexpagerank: Prestige in multidocument text summarization. In Proc. EMNLP, Barcelona, Spain, Jul. 25-26, 2004, pp.365-371.

  11. Mihalcea R, Tarau P. Textrank — Bring order into texts. In Proc. EMNLP, Barcelona, Spain, Jul. 25-26, 2004, pp.119-126.

  12. Mihalcea R, Tarau P. A language independent algorithm for single and multiple document summarization. In Proc. IJCNLP, Jeju Island, Korea, Oct. 11-13, 2005, pp.19-24.

  13. Wan X, Yang J, Xiao J. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In Proc. ACL, Prague, Czech Republic, Jun. 23-30, 2007, pp.552-559.

  14. Wan X. An exploration of document impact on graph-based multi-document summarization. In Proc. EMNLP, Hawaii, USA, Oct. 25-27, 2008, pp.755-762.

  15. Bennett C H, Gács P, Li M, Vitányi P M, Zurek W H. Information distance. IEEE Transactions on Information Theory, Jul. 1998, 44(4): 1407-1423.

  16. Li M, Badger J H, Chen X, Kwong S, Kearney P, Zhang H. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 2001, 17(2): 149-154.

  17. Li M, Chen X, Li X, Ma B, Vitányi P M. The similarity metric. IEEE Transactions on Information Theory, 2004, 50(12): 3250-3264.

  18. Long C, Zhu X, Li M, Ma B. Information shared by many objects. In Proc. CIKM, Napa Valley, USA, Oct. 26-30, 2008, pp.1213-1220.

  19. Benedetto D, Caglioti E, Loreto V. Language trees and zipping. Physical Review Letters, Jan. 2002, 88(4): 048702.

  20. Bennett C H, Li M, Ma B. Chain letters and evolutionary histories. Scientific American, Jun. 2003, 288(6): 76-81.

  21. Cilibrasi R L, Vitányi P M. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, Mar. 2007, 19(3): 370-383.

  22. Zhang X, Hao Y, Zhu X, Li M. Information distance from a question to an answer. In Proc. SIGKDD, San Jose, USA, Aug. 12-15, 2007, pp.874-883.

  23. Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977, 23(3): 337-343.

  24. Lin C Y, Hovy E. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proc. HLT-NAACL, Edmonton, Canada, May 27-June 1, 2003, pp.71-78.

  25. Nenkova A, Passonneau R, McKeown K. The pyramid method: Incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing, Apr. 2007, 4(2): 1-23.

Author information

Corresponding author

Correspondence to Xiao-Yan Zhu.

Additional information

The work was supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, and the IRCI Project from IDRC, Canada.

About this article

Cite this article

Long, C., Huang, ML., Zhu, XY. et al. A New Approach for Multi-Document Update Summarization. J. Comput. Sci. Technol. 25, 739–749 (2010). https://doi.org/10.1007/s11390-010-9361-x

  • DOI: https://doi.org/10.1007/s11390-010-9361-x
