Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Abstract

Graph-based models have been extensively explored in document summarization in recent years. Compared with traditional feature-based models, graph-based models incorporate interrelated information into the ranking process, so they can potentially do a better job of retrieving the important content from documents. In this paper, we investigate how to measure sentence similarity, which is a crucial issue in graph-based summarization models but, in our view, has not been well defined in the past. We propose a supervised learning approach that brings together multiple similarity measures and uses human-generated summaries to guide the combination process. It can therefore be expected to provide a more accurate estimate than a single cosine similarity measure. Experiments conducted on the DUC2005 and DUC2006 data sets show that the proposed learning approach is successful in measuring similarity. Its competitiveness and adaptability are also demonstrated.
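
The idea of combining several similarity measures under supervision and then ranking sentences on the resulting graph can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the learner or the individual measures, so the Jaccard overlap, the least-squares fit by gradient descent, the PageRank-style ranking loop, and the target values (standing in for similarity judgments derived from human-generated summaries) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Word-set overlap; an assumed second measure, not taken from the paper."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def features(a, b):
    """Vector of individual similarity measures for one sentence pair."""
    return (cosine(a, b), jaccard(a, b))


def combined_similarity(a, b, weights):
    """Weighted linear combination of the individual measures."""
    return sum(w * f for w, f in zip(weights, features(a, b)))


def mse(pairs, targets, weights):
    """Mean squared error of the combined similarity against the targets."""
    errs = [combined_similarity(a, b, weights) - t
            for (a, b), t in zip(pairs, targets)]
    return sum(e * e for e in errs) / len(errs)


def fit_weights(pairs, targets, lr=0.1, epochs=500):
    """Least-squares fit by gradient descent (a stand-in learner).

    `targets` are hypothetical supervision values, e.g. derived from how
    sentence pairs relate to human-generated summaries.
    """
    feats = [features(a, b) for a, b in pairs]
    weights = [0.0] * len(feats[0])
    n = len(feats)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for f, t in zip(feats, targets):
            err = sum(w * x for w, x in zip(weights, f)) - t
            for k, x in enumerate(f):
                grad[k] += 2.0 * err * x / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def rank_sentences(sentences, weights, damping=0.85, iters=50):
    """PageRank-style scores on the graph weighted by combined similarity."""
    n = len(sentences)
    # Negative combined values are clamped to zero; self-loops are excluded.
    sim = [[max(0.0, combined_similarity(sentences[i], sentences[j], weights))
            if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1.0 - damping) / n
                  + damping * sum(scores[j] * sim[j][i] / out[j]
                                  for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores
```

A typical use would tokenize each sentence into a word list, call `fit_weights` on labeled sentence pairs, and pass the learned weights to `rank_sentences`; the highest-scoring sentences are then candidates for the summary.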

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ouyang, Y., Li, W., Wei, F., Lu, Q. (2009). Learning Similarity Functions in Graph-Based Document Summarization. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_18

  • DOI: https://doi.org/10.1007/978-3-642-00831-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00830-6

  • Online ISBN: 978-3-642-00831-3

  • eBook Packages: Computer Science, Computer Science (R0)
