Learning Similarity Functions in Graph-Based Document Summarization

Ouyang, You; Li, Wenjie; Wei, Furu; Lu, Qin

doi:10.1007/978-3-642-00831-3_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

Graph-based models have been extensively explored in document summarization in recent years. Compared with traditional feature-based models, graph-based models incorporate interrelated information into the ranking process, so they can potentially do a better job of retrieving the important content from documents. In this paper, we investigate how to measure sentence similarity, which is a crucial issue in graph-based summarization models but, in our view, has not been well defined in the past. We propose a supervised learning approach that combines multiple similarity measures and uses human-generated summaries to guide the combination process. It can therefore be expected to provide a more accurate estimate than a single cosine similarity measure. Experiments conducted on the DUC2005 and DUC2006 data sets show that the proposed learning approach is successful in measuring similarity, and they also demonstrate its competitiveness and adaptability.
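
To make the idea in the abstract concrete, the following is a minimal sketch of a graph-based sentence ranker whose edge weights come from a combination of two similarity measures. It is an illustration under stated assumptions, not the method of the paper: the choice of measures (cosine and Jaccard overlap), the function names, and the fixed weights `(0.7, 0.3)` are all hypothetical. In the approach described above, the combination would instead be learned from human-generated summaries.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard overlap between the vocabularies of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_similarity(a, b, weights=(0.7, 0.3)):
    """Weighted sum of the two measures.

    The weights are hypothetical placeholders; a supervised model would
    estimate the combination from training data instead.
    """
    return weights[0] * cosine(a, b) + weights[1] * jaccard(a, b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """PageRank-style power iteration over the sentence similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    # Edge weights between distinct sentences; no self-loops.
    sim = [
        [combined_similarity(tokens[i], tokens[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to sim[j][i].
            incoming = sum(
                scores[j] * sim[j][i] / row_sums[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


if __name__ == "__main__":
    docs = [
        "graph ranking of sentences",
        "graph ranking of documents",
        "sentences and documents in a graph",
        "unrelated weather report",
    ]
    for score, sentence in sorted(zip(rank_sentences(docs), docs), reverse=True):
        print(f"{score:.4f}  {sentence}")
```

In this sketch a sentence that shares no words with the others receives only the baseline score `(1 - damping) / n`, so it ranks last. Swapping `combined_similarity` for a learned function would leave the ranking loop unchanged.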

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hong Kong
You Ouyang, Wenjie Li, Furu Wei & Qin Lu

Editor information

Editors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Wenjie Li
Division of Information and Communication Sciences, Macquarie University, NSW 2109, Sydney, Australia
Diego Mollá-Aliod

About this paper

Cite this paper

Ouyang, Y., Li, W., Wei, F., Lu, Q. (2009). Learning Similarity Functions in Graph-Based Document Summarization. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_18

DOI: https://doi.org/10.1007/978-3-642-00831-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)
