Abstract
Graph-based models have been extensively explored in document summarization in recent years. Compared with traditional feature-based models, graph-based models incorporate the relations among text units into the ranking process, so they can potentially do a better job of retrieving the important content from documents. In this paper, we investigate how to measure sentence similarity, a crucial issue in graph-based summarization models that, in our view, has not been well defined in the past. We propose a supervised learning approach that combines multiple similarity measures and uses human-generated summaries to guide the combination. It can therefore be expected to provide a more accurate estimate than a single cosine similarity measure. Experiments on the DUC 2005 and DUC 2006 data sets show that the proposed learning approach is successful in measuring similarity, and they also demonstrate its competitiveness and adaptability.
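To make the idea of combining similarity measures inside a graph-based ranker concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes two illustrative measures (cosine and Jaccard overlap), a weighted sum as the combination, and a PageRank-style power iteration as the ranker. The function names and the `weights` values are hypothetical; in a supervised setting the weights would be fitted against human-generated summaries rather than set by hand.

```python
import math
from collections import Counter


def cosine_sim(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_sim(a, b):
    """Jaccard overlap between the vocabularies of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_sim(a, b, weights):
    """Weighted sum of the individual measures.

    Assumption: `weights` stands in for coefficients that a supervised
    learner would estimate; here they are supplied by the caller.
    """
    measures = (cosine_sim(a, b), jaccard_sim(a, b))
    return sum(w * m for w, m in zip(weights, measures))


def rank_sentences(sentences, weights, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over the similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    # Edge weights between every pair of distinct sentences (no self-loops).
    sim = [
        [combined_sim(tokens[i], tokens[j], weights) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the edge weight sim[j][i]; isolated nodes pass on nothing.
            incoming = sum(
                scores[j] * sim[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


if __name__ == "__main__":
    docs = [
        "graph models rank sentences",
        "graph models rank documents",
        "graph ranking uses sentences",
        "bananas are yellow",
    ]
    for sentence, score in zip(docs, rank_sentences(docs, weights=(0.7, 0.3))):
        print(f"{score:.4f}  {sentence}")
```

Because the ranker only sees the combined edge weights, changing the measures or the weights changes the ranking without touching the iteration itself, which is why the choice of similarity function matters so much in this family of models.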
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Ouyang, Y., Li, W., Wei, F., Lu, Q. (2009). Learning Similarity Functions in Graph-Based Document Summarization. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_18
DOI: https://doi.org/10.1007/978-3-642-00831-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science (R0)