Abstract
The goal of query-focused summarization is to extract a summary for a given query from a document collection. Although much work has been done on this problem, many challenging issues remain: (1) the length of the summary must be predefined, for example as a number of word tokens or a number of sentences; (2) a query usually asks for information from several perspectives (topics), but existing methods cannot capture the topical aspects relevant to the query. In this paper, we propose a novel approach that combines a statistical topic model with affinity propagation. Specifically, the topic model, called qLDA, simultaneously models the documents and the query. Moreover, affinity propagation automatically discovers key sentences in the document collection without a predefined summary length. Experimental results on the DUC05 and DUC06 data sets show that our approach is effective and outperforms the baseline methods.
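Affinity propagation (Frey and Dueck, 2007) selects "exemplar" data points by passing responsibility and availability messages between them. Applied to sentences, the exemplars become summary sentences, and their number follows from the data and a single "preference" value rather than from a fixed length. The following is a minimal, dependency-free sketch of the standard message-passing updates, not the authors' implementation. The per-sentence topic vectors are invented toy values standing in for qLDA output, and the similarity (negative squared Euclidean distance) and the median preference are assumptions chosen for illustration; `similarity_matrix` and `affinity_propagation` are hypothetical helper names.

```python
import statistics


def similarity_matrix(thetas, preference=None):
    """Similarity between sentence topic vectors (assumed: negative squared
    Euclidean distance). The diagonal holds the AP preference; by default it
    is the median of the off-diagonal similarities, a common heuristic."""
    n = len(thetas)
    S = [[-sum((a - b) ** 2 for a, b in zip(thetas[i], thetas[j]))
          for j in range(n)] for i in range(n)]
    if preference is None:
        preference = statistics.median(
            [S[i][j] for i in range(n) for j in range(n) if i != j])
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Return (exemplars, labels) using Frey and Dueck's damped updates."""
    n = len(S)
    if n < 2:
        return list(range(n)), list(range(n))
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    last, stable, exemplars = None, 0, []
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != top)
            for k in range(n):
                competitor = second if k == top else vals[top]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # A point is an exemplar when r(k,k) + a(k,k) > 0.
        exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        if exemplars and exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    if not exemplars:
        return [], [-1] * n
    # Exemplars label themselves; other points join their most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
              for i in range(n)]
    return exemplars, labels


# Hypothetical two-topic distributions for six sentences (invented, not qLDA output).
thetas = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
exemplars, labels = affinity_propagation(similarity_matrix(thetas))
print(exemplars, labels)
```

In this sketch the summary would consist of the sentences indexed by `exemplars`. Raising the preference toward zero tends to produce more exemplars (a longer summary) and lowering it produces fewer, so the length is not fixed in advance but is still steered by this one parameter.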
This work is supported by NSFC (60703059), Chinese National Key Foundation Research and Development Plan (2007CB310803), and Chinese Young Faculty Funding (20070003093).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, D., Tang, J., Yao, L., Li, J., Zhou, L. (2009). Query-Focused Summarization by Combining Topic Model and Affinity Propagation. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb/WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_17
DOI: https://doi.org/10.1007/978-3-642-00672-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)