Abstract
Automatic summarization has gained increasing attention because it not only improves the reading experience but also facilitates the management of collective knowledge on the social web. The social web is characterized by social interactions, and ignoring this information limits the ability of traditional summarization techniques to generate more intelligent and comprehensive summaries. In this paper we present a mixture model based on the Dirichlet process that exploits information contained in tags and other social behaviors. The model assigns each sentence one explicit “topic”. The assignment follows a Chinese restaurant process in which an infinite number of topics are organized by tag or group. The model applies straightforwardly to diverse social summarization tasks and is a natural fit for flexible data structures and incremental computation. We present applications to tag-driven summarization, comparative summarization, and update summarization, and we evaluate the model through both quantitative and qualitative experiments on various real-world data sets.
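The sentence-to-topic assignment mentioned in the abstract can be illustrated with a minimal sketch of a plain Chinese restaurant process prior. This is not the authors' model: it ignores tags, groups, and sentence content, and the function name `crp_assign` and the concentration parameter `alpha` are assumptions introduced here for illustration only.

```python
import random


def crp_assign(n_items, alpha, seed=0):
    """Draw one topic label per sentence from a plain CRP prior.

    Hypothetical helper: sentence i joins existing topic k with
    probability proportional to the number of sentences already in k,
    or opens a new topic with probability proportional to alpha (> 0).
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = sentences currently assigned to topic k
    labels = []   # labels[i] = topic index chosen for sentence i
    for _ in range(n_items):
        # The last weight corresponds to opening a new topic.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # a previously unused topic is created
        else:
            counts[k] += 1        # an existing topic grows
        labels.append(k)
    return labels


labels = crp_assign(50, alpha=1.0, seed=1)
```

Because a new topic can always be opened, the number of topics is not fixed in advance and grows with the data, which is one reason such priors suit incremental settings like update summarization. A full model would additionally weight each choice by how well the sentence fits the topic.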
Copyright information
© 2015 Springer Science+Business Media Singapore
About this paper
Cite this paper
Guan, X., Yang, Y., Yang, X., Lin, C. (2015). Dirichlet Process Mixture Model for Summarizing the Social Web. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_8
DOI: https://doi.org/10.1007/978-981-10-0080-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0079-9
Online ISBN: 978-981-10-0080-5
eBook Packages: Computer Science, Computer Science (R0)