Dirichlet Process Mixture Model for Summarizing the Social Web

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 568)

Abstract

Automatic summarization has gained increasing attention because it not only improves the reading experience but also facilitates the management of collective knowledge on the social web. The social web is characterized by social interactions. Ignoring this type of information limits the ability of traditional summarization techniques to generate more intelligent and comprehensive summaries. In this paper we present a mixture model based on the Dirichlet process, which exploits information contained in tags and other social behaviors. The model assigns each sentence exactly one explicit “topic”. The assignment follows a Chinese Restaurant Process, in which an infinite number of topics are organized by tag or group. The model has straightforward applications to diverse social summarization tasks, and it is a natural fit for flexible data structures and incremental computation. We present applications to tag-driven summarization, comparative summarization, and update summarization. We evaluate the model through both quantitative and qualitative experiments on various real-world data sets.
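The Chinese Restaurant Process mentioned in the abstract can be illustrated with a minimal sketch. This shows only the generic CRP prior over topic assignments, not the paper's full model, which also uses tags, groups, and sentence content. The function name `crp_assignments` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random


def crp_assignments(n_sentences, alpha, seed=0):
    """Sample one topic label per sentence from a CRP prior (illustrative only)."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of sentences already assigned to topic k
    labels = []
    for i in range(n_sentences):
        # An existing topic k is chosen with probability counts[k] / (i + alpha);
        # a new topic is opened with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new topic
        else:
            counts[k] += 1    # join an existing topic
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_sentences=20, alpha=1.0)
    print(labels)
    print("topics used:", len(set(labels)))
```

The number of topics is not fixed in advance: it grows with the data, and larger values of `alpha` tend to open more topics. This is one reason a CRP-style prior suits incremental settings such as update summarization.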

Notes

  1. https://github.com/isnowfy/snownlp

  2. http://www.zhihu.com/question/26472875

Author information

Corresponding author

Correspondence to Chen Lin.

Copyright information

© 2015 Springer Science+Business Media Singapore

About this paper

Cite this paper

Guan, X., Yang, Y., Yang, X., Lin, C. (2015). Dirichlet Process Mixture Model for Summarizing the Social Web. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_8

  • DOI: https://doi.org/10.1007/978-981-10-0080-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0079-9

  • Online ISBN: 978-981-10-0080-5

  • eBook Packages: Computer Science, Computer Science (R0)
