
Summarizing Weibo with Topics Compression

  • Conference paper
  • In: Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Abstract

Extractive text summarization aims at selecting a small subset of sentences that best preserves the content and meaning of the original document. In this paper we describe an unsupervised approach to extractive summarization that combines hierarchical topic modeling (TM) with the Minimum Description Length (MDL) principle and applies them to the Chinese language. Our summarizer strives to extract the information that best describes the text's topics in terms of MDL. The model is applied to the NLPCC 2015 Shared Task on Weibo-Oriented Chinese News Summarization [1], where Chinese news articles were summarized with the goal of creating short, meaningful messages for Weibo [2] (Sina Weibo is a Chinese microblogging website, one of the most popular sites in China). The experimental results demonstrate the superiority of our approach over the other summarizers from the NLPCC 2015 competition.
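The MDL-driven selection sketched in the abstract can be illustrated with a toy greedy procedure. This is not the authors' algorithm (which uses hierarchical topic models and a proper MDL formulation); it is a minimal sketch under simplified assumptions: each topic is reduced to a term-frequency table, and the "description length" is a crude codelength proxy in which topic terms not yet covered by the summary must be encoded at their information cost.

```python
import math

def description_length(covered, topic_freq):
    """Codelength proxy: every occurrence of a topic term that the summary
    does not yet cover costs -log2(p) bits; covered terms cost nothing."""
    total = sum(topic_freq.values())
    cost = 0.0
    for term, freq in topic_freq.items():
        if term not in covered:
            cost += -math.log2(freq / total) * freq
    return cost

def greedy_mdl_summary(sentences, topic_freq, max_sents=2):
    """Greedily add the sentence that most reduces the description length,
    stopping when no sentence yields any further compression."""
    summary, covered = [], set()
    for _ in range(max_sents):
        base = description_length(covered, topic_freq)
        best, best_gain = None, 0.0
        for s in sentences:
            if s in summary:
                continue
            gain = base - description_length(covered | set(s.split()), topic_freq)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break
        summary.append(best)
        covered |= set(best.split())
    return summary
```

For example, with topic terms `{"topic": 3, "mdl": 2, "compresses": 1}`, the sentence covering the two most informative term groups is selected first, and selection stops once the topic vocabulary is fully covered. A real MDL summarizer would encode sentences against a learned topic hierarchy rather than a flat frequency table, but the selection loop has the same shape.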


Notes

  1. http://ictclas.nlpir.org/.

  2. Because our system did not participate in the NLPCC competition, all experiments were re-run by us.

  3. Ranked first by ROUGE F-measure.

References

  1. Wan, X., Zhang, J., Wen, S., Tan, J.: Overview of the NLPCC 2015 shared task: Weibo-oriented Chinese news summarization. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds.) NLPCC 2015. LNCS (LNAI), vol. 9362, pp. 557–561. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25207-0_52

  2. SINA Corporation: Sina Weibo (2009). www.weibo.com

  3. Lakshmanan, L.V.S., Ng, R.T., Wang, C.X., Zhou, X., Johnson, T.J.: The generalized MDL approach for summarization. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 766–777 (2002)

  4. Bu, S., Lakshmanan, L.V.S., Ng, R.T.: MDL summarization with holes. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005, pp. 433–444 (2005)

  5. Nomoto, T., Matsumoto, Y.: A new approach to unsupervised text summarization. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2001, pp. 26–34 (2001)

  6. Nomoto, T.: Machine learning approaches to rhetorical parsing and open-domain text summarization. Ph.D. thesis, Nara Institute of Science and Technology (2004)

  7. Nguyen, T.S., Lauw, H.W., Tsaparas, P.: Review synthesis for micro-review summarization. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, pp. 169–178 (2015)

  8. Baralis, E., Cagliero, L., Jabeen, S., Fiori, A.: Multi-document summarization exploiting frequent itemsets. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC 2012, pp. 782–786 (2012)

  9. Agarwal, N., Gvr, K., Reddy, R.S., Rosé, C.P.: SciSumm: a multi-document summarization system for scientific articles. In: Proceedings of the ACL-HLT 2011 System Demonstrations, pp. 115–120 (2011)

  10. Dalal, M.K., Zaveri, M.A.: Semisupervised learning based opinion summarization and classification for online product reviews. Appl. Comput. Intell. Soft Comput. 2013, 10 (2013)

  11. Danon, G., Schneider, M., Last, M., Litvak, M., Kandel, A.: An Apriori-like algorithm for extracting fuzzy association rules between keyphrases in text documents. In: Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2006), Special Session on Fuzzy Sets in Probability and Statistics, pp. 731–738 (2006)

  12. Litvak, M., Vanetik, N., Last, M.: Krimping texts for better summarization. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1931–1935 (2015)

  13. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  14. Wang, D., Zhu, S., Li, T., Gong, Y.: Multi-document summarization using sentence-based topic models. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 297–300. Association for Computational Linguistics (2009)

  15. Lee, S., Belkasim, S., Zhang, Y.: Multi-document text summarization using topic model and fuzzy logic. In: Perner, P. (ed.) MLDM 2013. LNCS (LNAI), vol. 7988, pp. 159–168. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39712-7_12

  16. Li, L., Heng, W., Yu, J., Liu, Y., Wan, S.: CIST system report for ACL MultiLing 2013 – Track 1: multilingual multi-document summarization. In: MultiLing 2013, p. 39 (2013)

  17. Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009, pp. 781–789 (2009)

  18. Gillick, D., Favre, B.: A scalable global model for summarization. In: Proceedings of the NAACL HLT Workshop on Integer Linear Programming for Natural Language Processing, pp. 10–18 (2009)

  19. Zhang, H.P., Yu, H.K., Xiong, D.Y., Liu, Q.: HHMM-based Chinese lexical analyzer ICTCLAS. In: Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, affiliated with the 41st ACL, pp. 184–187 (2003)

  20. Wei, H., Jia, Y., Lei, L., Yongbin, L.: Research on key factors in multi-document topic modeling application with HLDA. J. Chin. Inf. Process. 27(6), 117–127 (2013)

  21. Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill, New York (1997)

  22. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23, 169–214 (2011)

  23. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB 1994, pp. 487–499 (1994)

  24. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), pp. 25–26 (2004)


Author information

Correspondence to Natalia Vanetik.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Litvak, M., Vanetik, N., Li, L. (2018). Summarizing Weibo with Topics Compression. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_39


  • DOI: https://doi.org/10.1007/978-3-319-77116-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77115-1

  • Online ISBN: 978-3-319-77116-8

  • eBook Packages: Computer Science (R0)
