Abstract
Extractive text summarization aims to select a small subset of sentences that best preserves the content and meaning of the original document. In this paper we describe an unsupervised approach to extractive summarization that combines hierarchical topic modeling (TM) with the Minimum Description Length (MDL) principle and applies them to the Chinese language. Our summarizer strives to extract the information that best describes the topics of a text in terms of MDL. The model is applied to the NLPCC 2015 Shared Task of Weibo-Oriented Chinese News Summarization [1], in which Chinese news articles were summarized with the goal of creating short, meaningful messages for Weibo [2] (Sina Weibo is a Chinese microblogging website and one of the most popular sites in China). The experimental results show that our approach outperforms the other summarizers from the NLPCC 2015 competition.
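For orientation, the MDL principle named in the abstract is usually stated in the following two-part form, as in standard textbook treatments. This is the generic principle only; the abstract does not give the exact objective used by the summarizer, and the symbols M (a candidate model), D (the data) and L (a code length in bits) are notation assumed here for illustration.

```latex
\[
  \hat{M} \;=\; \arg\min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr]
\]
```

Here L(M) is the length of the description of the model and L(D | M) is the length of the data when encoded with the help of that model, so the preferred model is the one that yields the shortest total description.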
Notes
- 1.
- 2. Because our system did not participate in the NLPCC competition, all experiments were re-run by ourselves.
- 3. Ranked first by ROUGE, F-measure.
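The ROUGE F-measure mentioned in note 3 can be illustrated with a minimal sketch of a unigram (ROUGE-1) score. This is a simplified illustration, not the official ROUGE package and not necessarily the configuration used in the NLPCC 2015 evaluation. The function name `rouge1_f` and the toy token lists are hypothetical, and the input is assumed to be already tokenized (for Chinese, by a word segmenter).

```python
from collections import Counter


def rouge1_f(candidate_tokens, reference_tokens):
    """Simplified ROUGE-1 F-measure over pre-tokenized text (illustrative only)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Clipped unigram overlap: each token counts at most as often
    # as it appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    # Balanced F-measure (harmonic mean of precision and recall).
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical token lists.
candidate = ["the", "cat", "sat"]
reference = ["the", "cat", "sat", "down"]
score = rouge1_f(candidate, reference)
print(round(score, 4))
```

In this toy case all three candidate tokens occur in the reference, so precision is 1 and recall is 3/4. The full ROUGE package offers further options (higher-order n-grams, multiple references, and others) that this sketch leaves out.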
References
Wan, X., Zhang, J., Wen, S., Tan, J.: Overview of the NLPCC 2015 shared task: Weibo-oriented Chinese news summarization. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds.) NLPCC 2015. LNCS (LNAI), vol. 9362, pp. 557–561. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25207-0_52
SINA, C.: Sina Weibo (2009). www.weibo.com
Lakshmanan, L.V.S., Ng, R.T., Wang, C.X., Zhou, X., Johnson, T.J.: The generalized MDL approach for summarization. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 766–777 (2002)
Bu, S., Lakshmanan, L.V.S., Ng, R.T.: MDL summarization with holes. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005, pp. 433–444 (2005)
Nomoto, T., Matsumoto, Y.: A new approach to unsupervised text summarization. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’01, pp. 26–34 (2001)
Nomoto, T.: Machine learning approaches to rhetorical parsing and open-domain text summarization. Ph.D. thesis, Nara Institute of Science and Technology (2004)
Nguyen, T.S., Lauw, H.W., Tsaparas, P.: Review synthesis for micro-review summarization. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, pp. 169–178 (2015)
Baralis, E., Cagliero, L., Jabeen, S., Fiori, A.: Multi-document summarization exploiting frequent itemsets. In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC 2012, pp. 782–786 (2012)
Agarwal, N., Gvr, K., Reddy, R.S., Rosé, C.P.: SciSumm: a multi-document summarization system for scientific articles. In: Proceedings of the ACL-HLT 2011 System Demonstrations, pp. 115–120 (2011)
Dalal, M.K., Zaveri, M.A.: Semisupervised learning based opinion summarization and classification for online product reviews. Appl. Comput. Intell. Soft Comput. 2013, 10 (2013)
Danon, G., Schneider, M., Last, M., Litvak, M., Kandel, A.: An apriori-like algorithm for extracting fuzzy association rules between keyphrases in text documents. In: Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2006), Special Session on Fuzzy Sets in Probability and Statistics, pp. 731–738 (2006)
Litvak, M., Vanetik, N., Last, M.: Krimping texts for better summarization. In: Conference on Empirical Methods in Natural Language Processing, pp. 1931–1935 (2015)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Wang, D., Zhu, S., Li, T., Gong, Y.: Multi-document summarization using sentence-based topic models. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 297–300. Association for Computational Linguistics (2009)
Lee, S., Belkasim, S., Zhang, Y.: Multi-document text summarization using topic model and fuzzy logic. In: Perner, P. (ed.) MLDM 2013. LNCS (LNAI), vol. 7988, pp. 159–168. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39712-7_12
Li, L., Heng, W., Yu, J., Liu, Y., Wan, S.: CIST system report for ACL MultiLing 2013-track 1: multilingual multi-document summarization. MultiLing 2013, 39 (2013)
Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: EACL ’09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 781–789 (2009)
Gillick, D., Favre, B.: A scalable global model for summarization. In: Proceedings of the NAACL HLT Workshop on Integer Linear Programming for Natural Language Processing, pp. 10–18 (2009)
Zhang, H.P., Yu, H.K., Xiong, D.Y., Liu, Q.: HHMM-based Chinese lexical analyzer ICTCLAS. In: Second SIGHAN Workshop Affiliated with 41th ACL, pp. 184–187 (2003)
Wei, H., Jia, Y., Lei, L., Yongbin, L.: Research on key factors in multi-document topic modeling application with HLDA. J. Chin. Inf. Process. 27(6), 117–127 (2013)
Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc., New York, NY, USA (1997)
Vreeken, J., Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23, 169–214 (2011)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: 20th International Conference on Very Large Databases, pp. 487–499 (1994)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), pp. 25–26 (2004)
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Litvak, M., Vanetik, N., Li, L. (2018). Summarizing Weibo with Topics Compression. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_39
DOI: https://doi.org/10.1007/978-3-319-77116-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science, Computer Science (R0)