Abstract
Multi-document summarization is a difficult natural language processing task. Many extractive summarization methods consist of two steps: extracting the important concepts of the documents and selecting sentences based on those concepts. In this paper, we introduce a method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or external resources. Sentences are then selected based on these topic labels to form a summary. Two selection methods are proposed. Experiments on the DUC2004 dataset show that vector-based methods perform better, i.e., methods that map topic labels and sentences into a word vector space and a letter trigram vector space in order to find the sentences that are syntactically and semantically related to the topic labels. The experiments also show that the produced summaries are informative, abstractive and better than those of the baseline method.
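The vector-based selection idea can be illustrated with a minimal sketch. It covers only the letter trigram side, and it rests on assumptions the abstract does not confirm: cosine similarity as the relatedness measure, each sentence scored by its best-matching topic label, and `#` as a word-boundary marker. The function names (`letter_trigrams`, `cosine`, `rank_sentences`) and the `top_k` parameter are hypothetical. The word vector space used in the paper is omitted here because it would require a trained embedding model.

```python
import math
import re
from collections import Counter


def letter_trigrams(text):
    """Count the letter trigrams of each word, padding words with '#' (assumed marker)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, topic_labels, top_k=2):
    """Return the top_k sentences, scored by best similarity to any topic label.

    Taking the maximum over labels is an illustrative assumption, not
    necessarily the selection rule used in the paper.
    """
    label_vecs = [letter_trigrams(label) for label in topic_labels]
    scored = []
    for sent in sentences:
        vec = letter_trigrams(sent)
        score = max((cosine(vec, lv) for lv in label_vecs), default=0.0)
        scored.append((score, sent))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:top_k]]


if __name__ == "__main__":
    sentences = [
        "The earthquake destroyed many buildings.",
        "Stocks rose on Monday.",
    ]
    print(rank_sentences(sentences, ["earthquake damage"], top_k=1))
```

Because letter trigrams capture surface form, a sentence containing "earthquakes" would still overlap heavily with the label "earthquake"; in the paper, semantic relatedness is handled by the word vector space.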
Acknowledgement
We would like to thank the National Natural Science Foundation of China (Grant No. 61375053) for partial financial support of this work.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kou, W., Li, F., Ye, Z. (2017). Using Topic Labels for Text Summarization. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_46
DOI: https://doi.org/10.1007/978-3-319-60045-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science (R0)