Using Topic Labels for Text Summarization

Kou, Wanqiu; Li, Fang; Ye, Zhe

doi:10.1007/978-3-319-60045-1_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10351)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Multi-document summarization is a difficult natural language processing task. Many extractive summarization methods consist of two steps: extracting the important concepts of the documents and selecting sentences based on those concepts. In this paper, we introduce a method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or concepts drawn from external resources. Sentences are selected based on these topic labels in order to form a summary. Two selection methods are proposed in the paper. Experiments on the DUC2004 dataset show that the vector-based method performs better: it maps topic labels and sentences into a word vector space and a letter trigram vector space to find the sentences that are syntactically and semantically related to the topic labels, and these sentences form the summary. The experiments also show that the produced summaries are informative, abstractive and better than those of the baseline method.
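
The vector-based selection described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it covers just the letter trigram half of the representation, and the function names (`letter_trigrams`, `cosine`, `select_sentences`), the `#` word-boundary padding, the max-over-labels scoring and the cut-off `k` are all assumptions made for the example. The abstract does not specify how similarity scores are computed or combined.

```python
from collections import Counter
from math import sqrt


def letter_trigrams(text: str) -> Counter:
    """Count letter trigrams per word, padding each word with '#' boundary marks.

    The '#' padding is an assumption of this sketch.
    """
    counts = Counter()
    for word in text.lower().split():
        token = "".join(ch for ch in word if ch.isalnum())
        if not token:
            continue
        padded = f"#{token}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sentences(topic_labels, sentences, k=2):
    """Return the k sentences closest to any topic label, in document order.

    Scoring a sentence by its best-matching label is a hypothetical choice.
    """
    label_vecs = [letter_trigrams(label) for label in topic_labels]
    scored = []
    for idx, sentence in enumerate(sentences):
        vec = letter_trigrams(sentence)
        score = max((cosine(vec, lv) for lv in label_vecs), default=0.0)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Calling `select_sentences(["earthquake relief"], sentences, k=1)` on a handful of news sentences would return the one sharing the most letter trigrams with the label. A fuller version would add a word vector similarity (for example from pretrained embeddings) to capture semantic relatedness, which this sketch leaves out.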

Acknowledgement

We would like to thank the National Natural Science Foundation of China (Grant No. 61375053) for partial financial support of this work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai, 200240, People’s Republic of China
Wanqiu Kou, Fang Li & Zhe Ye

Corresponding author

Correspondence to Fang Li.

Editor information

Editors and Affiliations

Artois University, Lens, France
Salem Benferhat
Artois University, Lens, France
Karim Tabia
Texas State University, San Marcos, Texas, USA
Moonis Ali

About this paper

Cite this paper

Kou, W., Li, F., Ye, Z. (2017). Using Topic Labels for Text Summarization. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_46

DOI: https://doi.org/10.1007/978-3-319-60045-1_46
Published: 03 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science, Computer Science (R0)
