Using Topic Labels for Text Summarization

  • Conference paper
Advances in Artificial Intelligence: From Theory to Practice (IEA/AIE 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10351)

Abstract

Multi-document summarization is a difficult natural language processing task. Many extractive summarization methods consist of two steps: extracting the important concepts of the documents and selecting sentences based on those concepts. In this paper, we introduce a method that uses Latent Dirichlet Allocation (LDA) topic labels as concepts, instead of n-grams or external resources. Sentences are selected based on these topic labels to form a summary. Two selection methods are proposed. Experiments on the DUC2004 dataset show that the vector-based method performs better: it maps topic labels and sentences to a word vector space and a letter trigram vector space to find the sentences that are syntactically and semantically related to the topic labels. The experiments also show that the produced summaries are informative, abstractive and better than those of the baseline method.
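The vector-based selection idea can be illustrated with a minimal sketch. It is not the authors' implementation. It covers only the letter trigram half of the representation and assumes plain cosine similarity as the relatedness score. The word vector half would need a trained embedding model and is omitted. The function names `letter_trigrams`, `cosine` and `rank_sentences` are hypothetical, and the topic label is supplied by hand rather than produced by LDA.

```python
from collections import Counter
from math import sqrt


def letter_trigrams(text: str) -> Counter:
    """Count letter trigrams per word, using '#' as a word-boundary marker."""
    counts = Counter()
    for word in text.lower().split():
        # Keep only letters and digits, then pad with boundary markers.
        token = "#" + "".join(ch for ch in word if ch.isalnum()) + "#"
        if len(token) < 3:  # the word was punctuation only
            continue
        for i in range(len(token) - 2):
            counts[token[i:i + 3]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(label: str, sentences: list[str]) -> list[tuple[float, str]]:
    """Score each sentence against a topic label, highest similarity first."""
    label_vec = letter_trigrams(label)
    scored = [(cosine(label_vec, letter_trigrams(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Assumed inputs: a hand-written topic label and two toy sentences.
    candidates = [
        "Rain is expected tomorrow.",
        "The stock market fell sharply.",
    ]
    for score, sentence in rank_sentences("stock market", candidates):
        print(f"{score:.3f}  {sentence}")
```

A summary could then be assembled by taking the top-ranked sentences for each topic label until a length limit is reached; how the paper combines the two vector spaces and handles redundancy is not stated in the abstract.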

Notes

  1. http://opennlp.apache.org/

Acknowledgement

We would like to thank the National Natural Science Foundation of China (Grant No. 61375053) for partially funding this work.

Author information

Corresponding author

Correspondence to Fang Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kou, W., Li, F., Ye, Z. (2017). Using Topic Labels for Text Summarization. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_46

  • DOI: https://doi.org/10.1007/978-3-319-60045-1_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60044-4

  • Online ISBN: 978-3-319-60045-1

  • eBook Packages: Computer Science, Computer Science (R0)
