Semantic WordRank: Generating Finer Single-Document Summarizations

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2018 (IDEAL 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11314)

Abstract

We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document. Built on a weighted word graph with semantic and co-occurrence edges, SWR scores sentences using an article-structure-biased PageRank algorithm with a Softplus function adjustment, and promotes topic diversity using spectral subtopic clustering under the Word Mover's Distance metric. We evaluate SWR on the DUC-02 and SummBank datasets and show that, on DUC-02, SWR produces better summaries than state-of-the-art algorithms under common ROUGE measures. We then show that, under the same measures on SummBank, SWR outperforms each of the three human annotators (the judges) and compares favorably with the combined performance of all judges.
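
To make the scoring idea concrete, the following is a minimal sketch of a biased PageRank over a word co-occurrence graph, followed by a Softplus adjustment of the word scores. It is not the SWR implementation: the abstract does not specify the edge weights, the form of the article-structure bias, or where the Softplus adjustment is applied. The helper names (`build_cooccurrence_edges`, `position_bias`, `score_sentences`), the 1/(sentence index + 1) bias, and the sum-of-Softplus sentence score are all assumptions chosen for illustration, and semantic edges and subtopic clustering are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def biased_pagerank(edges, bias, damping=0.85, iters=200, tol=1e-12):
    """Power iteration for PageRank with a non-uniform teleport vector.

    edges: {(u, v): weight} for an undirected graph with positive weights.
    bias:  {node: non-negative weight}; normalized into the teleport vector.
    Assumes every node has at least one edge (no dangling-node handling).
    """
    nodes = sorted(bias)
    total = sum(bias.values())
    teleport = {n: bias[n] / total for n in nodes}
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new_rank = {}
        for n in nodes:
            inflow = sum(rank[m] * w / out_weight[m] for m, w in adj[n].items())
            new_rank[n] = (1 - damping) * teleport[n] + damping * inflow
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def build_cooccurrence_edges(sentences):
    """Hypothetical edge rule: words sharing a sentence get +1 edge weight."""
    edges = defaultdict(float)
    for sent in sentences:
        for u, v in combinations(sorted(set(sent)), 2):
            edges[(u, v)] += 1.0
    return dict(edges)


def position_bias(sentences):
    """Hypothetical structure bias: words first seen earlier weigh more."""
    bias = {}
    for i, sent in enumerate(sentences):
        for w in sent:
            bias.setdefault(w, 1.0 / (i + 1))
    return bias


def score_sentences(sentences, rank):
    """Hypothetical sentence score: sum of Softplus-adjusted word ranks."""
    return [sum(softplus(rank[w]) for w in set(sent)) for sent in sentences]


# Toy document: three already-tokenized sentences.
sentences = [
    ["word", "graph", "rank"],
    ["semantic", "word", "graph"],
    ["topic", "cluster"],
]
rank = biased_pagerank(build_cooccurrence_edges(sentences), position_bias(sentences))
scores = score_sentences(sentences, rank)
```

The non-uniform teleport vector is what makes the PageRank "biased": with a uniform `bias` the function reduces to ordinary weighted PageRank. An extractive summary would then pick the highest-scoring sentences, with a separate diversity step (not shown) to avoid redundancy.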

Acknowledgments

We thank Liqun Shao for interesting conversations on using the Softplus function adjustment.

Author information

Corresponding author

Correspondence to Hao Zhang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, H., Wang, J. (2018). Semantic WordRank: Generating Finer Single-Document Summarizations. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_42

  • DOI: https://doi.org/10.1007/978-3-030-03493-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03492-4

  • Online ISBN: 978-3-030-03493-1

  • eBook Packages: Computer Science, Computer Science (R0)
