DOI: 10.1145/3639233.3639348

Leveraging Salience Analysis and Sparse Attention for Long Document Summarization

Published: 05 March 2024

ABSTRACT

Extractive and abstractive summarization models have achieved promising results on relatively short documents, but they still struggle with longer-form documents such as scientific papers. Specifically, extractive models produce inaccurate or redundant summaries because of their weak salience analysis, while transformer-based abstractive models suffer from the quadratic dependence of their full attention mechanism on sequence length. To remedy this, we propose a novel hybrid model named LDSumm (Long Document Summarization), composed of an extractive module that strengthens salience analysis by leveraging the hierarchical structure of a document (especially its section information), and an abstractive module that applies sparse-attention ideas to enlarge the input size of BART. We conduct extensive experiments on two scientific-paper datasets, arXiv and PubMed. The results show that LDSumm outperforms the BART baseline and other comparison models, and obtains a larger gain on arXiv, the dataset with longer papers.
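
To make the two ideas in the abstract concrete, the minimal sketch below is an illustration only, not the authors' released code: the function names, the window size, and the section weights are assumptions made for exposition. It shows a local-window sparse-attention mask whose number of allowed token pairs grows linearly with sequence length rather than quadratically, and a toy section-aware re-weighting of sentence salience scores.

# Illustrative sketch only (not the paper's implementation): a windowed
# sparse-attention mask and a toy section-aware salience re-weighting.
import numpy as np

def windowed_attention_mask(seq_len, window, global_tokens=(0,)):
    """Boolean mask where True means 'may attend'. Each token sees only its
    local window plus a few global tokens, so the number of allowed pairs
    grows as O(seq_len * window) instead of O(seq_len ** 2)."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
    for g in global_tokens:  # e.g. a [CLS]-style token attends, and is attended to, everywhere
        mask[g, :] = True
        mask[:, g] = True
    return mask

def section_aware_salience(sentence_scores, section_ids, section_weights):
    """Toy salience analysis: re-weight each sentence's base score by the
    (assumed) importance of the section it belongs to."""
    return [score * section_weights.get(section, 1.0)
            for score, section in zip(sentence_scores, section_ids)]

if __name__ == "__main__":
    m = windowed_attention_mask(seq_len=16, window=2)
    print("allowed attention pairs:", int(m.sum()), "of", 16 * 16)
    scores = section_aware_salience(
        [0.4, 0.7, 0.3],
        ["introduction", "method", "conclusion"],
        {"introduction": 1.2, "conclusion": 1.5},
    )
    print("re-weighted salience:", [round(s, 2) for s in scores])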


Published in
NLPIR '23: Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval
December 2023, 336 pages
ISBN: 9798400709227
DOI: 10.1145/3639233

Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
