Zero-cost Transition to Multi-document Processing in Summarization with Multi-Channel Attention

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Abstract

Fine-tuning transformer models for multi-document summarization is a widely applied approach because of their ability to capture complex relationships across documents. However, full-attention transformer models often struggle with the long-sequence problem: computational complexity grows quadratically with sequence length. In addition, the optimization cost of transformers is exceedingly high. To address these challenges, we propose a novel vertical scaling approach in which we conditionally factorize the multi-document output probability into lower-complexity components. Specifically, these components are estimated by estimators optimized on single-document data. Unlike the full-attention approach, vertical scaling has complexity that grows linearly with the number of single documents, making it more efficient for long documents or large document collections. To further enhance the efficiency and effectiveness of our approach, we introduce the Multi-Channel Attention architecture. This architecture fully reuses BART’s single-document pre-optimized parameters without requiring re-optimization, leading to a zero-cost transition. Our approach maintains promising accuracy and computational efficiency. We publish our implementation and related data at https://github.com/nbtpj/MCA.
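To make the factorization concrete, the following minimal sketch (ours, not the published implementation) scores a candidate summary prefix against each source document with an unmodified single-document BART checkpoint and then combines the per-document next-token distributions. The checkpoint name, the multi_doc_next_token_logprobs helper, and the uniform-mixture combination rule are illustrative assumptions; the paper's Multi-Channel Attention instead merges the channels inside the attention layers of the pre-optimized BART, as released in the linked repository. The sketch still shows why the cost grows linearly with the number of documents: each encoder pass sees only one document.

import math

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Any single-document summarization checkpoint works; this name is an assumption.
MODEL_NAME = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def multi_doc_next_token_logprobs(documents, summary_prefix):
    """Combine per-document next-token distributions for a summary prefix.

    Each document is encoded independently (one "channel"), so the total
    encoder cost is linear in the number of documents rather than quadratic
    in the length of their concatenation.
    """
    # Drop the trailing </s> so the model predicts the token after the prefix.
    decoder_ids = tokenizer(summary_prefix, return_tensors="pt").input_ids[:, :-1]
    per_doc = []
    for doc in documents:
        enc = tokenizer(doc, truncation=True, max_length=1024, return_tensors="pt")
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    decoder_input_ids=decoder_ids)
        # Next-token distribution conditioned on this single document only.
        per_doc.append(torch.log_softmax(out.logits[:, -1, :], dim=-1))
    # Assumed combination rule: a uniform mixture over the document channels.
    return torch.logsumexp(torch.stack(per_doc), dim=0) - math.log(len(documents))


docs = ["First source document ...", "Second source document ..."]
print(multi_doc_next_token_logprobs(docs, "The evidence suggests").shape)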

M.-Q. Nguyen and D.-C. Can—Shared first authors.

Notes

  1. We are referring to the exact component \(v_Y\) calculation, not the generated output during model inference.

Acknowledgement

This research was conducted under research project QG.22.61, “Research and Development of Vietnamese Multi-document Summarization Based on Advanced Language Models”, of Vietnam National University, Hanoi. Quang would like to thank Professor Hady W. Lauw, School of Computing and Information Systems, Singapore Management University, for his instructive comments during the preparation of this manuscript. He also wishes to thank all other members of the DS&KT Laboratory, University of Engineering and Technology, VNU Hanoi, for their support.

Author information

Corresponding author

Correspondence to Hoang-Quynh Le.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, MQ., Can, DC., Le, HQ. (2024). Zero-cost Transition to Multi-document Processing in Summarization with Multi-Channel Attention. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14945. Springer, Cham. https://doi.org/10.1007/978-3-031-70362-1_24

  • DOI: https://doi.org/10.1007/978-3-031-70362-1_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70361-4

  • Online ISBN: 978-3-031-70362-1

  • eBook Packages: Computer Science, Computer Science (R0)
