Abstract
Fine-tuning transformer models for multi-document summarization is a widely applied approach because of their ability to capture complex relationships across documents. However, full-attention transformer models often suffer from the long-sequence problem: computational complexity grows quadratically with sequence length. In addition, the cost of optimizing a transformer is exceedingly high. To address these challenges, we propose a novel vertical scaling approach, in which we conditionally factorize the multi-document output probability into lower-complexity components. These components are estimated by estimators optimized on single-document data. Unlike the full-attention approach, vertical scaling has a complexity that grows linearly with the number of single documents, making it more efficient for long documents or large document collections. To further enhance the efficiency and effectiveness of our approach, we introduce the Multi-Channel Attention architecture, which fully reuses BART's single-document pre-optimized parameters and requires no re-optimization, yielding a zero-cost transition. Our approach maintains promising accuracy and computational efficiency. We publish our implementation and related data at https://github.com/nbtpj/MCA.
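To make the vertical-scaling idea concrete, below is a minimal illustrative sketch, not the paper's actual Multi-Channel Attention implementation (see the repository above for that). It assumes an off-the-shelf single-document BART checkpoint (`facebook/bart-large-cnn` is our choice) and stands in for the conditional factorization by averaging per-document next-token logits, so each decoding step costs one single-document forward pass per document, i.e., linear in the number of documents, with no parameter re-optimized:

```python
# Hedged sketch of vertical scaling: an illustrative stand-in, not the
# paper's Multi-Channel Attention. Each document gets an unchanged
# single-doc BART pass, and the per-document ("per-channel") next-token
# distributions are combined, so cost is linear in the number of documents.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

MODEL = "facebook/bart-large-cnn"  # assumed single-doc pre-trained checkpoint
tokenizer = BartTokenizer.from_pretrained(MODEL)
model = BartForConditionalGeneration.from_pretrained(MODEL).eval()

documents = [
    "First source document about the topic ...",
    "Second source document about the topic ...",
]

@torch.no_grad()
def combined_next_token_logits(decoder_ids: torch.Tensor) -> torch.Tensor:
    """Average next-token logits over per-document channels (our choice of
    combination; the paper's exact factorization may differ)."""
    channel_logits = []
    for doc in documents:  # one independent single-doc pass per document
        enc = tokenizer(doc, return_tensors="pt", truncation=True)
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    decoder_input_ids=decoder_ids)
        channel_logits.append(out.logits[:, -1, :])  # last-step logits
    return torch.stack(channel_logits).mean(dim=0)

# Greedy decoding driven by the combined distribution; no parameter of the
# single-doc model is re-optimized anywhere (the "zero-cost transition").
ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(64):
    next_id = combined_next_token_logits(ids).argmax(dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=-1)
    if next_id.item() == model.config.eos_token_id:
        break
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

The paper's Multi-Channel Attention presumably mixes channels within the attention mechanism rather than at the logit level, but the sketch shows why the transition can be zero-cost: every parameter used is the pre-trained single-document one.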
M.-Q. Nguyen and D.-C. Can—Shared first authors.
Notes
- 1.
We refer to the exact calculation of the component \(v_Y\), not the output generated during model inference.
References
Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv preprint arXiv:2004.05150 (2020)
Ben Abacha, A., Mrabet, Y., Zhang, Y., Shivade, C., Langlotz, C., Demner-Fushman, D.: Overview of the MEDIQA 2021 shared task on summarization in the medical domain. In: Proceedings of the 20th SIGBioMed Workshop on Biomedical Language Processing, NAACL-BioNLP 2021. Association for Computational Linguistics (2021)
Brown, T., et al.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
Chen, J., Yang, D.: Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4106–4118 (2020)
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst. 36 (2024)
DeYoung, J., Beltagy, I., van Zuylen, M., Kuehl, B., Wang, L.L.: MS^2: multi-document summarization of medical studies. arXiv preprint arXiv:2104.06486 (2021)
Hermann, K.M., et al.: Teaching machines to read and comprehend. Adv. Neural Inf. Process. Syst. 28 (2015)
Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. In: International Conference on Learning Representations (2021)
Jin, Q., Dhingra, B., Liu, Z., Cohen, W., Lu, X.: PubMedQA: a dataset for biomedical research question answering. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2567–2577 (2019)
Le, H.Q., et al.: UETfishes at MEDIQA 2021: standing-on-the-shoulders-of-giants model for abstractive multi-answer summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 328–335. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.bionlp-1.38
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880 (2020)
Lin, C.Y., Och, F.J.: Looking for a few good metrics: ROUGE and its evaluation. In: NTCIR Workshop (2004)
Longpre, S., et al.: The Flan Collection: designing data and methods for effective instruction tuning. In: International Conference on Machine Learning, pp. 22631–22648. PMLR (2023)
Mrini, K., et al.: UCSD-Adobe at MEDIQA 2021: transfer learning and answer sentence selection for medical summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 257–262 (2021)
Over, P., Yen, J.: An introduction to DUC-2004. National Institute of Standards and Technology (2004)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392 (2016)
Savery, M., Abacha, A.B., Gayen, S., Demner-Fushman, D.: Question-driven summarization of answers to consumer health questions. Sci. Data 7(1), 1–9 (2020)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Zaheer, M., et al.: Big Bird: transformers for longer sequences. Adv. Neural Inf. Process. Syst. 33, 17283–17297 (2020)
Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations (2019)
Zhu, W., et al.: paht_nlp @ MEDIQA 2021: multi-grained query focused multi-answer summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 96–102 (2021)
Acknowledgement
This research was conducted under research project QG.22.61, "Research and Development of Vietnamese Multi-document Summarization Based on Advanced Language Models," of Vietnam National University, Hanoi. Quang would like to thank Professor Hady W. Lauw, School of Computing and Information Systems, Singapore Management University, for his instructive comments during the completion of this manuscript. He also thanks all other members of the DS&KT Laboratory, University of Engineering and Technology, VNU Hanoi, for their support.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, MQ., Can, DC., Le, HQ. (2024). Zero-cost Transition to Multi-document Processing in Summarization with Multi-Channel Attention. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14945. Springer, Cham. https://doi.org/10.1007/978-3-031-70362-1_24
DOI: https://doi.org/10.1007/978-3-031-70362-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70361-4
Online ISBN: 978-3-031-70362-1
eBook Packages: Computer Science, Computer Science (R0)