Abstract
Fine-tuning transformer models for multi-document summarization is a widely applied approach because of their ability to capture complex relationships across documents. However, full-attention transformer models often suffer from the long-sequence problem: computational complexity grows quadratically with sequence length. In addition, the cost of optimizing a transformer is exceedingly high. To address these challenges, we propose a novel vertical scaling approach, in which we conditionally factorize the multi-document output probability into lower-complexity components. These components are estimated by estimators optimized on single-document data. Unlike the full-attention approach, vertical scaling has a complexity that grows linearly with the number of single documents, making it more efficient for long documents or large document collections. To further enhance the efficiency and effectiveness of our approach, we introduce the Multi-Channel Attention architecture, which fully reuses BART's single-document pre-optimized parameters and requires no re-optimization, yielding a zero-cost transition. Our approach maintains promising accuracy and computational efficiency. We publish our implementation and related data at https://github.com/nbtpj/MCA.
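To make the vertical-scaling idea concrete, below is a minimal illustrative sketch, not the paper's actual Multi-Channel Attention implementation (see the repository above for that). It assumes an off-the-shelf single-document BART checkpoint (`facebook/bart-large-cnn` is our choice) and stands in for the conditional factorization by averaging per-document next-token logits, so each decoding step costs one single-document forward pass per document, i.e., linear in the number of documents, with no parameter re-optimized:

```python
# Hedged sketch of vertical scaling: an illustrative stand-in, not the
# paper's Multi-Channel Attention. Each document gets an unchanged
# single-doc BART pass, and the per-document ("per-channel") next-token
# distributions are combined, so cost is linear in the number of documents.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

MODEL = "facebook/bart-large-cnn"  # assumed single-doc pre-trained checkpoint
tokenizer = BartTokenizer.from_pretrained(MODEL)
model = BartForConditionalGeneration.from_pretrained(MODEL).eval()

documents = [
    "First source document about the topic ...",
    "Second source document about the topic ...",
]

@torch.no_grad()
def combined_next_token_logits(decoder_ids: torch.Tensor) -> torch.Tensor:
    """Average next-token logits over per-document channels (our choice of
    combination; the paper's exact factorization may differ)."""
    channel_logits = []
    for doc in documents:  # one independent single-doc pass per document
        enc = tokenizer(doc, return_tensors="pt", truncation=True)
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    decoder_input_ids=decoder_ids)
        channel_logits.append(out.logits[:, -1, :])  # last-step logits
    return torch.stack(channel_logits).mean(dim=0)

# Greedy decoding driven by the combined distribution; no parameter of the
# single-doc model is re-optimized anywhere (the "zero-cost transition").
ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(64):
    next_id = combined_next_token_logits(ids).argmax(dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=-1)
    if next_id.item() == model.config.eos_token_id:
        break
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

The paper's Multi-Channel Attention presumably mixes channels within the attention mechanism rather than at the logit level, but the sketch shows why the transition can be zero-cost: every parameter used is the pre-trained single-document one.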
M.-Q. Nguyen and D.-C. Can—Shared first authors.
Notes
- 1.
We refer to the exact calculation of the component \(v_Y\), not the output generated during model inference.
References
Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv preprint arXiv:2004.05150 (2020)
Ben Abacha, A., Mrabet, Y., Zhang, Y., Shivade, C., Langlotz, C., Demner-Fushman, D.: Overview of the MEDIQA 2021 shared task on summarization in the medical domain. In: Proceedings of the 20th SIGBioMed Workshop on Biomedical Language Processing, NAACL-BioNLP 2021. Association for Computational Linguistics (2021)
Brown, T., et al.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
Chen, J., Yang, D.: Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4106–4118 (2020)
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst. 36 (2024)
DeYoung, J., Beltagy, I., van Zuylen, M., Kuehl, B., Wang, L.L.: MS^2: multi-document summarization of medical studies. arXiv preprint arXiv:2104.06486 (2021)
Hermann, K.M., et al.: Teaching machines to read and comprehend. Adv. Neural Inf. Process. Syst. 28 (2015)
Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. In: International Conference on Learning Representations (2021)
Jin, Q., Dhingra, B., Liu, Z., Cohen, W., Lu, X.: PubMedQA: a dataset for biomedical research question answering. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2567–2577 (2019)
Le, H.Q., et al.: UETfishes at MEDIQA 2021: standing-on-the-shoulders-of-giants model for abstractive multi-answer summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 328–335. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.bionlp-1.38
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880 (2020)
Lin, C.Y., Och, F.J.: Looking for a few good metrics: ROUGE and its evaluation. In: NTCIR Workshop (2004)
Longpre, S., et al.: The Flan Collection: designing data and methods for effective instruction tuning. In: International Conference on Machine Learning, pp. 22631–22648. PMLR (2023)
Mrini, K., et al.: UCSD-Adobe at MEDIQA 2021: transfer learning and answer sentence selection for medical summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 257–262 (2021)
Over, P., Yen, J.: An introduction to DUC-2004. National Institute of Standards and Technology (2004)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392 (2016)
Savery, M., Abacha, A.B., Gayen, S., Demner-Fushman, D.: Question-driven summarization of answers to consumer health questions. Sci. Data 7(1), 1–9 (2020)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Zaheer, M., et al.: Big Bird: transformers for longer sequences. Adv. Neural Inf. Process. Syst. 33, 17283–17297 (2020)
Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: evaluating text generation with BERT. In: International Conference on Learning Representations (2019)
Zhu, W., et al.: paht_nlp @ MEDIQA 2021: multi-grained query focused multi-answer summarization. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 96–102 (2021)
Acknowledgement
This research was conducted under research project QG.22.61, "Research and Development of Vietnamese Multi-document Summarization Based on Advanced Language Models," of Vietnam National University, Hanoi. Quang would like to thank Professor Hady W. Lauw, School of Computing and Information Systems, Singapore Management University, for his instructive comments during the completion of this manuscript. He also thanks all other members of the DS&KT Laboratory, University of Engineering and Technology, VNU Hanoi, for their support.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, MQ., Can, DC., Le, HQ. (2024). Zero-cost Transition to Multi-document Processing in Summarization with Multi-Channel Attention. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14945. Springer, Cham. https://doi.org/10.1007/978-3-031-70362-1_24
DOI: https://doi.org/10.1007/978-3-031-70362-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70361-4
Online ISBN: 978-3-031-70362-1
eBook Packages: Computer Science, Computer Science (R0)