Abstract
Multi-Document Summarization (MDS) aims to generate a concise summary for a collection of documents on the same topic. However, the fixed input length and the large amount of redundancy in source documents make pre-trained models less effective for MDS. In this paper, we propose a two-stage abstractive MDS model based on Predicate-Argument Structure (PAS). In the first stage, we divide document redundancy into intra-sentence redundancy and inter-sentence redundancy. For intra-sentence redundancy, our model utilizes Semantic Role Labeling (SRL) to convert each sentence into a PAS. Benefiting from PAS, we can filter out redundant content while preserving the salient information. For inter-sentence redundancy, we introduce a novel similarity calculation method that incorporates semantic and syntactic knowledge to identify and remove duplicate information. These two steps significantly shorten the input length and eliminate document redundancy, which is crucial for MDS. In the second stage, we sort the filtered PASs so that important content appears at the beginning and concatenate them into a new document. We employ the pre-trained model ProphetNet to generate an abstractive summary from the new document. Our model combines the advantages of ProphetNet and PAS in capturing global information to generate comprehensive summaries. We conduct extensive experiments on three standard MDS datasets. The experiments demonstrate that our model outperforms the abstractive MDS baselines as measured by ROUGE scores. Furthermore, the first stage of our model can improve the performance of other pre-trained models in abstractive MDS.
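The first stage described above (convert sentences to PASs, remove near-duplicate PASs, sort by salience, concatenate) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the paper uses a neural SRL model and a similarity measure combining semantic and syntactic knowledge, which are replaced here by pre-parsed PAS dictionaries, token-overlap (Jaccard) similarity, and a caller-supplied salience score. All names (`pas_to_text`, `filter_redundant`, `build_input`) are hypothetical.

```python
def pas_to_text(pas):
    """Linearize a PAS dict (predicate + arguments) back into a string."""
    return " ".join([pas["predicate"]] + pas["arguments"])

def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the paper's
    semantic/syntactic similarity calculation."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def filter_redundant(pas_list, threshold=0.6):
    """Inter-sentence redundancy removal: keep a PAS only if it is not
    too similar to any already-kept PAS."""
    kept = []
    for pas in pas_list:
        text = pas_to_text(pas)
        if all(jaccard(text, pas_to_text(k)) < threshold for k in kept):
            kept.append(pas)
    return kept

def build_input(pas_list, score):
    """Sort the filtered PASs by a salience score (most salient first)
    and concatenate them into one document for the generator."""
    kept = filter_redundant(pas_list)
    kept.sort(key=score, reverse=True)
    return " . ".join(pas_to_text(p) for p in kept)

# Toy example: the second PAS is a near-duplicate of the first and is dropped.
pas_list = [
    {"predicate": "announced", "arguments": ["the company", "a merger", "on Monday"]},
    {"predicate": "announced", "arguments": ["the company", "a merger", "Monday"]},
    {"predicate": "fell", "arguments": ["shares", "by 5 percent"]},
]
doc = build_input(pas_list, score=lambda p: len(p["arguments"]))
```

In the paper, the resulting document (shortened and deduplicated) is what gets fed to ProphetNet; here `doc` simply concatenates the two surviving PASs in salience order.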
Acknowledgements
This research was supported by Key Research Project of Zhejiang Province (2022C01145).
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cheng, H., Wu, J., Li, T., Cao, B., Fan, J. (2022). Improving Abstractive Multi-document Summarization with Predicate-Argument Structure Extraction. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13630. Springer, Cham. https://doi.org/10.1007/978-3-031-20865-2_20
Print ISBN: 978-3-031-20864-5
Online ISBN: 978-3-031-20865-2