
Augmented Topic-Specific Summarization for Domain Dialogue Text

Conference paper in: Natural Language Processing and Chinese Computing (NLPCC 2022)

Abstract

This paper describes HW-TSC’s submission to the NLPCC 2022 dialogue text summarization task. We decompose the task into two sub-tasks: sub-summary generation and topic detection. A sequence-to-sequence Transformer is adopted as the backbone of our generation model, and an ensemble topic detection model is used to filter out uninformative summaries. In addition, we apply multiple data processing and data augmentation methods to improve the effectiveness of the system: a constrained search method constructs the generation model’s training pairs between sub-dialogues and sub-summaries, and multiple role-centric training data augmentation strategies enhance both the generation model and the topic detection model. Our experiments demonstrate the effectiveness of these methods. Our system ranks first in the test evaluation with the highest ROUGE score of 51.764.
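The abstract outlines a generate-then-filter pipeline: a sequence-to-sequence Transformer produces one sub-summary per sub-dialogue, and an ensemble of topic detection models discards uninformative candidates. The following is a minimal sketch of that composition only, not the authors' implementation; the function names, the averaged ensemble vote, and the 0.5 threshold are all assumptions made for illustration.

    # Sketch of the generate-then-filter pipeline described in the abstract.
    # All names below are hypothetical placeholders, not the authors' code.
    from typing import Callable, List

    def summarize_dialogue(
        sub_dialogues: List[str],
        generate_sub_summary: Callable[[str], str],      # seq2seq Transformer generator (assumed interface)
        topic_scorers: List[Callable[[str], float]],     # ensemble of topic detectors, each returning an informativeness score
        threshold: float = 0.5,                          # assumed cut-off for keeping a sub-summary
    ) -> List[str]:
        """Generate one sub-summary per sub-dialogue, then keep only those
        the topic-detection ensemble judges informative."""
        kept: List[str] = []
        for sub_dialogue in sub_dialogues:
            candidate = generate_sub_summary(sub_dialogue)
            # Average the ensemble's scores for this candidate and filter.
            score = sum(scorer(candidate) for scorer in topic_scorers) / len(topic_scorers)
            if score >= threshold:
                kept.append(candidate)
        return kept

In this reading, the constrained search and role-centric augmentation mentioned in the abstract affect how the generator and topic detectors are trained, not the inference-time composition sketched above.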



Author information

Corresponding author: Hao Yang


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Rao, Z., et al. (2022). Augmented Topic-Specific Summarization for Domain Dialogue Text. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol. 13552. Springer, Cham. https://doi.org/10.1007/978-3-031-17189-5_23


  • DOI: https://doi.org/10.1007/978-3-031-17189-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17188-8

  • Online ISBN: 978-3-031-17189-5

  • eBook Packages: Computer Science, Computer Science (R0)
