
A Data-to-Text Generation Model with Deduplicated Content Planning

  • Conference paper
Big Data (BigData 2022)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1709))


Abstract

Texts produced by data-to-text generation models often contain repetitive passages. To obtain higher-quality output, we adopt a data-to-text generation model with content planning and add coverage mechanisms to both the content-planning and text-generation stages. In the content-planning stage, a coverage mechanism removes duplicate content templates, thereby eliminating semantically identical sentences from the generated texts. In the text-generation stage, the coverage mechanism suppresses repeated words. In addition, to embed the positional association information in the data into the word vectors, we add positional encoding to the word embeddings. The word vectors are then fed to a pointer network to generate a content template, which is finally passed to the text generator to produce the descriptive texts. Experiments show improvements in both content-planning accuracy and the BLEU score of the generated texts, verifying the effectiveness of our proposed data-to-text generation model.
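The coverage mechanism summarized above can be sketched as follows. This is a minimal NumPy illustration in the general style of coverage attention (Tu et al. 2016; See et al. 2017), not the authors' implementation: the scalar weight `w_c`, the subtractive penalty, and the single-step interface are simplifying assumptions for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def coverage_attention(scores, coverage, w_c=1.0):
    """One decoding step of coverage-augmented attention (illustrative).

    scores   : base alignment scores for each source position, shape [n]
    coverage : running sum of past attention distributions, shape [n]
    w_c      : coverage weight (a scalar here; learned in practice)

    Returns the new attention distribution, the coverage loss for this
    step, and the updated coverage vector.
    """
    # Penalize positions that have already received attention,
    # discouraging the decoder from re-describing the same record.
    attn = softmax(scores - w_c * coverage)
    # Coverage loss as in See et al. (2017): overlap between the new
    # attention and what has already been covered.
    step_loss = np.minimum(attn, coverage).sum()
    return attn, step_loss, coverage + attn

# Usage: decode two steps with identical base scores. After the first
# step, the most-attended position is penalized, so attention shifts
# away from it and the coverage loss becomes positive.
scores = np.array([2.0, 1.0, 0.1])
attn1, loss1, cov = coverage_attention(scores, np.zeros(3))
attn2, loss2, cov = coverage_attention(scores, cov)
```

The same idea applies at both stages of the model: over record templates during content planning, and over vocabulary/source positions during text generation.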

This work was supported by the National Natural Science Foundation of China (61371196) and the National Science and Technology Major Project (2015ZX01040-201).



Author information


Corresponding author

Correspondence to Jianjun Cao.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, M., Cao, J., Yu, X., Nie, Z. (2022). A Data-to-Text Generation Model with Deduplicated Content Planning. In: Li, T., et al. Big Data. BigData 2022. Communications in Computer and Information Science, vol 1709. Springer, Singapore. https://doi.org/10.1007/978-981-19-8331-3_6


  • DOI: https://doi.org/10.1007/978-981-19-8331-3_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8330-6

  • Online ISBN: 978-981-19-8331-3

  • eBook Packages: Computer Science, Computer Science (R0)
