Refining Data for Text Generation

Liu, Qianying; Li, Tianyi; Guan, Wenyu; Li, Sujian

doi:10.1007/978-3-030-63031-7_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12522)

Included in the following conference series:

China National Conference on Chinese Computational Linguistics

Abstract

Recent work on data-to-text generation has made progress under neural encoder-decoder architectures. However, the input data is often enormous, not all data records are important for text generation, and inappropriate input may introduce noise into the final output. To solve this problem, we propose a two-step approach that first selects and orders the important data records and then generates text from the noise-reduced data. We propose a learning-to-rank model, supervised by a relation extractor, that ranks the importance of each record. With the noise-reduced data as input, we implement a text generator that sequentially models the input data records and emits a summary. Experiments on the ROTOWIRE dataset verify the effectiveness of our proposed method in both performance and efficiency.
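The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the `Record` fields, the `refine` and `linearize` helpers, the cutoff `k`, and especially `toy_score` (a hand-written stand-in for a learned ranker) are all hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    entity: str   # e.g. a player or team name (hypothetical field)
    rtype: str    # record type, e.g. "PTS" (hypothetical field)
    value: str    # record value, kept as a string


def refine(records: List[Record],
           score: Callable[[Record], float],
           k: int) -> List[Record]:
    """Step 1: keep the k highest-scoring records, most important first."""
    ranked = sorted(records, key=score, reverse=True)
    return ranked[:k]


def toy_score(record: Record) -> float:
    """Assumed stand-in for a learned ranker: favour large numeric values."""
    try:
        return float(record.value)
    except ValueError:
        return 0.0


def linearize(records: List[Record]) -> str:
    """Step 2 input: flatten the refined records into one sequence."""
    return " ".join(f"{r.entity}|{r.rtype}|{r.value}" for r in records)


# Made-up example records, not taken from ROTOWIRE.
records = [
    Record("Player A", "PTS", "31"),
    Record("Player A", "AST", "4"),
    Record("Player B", "PTS", "12"),
    Record("Player B", "REB", "0"),
]
refined = refine(records, toy_score, k=2)
generator_input = linearize(refined)
```

In this sketch the generator would receive only the two selected records, in ranked order, rather than all four. In a real system the scoring function would be a trained model rather than a fixed rule.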

Acknowledgement

We thank the anonymous reviewers for their helpful comments on this paper. This work was partially supported by the National Key Research and Development Project (2019YFB1704002) and the National Natural Science Foundation of China (61876009 and 61572049). The corresponding author of this paper is Sujian Li.

Author information

Authors and Affiliations

Key Laboratory of Computational Linguistics, MOE, Peking University, Beijing, China
Qianying Liu, Tianyi Li, Wenyu Guan & Sujian Li
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Qianying Liu

Corresponding author

Correspondence to Sujian Li.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Peking University, Beijing, China
Sujian Li
Westlake University, Hangzhou, China
Yue Zhang
Tsinghua University, Beijing, China
Yang Liu
Chinese Academy of Sciences, Beijing, China
Shizhu He
Beijing Language and Culture University, Beijing, China
Gaoqi Rao

About this paper

Cite this paper

Liu, Q., Li, T., Guan, W., Li, S. (2020). Refining Data for Text Generation. In: Sun, M., Li, S., Zhang, Y., Liu, Y., He, S., Rao, G. (eds) Chinese Computational Linguistics. CCL 2020. Lecture Notes in Computer Science, vol 12522. Springer, Cham. https://doi.org/10.1007/978-3-030-63031-7_7

DOI: https://doi.org/10.1007/978-3-030-63031-7_7
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63030-0
Online ISBN: 978-3-030-63031-7
eBook Packages: Computer Science, Computer Science (R0)
