Abstract
Long text generation is a challenging and still unsolved task. To generate long, coherent, and consistent text, existing approaches must increase the context length of the language model accordingly. However, computational and memory costs grow quadratically with that length. Even when trained on thousands of GPUs, language models are still limited to context lengths of a few thousand tokens, so later parts of a long generated text may become inconsistent with the topics and ideas of the preceding text. To address this, we propose a novel Transformer architecture called the Transformer with Local and Global Memory (Transformer LGM). It is inspired by the way people write long articles: a key idea is generated first and then guides the writing of the entire article. Such a “key idea” can be placed in the fixed global memory of the Transformer LGM to steer the whole generation process. In contrast, the local memory, which is responsible for local coherence, shifts and drops tokens as the length of the generated text grows. We implement the global memory by introducing a negative positional embedding, while the conventional positive positional embedding is retained for the local memory. Experiments show that, by utilizing the global memory, our model can generate long, coherent, and consistent text without enlarging the context length of the language model.
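The core mechanism described in the abstract — fixed negative positions for the global memory, ordinary non-negative positions for a shifting local window — can be sketched as follows. This is a minimal illustration of the idea under stated assumptions, not the authors' released implementation; the function names (`sinusoidal_embedding`, `lgm_positions`) and the choice of sinusoidal embeddings are hypothetical.

```python
import math

def sinusoidal_embedding(pos, d_model):
    """Standard sinusoidal positional embedding (Vaswani et al., 2017),
    evaluated at an arbitrary -- possibly negative -- position."""
    emb = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        emb.append(math.sin(angle))
        emb.append(math.cos(angle))
    return emb[:d_model]

def lgm_positions(num_global, num_local):
    """Assign fixed negative positions to the global-memory slots and the
    usual non-negative positions to the local window.  The global positions
    never change during generation, so the "key idea" stored there is always
    addressed the same way; the local positions are re-used as old tokens
    are shifted out and dropped."""
    global_pos = list(range(-num_global, 0))  # e.g. [-4, -3, -2, -1], fixed
    local_pos = list(range(num_local))        # [0, ..., L-1], shifting window
    return global_pos, local_pos

g_pos, l_pos = lgm_positions(4, 8)
g_emb = [sinusoidal_embedding(p, 16) for p in g_pos]
```

Because the global slots keep the same (negative) positions for the entire generation, the attention pattern toward the key idea stays stable however long the local window has slid.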
Data Availability
The data that support the findings of this study are openly available at https://github.com/brightmart/nlp_chinese_corpus.
Acknowledgements
This work is supported by the Natural Science Foundation of Sichuan Province (2022NSFSC0503) and the Sichuan Science and Technology Program (2022ZHCG0007).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Liu, Z. Fixed global memory for controllable long text generation. Appl Intell 53, 13993–14007 (2023). https://doi.org/10.1007/s10489-022-04197-6