
Fixed global memory for controllable long text generation


Abstract

Long text generation is a challenging and still unsolved task. To generate long, coherent, and consistent text, existing approaches must increase the context length of the language model accordingly. However, computational and memory costs grow quadratically with that length. Even when trained on thousands of GPUs, language models are still limited to contexts of a few thousand tokens, so text generated beyond that window may drift from the topics and ideas established earlier. To address this, we propose a novel Transformer architecture called the Transformer with Local and Global Memory (Transformer LGM). It is inspired by the way people write long articles: they first form a key idea and then keep that idea in mind while writing the entire piece. Such a “key idea” is placed in the fixed global memory of the Transformer LGM to guide the whole generation process. In contrast, the local memory, which is responsible for local coherence, shifts and drops content as the generated text grows. We implement the global memory with a negative positional embedding, while the conventional positive positional embedding is retained for the local memory. Experiments show that, by exploiting the global memory, our model generates long, coherent, and consistent text without enlarging the context length of the language model.
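The positional-embedding split described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical illustration and not the authors' implementation: it assumes sinusoidal positional embeddings, assigns negative position indices to the fixed global-memory slots (so their encoding stays constant however long the generated text becomes), and keeps the usual non-negative indices for the sliding local context. All names, sizes, and hyperparameters below are illustrative assumptions.

import torch

def sinusoidal_embedding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    # Standard sinusoidal embedding, evaluated at arbitrary (even negative) positions.
    positions = positions.float().unsqueeze(-1)                      # (seq_len, 1)
    dims = torch.arange(0, d_model, 2).float()                       # (d_model/2,)
    angles = positions / torch.pow(torch.tensor(10000.0), dims / d_model)
    emb = torch.zeros(positions.size(0), d_model)
    emb[:, 0::2] = torch.sin(angles)
    emb[:, 1::2] = torch.cos(angles)
    return emb

d_model = 512
global_len = 16    # fixed global-memory slots holding the "key idea" (illustrative size)
local_len = 1024   # sliding local context that shifts as generation proceeds

# Global memory: fixed negative positions -global_len .. -1; this encoding never
# changes no matter how long the generated text becomes.
global_pe = sinusoidal_embedding(torch.arange(-global_len, 0), d_model)

# Local memory: ordinary non-negative positions 0 .. local_len - 1; old tokens are
# shifted out and dropped as the text grows.
local_pe = sinusoidal_embedding(torch.arange(0, local_len), d_model)

# Positional table fed to the Transformer: global slots first, then the local window.
pos_table = torch.cat([global_pe, local_pe], dim=0)
print(pos_table.shape)  # torch.Size([1040, 512])

Because the global slots' positions are independent of how far generation has proceeded, the encoded "key idea" can keep steering attention throughout the whole generation process, while the local window alone handles local coherence.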


Data Availability

The data that support the findings of this study are openly available at https://github.com/brightmart/nlp_chinese_corpus.


Acknowledgements

This work was supported by the Natural Science Foundation of Sichuan Province (2022NSFSC0503) and the Sichuan Science and Technology Program (2022ZHCG0007).

Author information


Corresponding author

Correspondence to Zheng Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Z., Liu, Z. Fixed global memory for controllable long text generation. Appl Intell 53, 13993–14007 (2023). https://doi.org/10.1007/s10489-022-04197-6

