Abstract
Long text generation is a challenging and still unsolved task. To generate long, coherent, and consistent text, existing approaches must increase the context length of the language model accordingly. However, computational and memory costs grow quadratically with that length. Even when trained on thousands of GPUs, language models are still limited to context lengths of a few thousand tokens, so later parts of a long generated text may become inconsistent with the topics and ideas of the preceding text. To address this, we propose a novel Transformer architecture called the Transformer with Local and Global Memory (Transformer LGM). It is inspired by the way people write long articles: a key idea is generated first and then guides the writing of the entire article. Such a “key idea” can be placed in the fixed global memory of the Transformer LGM to steer the whole generation process. In contrast, the local memory, which is responsible for local coherence, shifts and drops tokens as the length of the generated text grows. We implement the global memory by introducing a negative positional embedding, while the conventional positive positional embedding is retained for the local memory. Experiments show that, by utilizing the global memory, our model can generate long, coherent, and consistent text without enlarging the context length of the language model.
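The core mechanism described in the abstract — fixed negative positions for the global memory, ordinary non-negative positions for a shifting local window — can be sketched as follows. This is a minimal illustration of the idea under stated assumptions, not the authors' released implementation; the function names (`sinusoidal_embedding`, `lgm_positions`) and the choice of sinusoidal embeddings are hypothetical.

```python
import math

def sinusoidal_embedding(pos, d_model):
    """Standard sinusoidal positional embedding (Vaswani et al., 2017),
    evaluated at an arbitrary -- possibly negative -- position."""
    emb = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        emb.append(math.sin(angle))
        emb.append(math.cos(angle))
    return emb[:d_model]

def lgm_positions(num_global, num_local):
    """Assign fixed negative positions to the global-memory slots and the
    usual non-negative positions to the local window.  The global positions
    never change during generation, so the "key idea" stored there is always
    addressed the same way; the local positions are re-used as old tokens
    are shifted out and dropped."""
    global_pos = list(range(-num_global, 0))  # e.g. [-4, -3, -2, -1], fixed
    local_pos = list(range(num_local))        # [0, ..., L-1], shifting window
    return global_pos, local_pos

g_pos, l_pos = lgm_positions(4, 8)
g_emb = [sinusoidal_embedding(p, 16) for p in g_pos]
```

Because the global slots keep the same (negative) positions for the entire generation, the attention pattern toward the key idea stays stable however long the local window has slid.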
Data Availability
The data that support the findings of this study are openly available at https://github.com/brightmart/nlp_chinese_corpus.
Acknowledgements
This work is supported by the Natural Science Foundation of Sichuan Province (2022NSFSC0503) and the Sichuan Science and Technology Program (2022ZHCG0007).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Liu, Z. Fixed global memory for controllable long text generation. Appl Intell 53, 13993–14007 (2023). https://doi.org/10.1007/s10489-022-04197-6