DOI: 10.1145/3446132.3446417

Transgenerators

Published: 09 March 2021

Abstract

Generative Pre-trained Transformers (GPT) have shown strong performance on natural language generation tasks. These models are trained in a self-supervised manner on large amounts of text crawled from the web. Such data is not of the highest quality: many sentences contain errors such as typos or grammatical mistakes. As a result, text generated by GPT models often includes grammatically incorrect sentences. Since Transformers also perform well on translation tasks, we propose a concept in which a single model handles a generation task and a translation task at the same time. The translation we use is of a specific kind: the Transformer is trained to translate a sentence containing grammatical errors into the same sentence without errors. In the full setting, an incorrectly generated sentence can be corrected by an extended version of the same model; we call this type of model a Transgenerator. We conducted several experiments to estimate the generative power of a Transgenerator based on the GPT-2 architecture, and the proposed method outperformed the original GPT-2 model on a range of tasks.
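
To make the proposed scheme concrete, below is a minimal sketch of the generate-then-correct loop described above, using the publicly available GPT-2 checkpoint from the Hugging Face transformers library as a stand-in. The "<correct>" marker, the prompting scheme, and the helper functions are illustrative assumptions for this sketch only; the paper's Transgenerator is an extended GPT-2 trained on both generation and error-to-correct translation, not a prompt-only setup.

```python
# Sketch of the Transgenerator idea under illustrative assumptions:
# one language model both generates text and "translates" a possibly
# erroneous sentence into a corrected one.
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def run(prompt: str, max_new_tokens: int = 40, sample: bool = True) -> str:
    """Generate a continuation of `prompt` and return only the new text."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            ids,
            max_length=ids.shape[1] + max_new_tokens,
            do_sample=sample,
            top_k=50,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True)


def generate(prompt: str) -> str:
    # Generator role: ordinary left-to-right sampling, as in plain GPT-2.
    return run(prompt, sample=True)


def correct(sentence: str) -> str:
    # Translation role: the same model is asked to rewrite the sentence
    # without errors. The "<correct>" marker is a hypothetical control
    # token that a Transgenerator would be fine-tuned to condition on;
    # an off-the-shelf GPT-2 has not been trained for it.
    return run(sentence + " <correct> ", sample=False)


draft = generate("The researchers trained a new model that")
fixed = correct(draft)
print("draft:    ", draft)
print("corrected:", fixed)
```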



Information

Published In

ACAI '20: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence
December 2020
576 pages
ISBN:9781450388115
DOI:10.1145/3446132

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPT
  2. Machine Learning
  3. Natural Language Processing
  4. Neural Networks
  5. Transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Russian Ministry of Science and Higher Education by the State Task

Conference

ACAI 2020

Acceptance Rates

Overall Acceptance Rate 173 of 395 submissions, 44%

