DOI: 10.1145/3446132.3446417

Transgenerators

Published: 09 March 2021

Abstract

Generative Pre-trained Transformers (GPT) have shown strong performance on natural language generation tasks. These models are trained in a self-supervised manner on large amounts of text crawled from the web. Such data is not of the highest quality: many sentences contain errors such as typos or grammatical mistakes. As a result, text generated by GPT models often includes grammatically incorrect sentences. Since Transformers also perform well on translation tasks, we propose a concept in which a single model handles a generation task and a translation task at the same time. The translation we use is of a specific kind: the Transformer is trained to translate a sentence containing grammatical errors into the same sentence without errors. In the full setting, an incorrectly generated sentence can be corrected by an extended version of the same model; we call this type of model a Transgenerator. We conducted several experiments to estimate the generative power of a Transgenerator based on the GPT-2 architecture, and the proposed method outperformed the original GPT-2 model on a range of tasks.
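
To make the proposed scheme concrete, below is a minimal sketch of the generate-then-correct loop described above, using the publicly available GPT-2 checkpoint from the Hugging Face transformers library as a stand-in. The "<correct>" marker, the prompting scheme, and the helper functions are illustrative assumptions for this sketch only; the paper's Transgenerator is an extended GPT-2 trained on both generation and error-to-correct translation, not a prompt-only setup.

```python
# Sketch of the Transgenerator idea under illustrative assumptions:
# one language model both generates text and "translates" a possibly
# erroneous sentence into a corrected one.
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def run(prompt: str, max_new_tokens: int = 40, sample: bool = True) -> str:
    """Generate a continuation of `prompt` and return only the new text."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            ids,
            max_length=ids.shape[1] + max_new_tokens,
            do_sample=sample,
            top_k=50,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True)


def generate(prompt: str) -> str:
    # Generator role: ordinary left-to-right sampling, as in plain GPT-2.
    return run(prompt, sample=True)


def correct(sentence: str) -> str:
    # Translation role: the same model is asked to rewrite the sentence
    # without errors. The "<correct>" marker is a hypothetical control
    # token that a Transgenerator would be fine-tuned to condition on;
    # an off-the-shelf GPT-2 has not been trained for it.
    return run(sentence + " <correct> ", sample=False)


draft = generate("The researchers trained a new model that")
fixed = correct(draft)
print("draft:    ", draft)
print("corrected:", fixed)
```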



Information

Published In

ACAI '20: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence
December 2020
576 pages
ISBN:9781450388115
DOI:10.1145/3446132

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPT
  2. Machine Learning
  3. Natural Language Processing
  4. Neural Networks
  5. Transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Russian Ministry of Science and Higher Education by the State Task

Conference

ACAI 2020

Acceptance Rates

Overall Acceptance Rate 173 of 395 submissions, 44%

