Adversarial text generation with context adapted global knowledge and a self-attentive discriminator

https://doi.org/10.1016/j.ipm.2020.102217

Highlights

  • A word sequence-based adversarial network that exploits the semantics of the corpus by adapting global word embeddings to the context of analysis.

  • A self-attentive discriminator that maps the semantics of the generated text to that of real-world text.

  • Evaluation framework based on quantitative and qualitative analyses.

  • A word sequence-based adversarial network that balances both generator and discriminator towards reaching the Nash equilibrium.

Abstract

Text generation is a challenging task for intelligent agents. Numerous research attempts have investigated the use of adversarial networks with word sequence-based generators. However, these approaches suffer from an imbalance between generator and discriminator that causes overfitting: the discriminator becomes too precise at distinguishing what the generator produces from what instead comes from the real dataset. In this paper, we investigate how to balance the generator and the discriminator of a sequence-based text adversarial network by exploiting: i) the contribution of global knowledge in the input of the adversarial network, encoded by global word embeddings that are adapted to the context of the datasets in which they are utilized, and ii) a self-attentive discriminator that slowly minimizes its loss function and thus enables the generator to get valuable feedback during the training process. Through an extensive evaluation on three datasets of short-, medium- and long-length text documents, the results computed using word-overlapping metrics show that our model outperforms four baselines. We also discuss the results of our model using readability metrics and the human-perceived quality of the generated documents.

Introduction

Text generation is the computational task of producing text in natural language. Over the last few years, it has become popular thanks to the many applications in which it can be used, such as dialogue generation, text summarization and neural machine translation.

Text is a means of communication consisting of the use of words in structured and conventional ways (syntax) to deliver multifarious meanings (semantics). Generating text autonomously is an outstanding challenge for today’s intelligent agents, whose generated text is still far from a human-like level. Numerous approaches address the task by building language models that aim to represent the probability distribution of words in a reference text in order to predict the next word given those that precede it, thus casting text generation as a sequence prediction problem. The current state of the art of language models is developed using neural networks. Recurrent neural networks (RNNs) are frequently used learning methodologies that maximize the likelihood of predicting the observed data (Karafiát, Burget, Černocký, & Khudanpur, 2010). A major limitation of RNNs in addressing this task is the exposure bias (Bengio, Vinyals, Jaitly, & Shazeer, 2015). During the training process, the network processes training data, but during the inference stage it uses its own predictions as input. As a result, when the network makes a prediction mistake, the error is accumulated and propagated throughout the sequence, preventing the model from working properly. In Bengio et al. (2015), the authors proposed the Scheduled Sampling technique, which consists of randomly alternating the input during the training process, feeding either real data or the model’s own predictions; however, in Huszár (2015), the author demonstrated that this method is inconsistent. Such an attempt investigates the impact of the mismatch between the real data used during training and the model’s own predictions used during inference, a methodology that generative adversarial networks later exploited successfully.
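To make the exposure-bias discussion concrete, the snippet below is a minimal, assumption-laden sketch of Scheduled Sampling in a recurrent decoder written in PyTorch; the GRU cell, the layer sizes and the fixed sampling probability are illustrative choices, not the configuration used in this paper.

```python
import torch
import torch.nn as nn


class ScheduledSamplingDecoder(nn.Module):
    """Minimal GRU decoder that mixes ground-truth tokens and its own
    predictions during training (Scheduled Sampling, Bengio et al., 2015)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.GRUCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, targets, sampling_prob=0.25):
        # targets: (batch, seq_len) ground-truth token ids
        batch, seq_len = targets.shape
        hidden = torch.zeros(batch, self.cell.hidden_size, device=targets.device)
        step_input = targets[:, 0]               # start from the first token
        all_logits = []
        for t in range(1, seq_len):
            hidden = self.cell(self.embed(step_input), hidden)
            logits = self.out(hidden)
            all_logits.append(logits)
            # With probability sampling_prob feed the model's own prediction,
            # otherwise feed the ground-truth token (teacher forcing).
            use_prediction = torch.rand(batch, device=targets.device) < sampling_prob
            step_input = torch.where(use_prediction, logits.argmax(dim=-1), targets[:, t])
        return torch.stack(all_logits, dim=1)    # (batch, seq_len - 1, vocab_size)
```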

The complexity of language is manifold since there are several aspects to take into account, from syntax, to semantics, to how to express a coherent meaning within a sentence, diversity, and improvisation. Syntax structures can be learned by statistical models, for instance how to structure a sentence with a subject, a predicate and an object.

Readability and discourse coherence are two challenges addressed by automated text generation approaches. Even though numerous advancements have been made in this field, the approaches discussed so far (Bosselut et al., 2018; Shi, Chen, Qiu, & Huang, 2018) still have a limitation inherited from the machine learning field: agents do not really understand text but rather replicate an observed pattern of a given data series. As observed in Li, Galley, Brockett, Gao, and Dolan (2016), this yields a strong ability to reproduce syntactic structure, but weaknesses in generating articulated semantic structures and in text diversity.

The introduction of Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) paved the way to a new approach for addressing generative tasks. Compared to previous computational attempts that are based on an explicit likelihood function, GANs’ likelihood function is implicit, resulting from two components with two different objective functions (Danihelka, Lakshminarayanan, Uria, Wierstra, & Dayan, 2017). For this reason, these networks have fundamental characteristics that alleviate the exposure bias problem and provide further diversity in the generated outputs; both aspects are investigated in this work.

GANs were designed to deal with continuous data such as image characteristics, while natural language is based on discrete tokens and therefore requires a different implementation. SeqGAN (Yu, Zhang, Wang, & Yu, 2017) extends GANs to a discrete sequence generation task by casting the prediction of the next word as a reinforcement learning problem where i) the state is the current sequence at a given timestep, ii) the action is to select the next word and iii) the reward is computed according to the distance between the expected word and the predicted one (a minimal sketch of this policy-gradient view is given after the research questions below). However, these approaches suffer from an imbalance between generator and discriminator that causes overfitting: the discriminator becomes too precise at distinguishing what the generator produces from what instead comes from the real dataset. In this paper, we investigate the contribution of contextual information to SeqGAN as inputs encoded by global word embeddings built from common knowledge and adapted to the context of the datasets in which they are utilized. We also experiment with a sequence-based self-attentive neural network as a discriminator and assess experimentally whether the model is able to encode knowledge into the generated text while contributing to stabilizing the loss function during adversarial training. In fact, a self-attentive network slowly minimizes its loss function while exploiting the semantics of the text, thus enabling the generator to get valuable feedback during the training process. We investigate the following two research questions:

  • RQ1:

    What is the impact of utilizing supervised context adaptation of global word embeddings in a SeqGAN model for text generation?

  • RQ2:

    Which is the most suitable discriminator in a SeqGAN model with context adaptation for text generation?
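To illustrate the reinforcement-learning view sketched above (state = current prefix, action = next word, reward derived from the discriminator), the following is a minimal REINFORCE-style generator loss in PyTorch. It is a sketch under assumptions: the generator and discriminator interfaces in the usage comment are hypothetical, and the single per-sequence reward stands in for the per-step Monte Carlo rollout rewards that SeqGAN actually uses.

```python
import torch


def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss for a sequence generator.

    log_probs: (batch, seq_len) log-probabilities of the chosen words (the actions).
    rewards:   (batch,) reward per sequence, e.g. the probability the
               discriminator assigns to the completed sequence being real.
    """
    # Each chosen word is reinforced in proportion to the sequence reward;
    # SeqGAN refines this with Monte Carlo rollouts to obtain per-step rewards.
    return -(log_probs.sum(dim=1) * rewards).mean()


# Hypothetical usage inside one adversarial generator update:
# sequences, log_probs = generator.sample(batch_size)      # assumed generator API
# rewards = discriminator(sequences).squeeze(-1).detach()  # no gradient through D
# loss = policy_gradient_loss(log_probs, rewards)
# loss.backward(); generator_optimizer.step()
```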

We perform an experiment based on three datasets of short-, medium- and long-length text documents and we propose a quantitative analysis of the performance of our approach on a set of n-gram matching metrics that measure word and part-of-speech overlaps, as well as readability metrics. We also discuss the human-perceived quality of the generated sentences.

The remainder of this paper is organized as follows: Section 2 presents a background on probabilistic language modeling for text generation and reports state-of-the-art approaches in this research field; Section 3 describes the task and gives an overview of our contributions; Section 4 illustrates the use of word embeddings that exploit contextual knowledge; Section 5 introduces a sequence-based self-attentive discriminator that contributes to giving valuable feedback to the generator; Section 6 reports the algorithm utilized for training our sequence-based adversarial network; Section 7 reports the datasets and metrics utilized in our experimentation; Section 8 illustrates the quantitative and qualitative results; and we conclude the paper with a discussion in Section 9.

Section snippets

Background and related work

Language modeling is considered the core of many natural language processing tasks that require approximating the probability distribution of an observed dataset of textual documents. We can distinguish two main approaches to language modeling: purely statistical (Charniak, 1996; Manning & Schütze, 1999) and, more recently, based on neural networks, the latter being studied in this work. Statistical language models learn the probability distribution of n-grams, i.e. n words
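As a concrete illustration of the count-based side of this distinction, here is a minimal maximum-likelihood bigram model in Python; smoothing schemes such as the Kneser-Ney back-off cited in the references are deliberately omitted, and the toy corpus is purely illustrative.

```python
from collections import Counter, defaultdict


def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            bigram_counts[prev][curr] += 1
    # Normalize counts into conditional probabilities per preceding word.
    return {
        prev: {word: count / sum(counter.values()) for word, count in counter.items()}
        for prev, counter in bigram_counts.items()
    }


lm = train_bigram_lm(["the cat sat", "the dog sat"])
print(lm["the"])  # {'cat': 0.5, 'dog': 0.5}
```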

Task description and contribution overview

The task of text generation consists in generating new text automatically, with or without constraints. One possible mathematical formulation of this task is as follows. Let I be the set of all valid textual documents in a specific language, Ī the set of all possible documents in the same language and S an origin space; a text generation model is a function F : S → Ī, where I ⊂ Ī.

We assume that data is a subset of the entire set of possible documents {d1, ..., di} of I given as known

Global knowledge with context adaptation

Natural language text is a representation of individual concepts, such as named entities, interlinked by verbs that describe the narration, with a particular context adaptation that is necessary for storytelling. The task of text generation is affected by a priori knowledge of both the context in which the narration should be compiled and the actors and events being narrated; it can also be addressed as a language model that depends on the context. Context is described as a variable that
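The snippet above is cut off before the adaptation mechanism is detailed, so the following is only a plausible sketch of how global (pretrained) word vectors can be adapted to a target corpus, in the spirit of the domain-adapted embeddings cited in the references: initialize the network's embedding layer from the global vectors and let the target corpus fine-tune them. The function name, vector source and dimensions are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn


def build_adapted_embedding(vocab, global_vectors, embed_dim=300, trainable=True):
    """Initialize an embedding layer from global (pretrained) word vectors and
    allow it to be fine-tuned on the target corpus, so that global knowledge is
    adapted to the context of analysis.

    vocab:          dict mapping word -> row index in the embedding matrix
    global_vectors: dict mapping word -> iterable of floats (e.g. GloVe/fastText)
    """
    weights = torch.randn(len(vocab), embed_dim) * 0.1   # random init for out-of-vocabulary words
    for word, idx in vocab.items():
        if word in global_vectors:
            weights[idx] = torch.tensor(global_vectors[word])
    # freeze=False keeps the vectors trainable, letting the corpus reshape them.
    return nn.Embedding.from_pretrained(weights, freeze=not trainable)
```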

Self-attentive discriminator for a sequence-based generative process

A generative process often degenerates when the distribution of the generated data differs from that of the real data, as demonstrated in Sønderby, Caballero, Theis, Shi, and Huszár (2016). This is exacerbated when the generated data represents long sequences of words. If the distributions diverge, the Nash equilibrium is compromised, making the feedback to the generator inconclusive since a zero loss-based reward is back-propagated. Therefore, no meaningful information
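Since the snippet is truncated, the following is only a minimal sketch of what a sequence-based self-attentive discriminator can look like in PyTorch: token embeddings, one self-attention layer, pooling, and a real/fake score. The number of heads, pooling strategy and layer sizes are illustrative assumptions and do not reproduce the architecture of this paper.

```python
import torch
import torch.nn as nn


class SelfAttentiveDiscriminator(nn.Module):
    """Minimal sequence discriminator: token embeddings, one self-attention
    layer, mean pooling, and a real/fake probability in [0, 1]."""

    def __init__(self, vocab_size, embed_dim=128, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        attended, _ = self.attn(x, x, x)   # self-attention over the word sequence
        pooled = attended.mean(dim=1)      # aggregate the sequence representation
        return self.classifier(pooled)     # probability the sequence is real
```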

Training procedure

Our generative sequence-based model with context adaptation and self-attention, which we refer to as SeqGANCS, has two main components: a generator G with parameters θ and a discriminator D with parameters ϕ.

The training process is structured in two main steps: i) pre-training of both G and D, and ii) training of the adversarial network. i) We perform pre-training to ensure a reasonably good quality of the text produced by the generator, and likewise for the discriminator in order for it to be able,
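As an illustrative companion to step i), the sketch below shows a common way to pre-train a sequence generator by maximum likelihood and a discriminator with binary cross-entropy on real versus sampled sequences. The interfaces (G returning per-step logits, the sample_fake callable) are assumptions made for the sake of the example and are not taken from the paper.

```python
import torch
import torch.nn as nn


def pretrain_generator_mle(G, real_batches, optimizer):
    """Generator side of step i): maximum-likelihood (teacher-forced) pre-training
    so that G already produces reasonably good text before adversarial training.
    G is assumed to return per-step logits given a prefix of ground-truth tokens."""
    criterion = nn.CrossEntropyLoss()
    for batch in real_batches:                    # batch: (batch_size, seq_len) token ids
        logits = G(batch[:, :-1])                 # predict every next token
        loss = criterion(logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def pretrain_discriminator(D, real_batches, sample_fake, optimizer):
    """Discriminator side of step i): binary cross-entropy on real sequences versus
    sequences sampled from the pre-trained generator (sample_fake is an assumed API)."""
    criterion = nn.BCELoss()
    for real in real_batches:
        fake = sample_fake(real.size(0))
        scores = torch.cat([D(real).squeeze(-1), D(fake).squeeze(-1)])
        labels = torch.cat([torch.ones(real.size(0)), torch.zeros(fake.size(0))])
        loss = criterion(scores, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```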

Experimental setup

We conducted our experiments by first identifying the datasets, metrics and baselines.

Results

We report the results of our SeqGANCS model in generating short-, medium-, and long-length documents, computed using both word-overlapping and readability metrics. We compared these results with the ones obtained from the baselines. We opt to report the figures of word and part-of-speech overlaps @5 because we observed that it is particularly challenging for state-of-the-art models to produce five words in a row with a concise and meaningful semantics altogether, and because a 5-gram offers to
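The exact metric definitions belong to the experimental setup, which is not reproduced in this snippet; as a generic illustration of an @5 word-overlap measure, the following computes the clipped fraction of 5-grams of a generated document that also occur in a reference text. It is an illustrative stand-in, not the paper's evaluation code.

```python
from collections import Counter


def ngram_overlap(generated, reference, n=5):
    """Fraction of n-grams in the generated text that also appear in the
    reference text (clipped counts, in the spirit of n-gram matching metrics)."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    gen, ref = ngrams(generated), ngrams(reference)
    if not gen:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in gen.items())
    return matched / sum(gen.values())
```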

Conclusion and future work

In this paper, we studied an adversarial neural network to generate textual documents. The approach extends current state-of-the-art adversarial network approaches for text generation by proposing two key contributions: i) context adaptation of global word embeddings (RQ1) and ii) a self-attentive discriminator (RQ2). We experimented with three benchmark corpora of short-, medium-, and long-length text. We utilized quantitative and qualitative assessments for measuring the quality of the generated

CRediT authorship contribution statement

Giuseppe Rizzo: Conceptualization, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Thai Hao Marco Van: Software, Validation, Visualization.

Acknowledgment

The authors acknowledge the funding received from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No. 870980 “Enabling immigrants to easily know and exercise their rights”, Call: H2020-SC6-MIGRATION-2018-2019-2020, Topic: DT-MIGRATION-06-2018-2019.

References (58)

  • E. Arisoy et al.

    Deep neural network language models

    NAACL-HLT 2012 workshop: Will we ever really replace the n-gram model? On the future of language modeling for HLT

    (2012)
  • S. Arora et al.

    A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

    ICLR. 7th International Conference on Learning Representations

    (2019)
  • S. Bengio et al.

    Scheduled sampling for sequence prediction with recurrent neural networks

    28th international conference on neural information processing systems - volume 1

    (2015)
  • Y. Bengio et al.

    A neural probabilistic language model

    Journal of Machine Learning Research

    (2003)
  • J. Bergstra et al.

    Random search for hyper-parameter optimization

    Journal of Machine Learning Research

    (2012)
  • J. Bogert

    In defense of the fog index

    The Bulletin of the Association for Business Communication

    (1985)
  • P. Bojanowski et al.

    Enriching word vectors with subword information

    Transactions of the Association for Computational Linguistics

    (2017)
  • A. Bosselut et al.

    Discourse-aware neural rewards for coherent text generation

    Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: Human language technologies, volume 1 (long papers)

    (2018)
  • P.F. Brown et al.

    Class-based n-gram models of natural language.

    Computational Linguistics

    (1992)
  • M. Caccia et al.

    Language GANs Falling Short

    ICLR. 8th International Conference on Learning Representations

    (2020)
  • E. Charniak

    Statistical language learning

    (1996)
  • T. Che et al.

    Maximum-likelihood augmented discrete generative adversarial networks

    CoRR

    (2017)
  • Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on...
  • M. Coleman et al.

    A computer readability formula designed for machine scoring.

    Journal of Applied Psychology

    (1975)
  • Danihelka, I., Lakshminarayanan, B., Uria, B., Wierstra, D., & Dayan, P. (2017). Comparison of maximum likelihood and...
  • T. van Dijk

    Contextual knowledge management in discourse production

    (2005)
  • W. Fedus et al.

    MaskGAN: Better text generation via filling in the ____

    Proceedings of the sixth international conference on learning representations

    (2018)
  • I.J. Goodfellow et al.

    Generative adversarial nets

    In 27th international conference on neural information processing systems - volume 2

    (2014)
  • A. Goyal et al.

    Professor forcing: A new algorithm for training recurrent networks

    Advances in neural information processing systems 29 (NIPS 2016)

    (2016)
  • J. Guo et al.

    Long text generation via adversarial training with leaked information

    CoRR

    (2017)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Computation

    (1997)
  • Huszár, F. (2015). How (not) to train your generative model: Scheduled sampling, likelihood,...
  • Jang, E., Gu, S., & Poole, B. (2016). Categorical reparameterization with...
  • P. Kameswara Sarma et al.

    Domain adapted word embeddings for improved sentiment classification

    Proceedings of the workshop on deep learning approaches for low-resource nlp

    (2018)
  • M. Karafiát et al.

    Recurrent neural network based language model

    11th annual conference of the international speech communication association

    (2010)
  • D.P. Kingma et al.

    Adam: A method for stochastic optimization

    3rd international conference for learning representations

    (2014)
  • R. Kneser et al.

    Improved backing-off for M-gram language modeling

    1995 international conference on acoustics, speech, and signal processing

    (1995)
  • Kusner, M. J., & Hernández-Lobato, J. M. (2016). GANs for sequences of discrete elements with the gumbel-softmax...
  • Y. Lecun et al.

    Gradient-based learning applied to document recognition

    Proceedings of the IEEE

    (1998)

    1 Work done while doing an internship at LINKS Foundation.
