1 Introduction

In the history of Chinese poetry, countless talents have left behind many outstanding works. Poetry is a carrier of language: different methods of expression and combinations of words have given rise to a variety of poetry genres and art forms. Palindrome poetry is one of them.

Palindrome poetry is a unique genre of Chinese classical poetry built on a special rhetorical device. Specifically, a palindrome poem is a group of sentences that read the same forward and backward. Palindrome poems have a long history dating back to the Jin dynasty. In terms of creation technique, palindrome poetry emphasizes the artistic characteristic of repeated chanting to express one's thoughts and narrate events. We illustrate an example of a famous palindrome quatrain in Fig. 1. It is poems like this that have attracted numerous scholars to devote themselves to the study of palindromes. However, composing a palindrome poem is extremely difficult: poets need not only profound literary accomplishment but also proficient writing skills. It is almost impossible for ordinary people to write poems under such a constrained setting.

Recently, Chinese poetry generation has achieved tremendous improvement thanks to the rapid progress of deep learning and the availability of generative models [7]. The sequence-to-sequence model (Seq2seq) [14] has become the mainstream model for generating Chinese poems. Seq2seq models adopt the Recurrent Neural Network (RNN) Encoder-Decoder architecture with an attention mechanism [2]. Benefiting from the powerful generative capacity of Seq2seq, such models can successfully generate Chinese poems [22].

Fig. 1. An example of 7-character Chinese palindrome poetry. Every line of the poem has the structure A-B-C-D-C-B-A (positions with the same color denote corresponding positions). (Color figure online)

Most existing methods focus on generating Chinese poems in the format of traditional Tang poetry. However, no prior work investigates the generation of Chinese palindrome poetry. Unlike traditional Chinese poetry generation, which has large datasets for training supervision [8], the generation of Chinese palindrome poetry has only a few reference samples. In addition, the format of Chinese palindrome poetry must not only obey the rules of traditional poetry but also satisfy the palindrome constraint.

To overcome these difficulties, we propose a novel beam-search-based algorithm, the Chinese Palindrome Poetry Generation Model (CPPGM), to automatically generate Chinese palindrome poems: 1) We first take an input word as the middle word of the first line, then adopt a particular beam search algorithm with a language model trained on traditional Chinese poetry lines to extend the first line. 2) Based on the first line, we adopt another beam search algorithm and two Seq2seq models with attention, a forward model and a backward model, to generate the second line. 3) After generating the first two lines, we repeat step 2 with the previous lines as input to generate the remaining lines. A minimal sketch of this pipeline is shown below.
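The following sketch illustrates how these three steps could be orchestrated. The two generation callables are hypothetical stand-ins for the first-line language-model search and the forward/backward Seq2seq search described in Sect. 3; they are not part of the paper itself.

```python
# Hypothetical orchestration of the three-step CPPGM pipeline described above.
from typing import Callable, List

def compose_palindrome_poem(
    middle_word: str,
    generate_first_line: Callable[[str], str],   # step 1 (assumed wrapper)
    generate_next_line: Callable[[str], str],    # steps 2-3 (assumed wrapper)
    num_lines: int = 4,
) -> List[str]:
    """Compose a palindrome poem from a single input character."""
    poem = [generate_first_line(middle_word)]        # step 1: first line
    for _ in range(num_lines - 1):                   # steps 2 and 3
        context = "".join(poem)                      # all previous lines as input
        poem.append(generate_next_line(context))
    return poem
```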

It is worth mentioning that we propose a new beam search algorithm: unlike the conventional beam search algorithm [6], ours picks the most likely words for the corresponding positions when generating Chinese palindrome poems.

The main contributions of our work can be summarized as follows:

  • To the best of our knowledge, we propose the first Chinese palindrome poetry generation model based on universal models.

  • Since we train our model on a traditional poetry dataset, our method can also generate other Chinese palindromes, such as Chinese palindrome couplets.

2 Related Work

Poetry generation is a challenging task in Natural Language Processing (NLP). Although a poem is short, its meaning is profound. Thus, the topic of poem generation has attracted many researchers over the past decades, and several kinds of approaches have been proposed.

Originally, traditional approaches relied on templates and rules: they employ templates to construct poems according to a series of constraints, e.g., meter, stress, rhyme, and word frequency, in combination with corpus-based and lexicographic resources. For instance, [1] use patterns based on parts of speech and WordNet [10]. [19] propose an approach to generate Haiku by using rules extracted from a corpus and additional lexical resources to expand user queries. In addition, [11] present a Portuguese poem generation platform that utilizes grammar and semantic templates.

The second kind of approach is based on Statistical Machine Translation (SMT). [5] generate Chinese couplets using a phrase-based SMT approach that translates the first line into the second line. [4] extend this algorithm to generate Chinese quatrains by translating the current line from the previous lines.

Recently, with the rapid development of deep learning in NLP, neural networks have been widely adopted to generate poetry, and various models have shown strong performance. [24] first propose to generate Chinese quatrains with a Recurrent Neural Network (RNN): each generated line is vectorized by a Convolutional Sentence Model (CSM) and then packed into a history vector. To enhance coherence, their model needs to be interpolated with two extra SMT features. Given some input keywords, they use a character-based RNN language model [9] to generate the first line, and then the other lines are generated sequentially by a variant RNN. [17] use a Long Short-Term Memory (LSTM) based Seq2seq model with attention to generate Song Iambics. [16] then extend this system to successfully generate Chinese quatrains. To guarantee that the generated poem is semantically coherent and accordant with the user's intent, [18] propose a two-stage poetry generation method: they first derive the sub-topics of the poem and then utilize these sub-topics to generate each line with a revised RNN encoder-decoder model. [22] regard poetry generation as a sequence-to-sequence learning problem and construct three generation blocks (word-to-line, line-to-line, and context-to-line) based on the RNN Encoder-Decoder to generate the whole poem. More recently, [23] save hundreds of human-authored poems in a static external memory to improve the novelty of generated quatrains and achieve style transfer. [20] employ a Conditional Variational AutoEncoder (CVAE) [3, 21] to generate Chinese poetry.

To some extent, our approach is related to the deep learning works mentioned above; going beyond them, we investigate the generation of Chinese palindrome poetry.

Fig. 2. The overall framework of the first line generation and a cyclic step (three-character to five-character) in the inference process.

3 Methodology

In this section, we describe our inference method for Chinese palindrome poetry generation in detail, including: 1) generating the first line with a well-trained language model and a beam search algorithm, and 2) generating the remaining lines with two Seq2seq models with attention and another beam search algorithm. As our method operates on probability distributions, it can be readily combined with other architectures, such as the models presented in [15]. Considering the popularity and widespread use of RNNs, we use RNN-based models as an exemplar in this paper. The framework of our models is shown in Fig. 2 and Fig. 3.

3.1 Problem Definition

Let w denote the external input, which is placed in the middle position of the first line. With the input w, we fill each pair of corresponding positions of the first line with the same character, so that \(C=[c_1,c_2,...,w(c_{(n+1)/2}),...,c_{n-1},c_{n}]\), where the line length n is an odd number and the corresponding positions are the two positions whose subscripts sum to \(n+1\). To compose a whole poem, we then have to generate the other three lines in the same format as the first line, as illustrated by the short sketch below.
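As a concrete illustration of this constraint only (not of the generation procedure), the following sketch checks whether a line satisfies \(c_t=c_{n+1-t}\); Latin letters stand in for Chinese characters.

```python
def satisfies_palindrome_constraint(line: str) -> bool:
    """True iff the line has odd length and mirrors around its middle
    character, i.e. c_t == c_{n+1-t} for every position t."""
    n = len(line)
    return n % 2 == 1 and all(line[t] == line[n - 1 - t] for t in range(n // 2))

# Placeholder example: the A-B-C-D-C-B-A structure of a 7-character line.
assert satisfies_palindrome_constraint("ABCDCBA")
assert not satisfies_palindrome_constraint("ABCDEFG")
```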

3.2 The First Line Generation

To generate the first line of the poem, we train a neural language model on the first lines of the traditional Chinese five-character and seven-character quatrains in our dataset. Unlike other Chinese poetry generation methods, which feed a theme or a picture into the models, we input only one Chinese character w as the middle word of the first line. Our method then searches for the sequence \(\hat{C}=[\hat{c_1}, \hat{c_2}, ... , \hat{c_{n-1}}, \hat{c_n}]\) that maximizes the probability \(P(\hat{C})\):

$$\begin{aligned} \hat{C}= \mathop {\arg \max }_{C}\sum _{t=1}^{n}\log P(c_t|c_{1:t-1}), \end{aligned}$$
(1)

where corresponding positions share the same character, i.e., \(c_t=c_{n+1-t}\); n denotes the length of the sentence to be generated and t denotes the time step.

However, we cannot handle this by left-to-right greedy generation or standard beam search, since the palindrome rules must be met. To this end, we first input the middle word \(w (c_{(n+1)/2})\) into the language model to obtain the top \(b\times b\) likely next words, which keeps the search space size consistent with the following steps; b denotes the beam size. After obtaining the \(b\times b\) candidate next words, we compute the probability of each triad \([c_{(n-1)/2},w,c_{(n+3)/2}]\) (\(c_{(n-1)/2}=c_{(n+3)/2}\)) formed from the candidate next words using the language model. Then, we pick out the top b triads with the highest probability. Each of these b triads is fed to the model to get b probable next words, so that we obtain \(b\times b\) quintuples \([c_{(n-3)/2},c_{(n-1)/2},w,c_{(n+3)/2},c_{(n+5)/2}]\). Again we compute the probability of each quintuple and pick out the top b likely sequences. The rest is done in the same manner to obtain b optimal 5-character or 7-character lines. It is worth mentioning that we augment our data by cutting the 5-character or 7-character sentences into 3-character and 5-character phrases so that the candidate triads and quintuples are more reasonable. The whole algorithm is summarized in Algorithm 1, the approximate steps are shown in Fig. 2, and a simplified code sketch follows Algorithm 1.

Algorithm 1
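The sketch below condenses the idea behind Algorithm 1 under the assumption that the trained language model is wrapped by two helper callables, one proposing next characters and one scoring a full candidate; it is a simplified rendering rather than the exact implementation.

```python
import heapq
from typing import Callable, List, Sequence

def first_line_beam_search(
    middle_word: str,
    target_len: int,                                       # 5 or 7 characters
    next_word_candidates: Callable[[Sequence[str], int], List[str]],
    sequence_logprob: Callable[[Sequence[str]], float],
    beam_size: int = 10,
) -> List[List[str]]:
    """Palindromic beam search for the first line (simplified sketch).

    next_word_candidates(prefix, k): top-k next characters under the language
    model; sequence_logprob(seq): log-probability of a candidate sequence.
    Both are assumed wrappers around the trained language model.
    """
    beams = [[middle_word]]
    while len(beams[0]) < target_len:
        expanded = []
        for seq in beams:
            # b*b candidates in the first step, b afterwards (as in Sect. 3.2).
            k = beam_size * beam_size if len(seq) == 1 else beam_size
            for c in next_word_candidates(seq, k):
                # Mirror the new character to both ends to keep the palindrome.
                expanded.append([c] + seq + [c])
        # Keep the top-b extended candidates scored by the language model.
        beams = heapq.nlargest(beam_size, expanded, key=sequence_logprob)
    return beams
```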

3.3 The Rest Lines Generation

By means of the algorithm above we now have dozens of candidate first lines. However, the sentences selected by probability may not always match the language preferences of human-written poetry, so we filter the generated sentences again using the BiLingual Evaluation Understudy (BLEU) [12]. We use the first lines of human-written poems in our dataset as references and the generated sentences as candidates, so that we can generally pick out the fluent and reasonable sentences, which are then fed as input to the two Seq2seq models. A sketch of this filtering step is shown below.
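A minimal sketch of the filtering step, assuming NLTK's sentence-level BLEU with character-level tokenization; the paper only specifies BLEU [12], so the library and smoothing choice are assumptions.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def filter_by_bleu(candidates, references, keep=5):
    """Keep the `keep` candidate first lines closest to human-written ones."""
    refs = [list(r) for r in references]          # character-level references
    smooth = SmoothingFunction().method1          # assumed smoothing choice
    ranked = sorted(
        candidates,
        key=lambda c: sentence_bleu(refs, list(c), smoothing_function=smooth),
        reverse=True,
    )
    return ranked[:keep]
```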

To generate the second line of the poem, we train two Seq2seq models, a forward model and a backward model. We use the first lines of the poems in our dataset as input, and the second lines in forward order and backward order as the targets of the forward and backward models, respectively.

Since a poem is an entirety whose sentences are related to one another, we generate the words from left to right, as in the common inference method, to make the most of the information in previous lines. However, given the form of the palindrome poem, the probability of a word appearing in its corresponding position should also be taken into account when generating a new line. We therefore come up with another algorithm that differs from the first line generation method.

Our key idea is to find the set \(\hat{C}=[{c_0},\hat{c_1}, \hat{c_2}, ... , \hat{c_{(n+1)/2}}]\) that maximizes the probability \(P(\hat{C})=P_1(\hat{C})\times P_2(\hat{C})\), where \(P_1(\hat{C})\) and \(P_2(\hat{C})\) are calculated in the same way but with different models via:

$$\begin{aligned} \hat{C}= \mathop {\arg \max }_{C}\sum _{t=1}^{(n+1)/2}\log P(c_t|c_{1:t-1}, X), \end{aligned}$$
(2)

where \(P_1(\hat{C})\) and \(P_2(\hat{C})\) are computed by the forward model and the backward model, respectively. \({c_0}\) denotes the start character \(<s>\), X denotes the previously generated first line, n denotes the length of the sentence to be generated, and t denotes the time step.

Algorithm 2

Taking 7-character poems as an example, we first input the start character \(<s>\) and the first line X into the forward model to pick the top \(b\times b\) candidates for the first position by probability (\(P_1\)); the word that appears in the first position is the same as the word in its corresponding position, i.e., the last position. Since the backward model is trained with verses in reverse order, it can calculate the probabilities of words in the corresponding position given the same inputs as the forward model. We therefore feed the same inputs into the backward model to obtain the probabilities (\(P_2\)) of the candidates appearing in their corresponding position. Thus, \(P_1\times P_2\) is the probability of a candidate appearing in both corresponding positions. Based on \(P_1\times P_2\), we choose the final top b candidates for the first position and save their probabilities (\(P(\hat{c_0},\hat{c_1})=P_1(\hat{c_0},\hat{c_1})\times P_2(\hat{c_0},\hat{c_1})\)). Then we input each candidate sequence \([\hat{c_0},c_1]\) together with the first line X into the forward model to get the top b candidate next words and their probabilities (\(P_1(\hat{c_2}|\hat{c_0},\hat{c_1},X)\)), so that we obtain \(b\times b\) candidate sequences. Again we feed the same input to the backward model to calculate the probabilities of the candidate next words in the corresponding position (\(P_2(\hat{c_2}|\hat{c_0},\hat{c_1},X)\)), and then multiply them with their corresponding forward probabilities. To pick out b reasonable triads \((\hat{c_0},\hat{c_1},\hat{c_2})\), we multiply these probabilities with the previous ones (\(P(\hat{c_0},\hat{c_1})\)). The rest is done by the same rule to obtain the b optimal sequences \(\hat{C}=[\hat{c_0},\hat{c_1},\hat{c_2},\hat{c_3},\hat{c_4}]\). According to the symmetry of the palindrome we can then easily obtain the completed lines. The integrated algorithm is summarized in Algorithm 2, the inference process is shown in Fig. 3, and a simplified code sketch is given below.
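The following sketch condenses this bidirectional search, assuming the forward and backward Seq2seq models are wrapped by callables returning per-character log-probabilities; it approximates Algorithm 2 rather than reproducing it exactly.

```python
import heapq
from typing import Callable, Dict, List

def rest_line_beam_search(
    context: str,                              # previously generated lines (X)
    target_len: int,                           # 5 or 7 characters
    forward_logprobs: Callable[[str, List[str]], Dict[str, float]],
    backward_logprobs: Callable[[str, List[str]], Dict[str, float]],
    beam_size: int = 10,
) -> str:
    """Bidirectional beam search for a new palindromic line (sketch).

    forward_logprobs(context, prefix) / backward_logprobs(context, prefix)
    return log P(next char | prefix, context) for every vocabulary character
    under the forward and backward models, respectively.
    """
    half_len = (target_len + 1) // 2           # only the first half is searched
    beams = [(0.0, [])]                        # (accumulated log(P1 * P2), prefix)
    for step in range(half_len):
        expanded = []
        for score, prefix in beams:
            fwd = forward_logprobs(context, prefix)
            bwd = backward_logprobs(context, prefix)
            # b*b forward candidates in the first step, b afterwards.
            k = beam_size * beam_size if step == 0 else beam_size
            for c in heapq.nlargest(k, fwd, key=fwd.get):
                # A word must be likely at this position (forward model) and
                # at its mirrored position (backward model): P1 * P2.
                expanded.append((score + fwd[c] + bwd[c], prefix + [c]))
        beams = heapq.nlargest(beam_size, expanded, key=lambda item: item[0])
    best = max(beams, key=lambda item: item[0])[1]
    # Complete the line by mirroring the first half around the middle word.
    return "".join(best + best[-2::-1])
```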

Fig. 3. The overall framework and inference process of the rest lines generation. We take a circular reasoning step of the second line generation as an example; the other inference steps are the same.

The generation of the third and fourth lines is almost the same as that of the second. To be specific, the Seq2seq models for third-line generation use the first two lines of the traditional poems as input and the third line in the two opposite orders as the targets, respectively. The models for last-line generation use the first three lines of the traditional poems as input and the fourth line in the two reverse orders as the targets, separately. A sketch of how such training pairs could be built is given below.
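As an illustration, the following hypothetical sketch assembles the (input, forward target, backward target) triples used to train the two Seq2seq models for a given line index; the data layout (a quatrain as a list of four line strings) is an assumption.

```python
from typing import List, Tuple

def build_training_pairs(
    quatrains: List[List[str]], line_idx: int
) -> List[Tuple[str, str, str]]:
    """Triples for the models that generate line `line_idx` (2, 3 or 4)."""
    pairs = []
    for lines in quatrains:
        source = "".join(lines[: line_idx - 1])       # all previous lines as input
        target = lines[line_idx - 1]
        pairs.append((source, target, target[::-1]))  # reversed target -> backward model
    return pairs
```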

4 Experiment

In this section, we first introduce the experimental settings, including datasets, data processing, and data augmentation. Then, we briefly introduce the methods that were eliminated during the exploration process and use them as baselines. Finally, we evaluate the results with both automatic metrics and a human perception study.

4.1 Datasets

We build two palindrome datasets: a palindrome poems dataset and a palindrome couplets dataset. For palindrome poems, we build a large traditional Chinese poetry corpus that contains 93,474 5-character quatrains and 139,196 7-character quatrains. The dataset we use in this task is built from all the first lines of the poems in our corpus. For palindrome couplets, we gathered 478,115 couplets without any internal punctuation from over a million multiform Chinese couplets.

4.2 Baselines

Since there was no prior work on Chinese palindrome poetry generation before ours (CPPGM), we compare the proposed method with an intuitive approach (SMCP) and the original method we employed (4-3GM). SMCP simply generates the first four words in the way of universal poetry generation, then copies the first three words to their corresponding positions. For the 4-3GM model, since the first four words and the last three words are more like two separate parts in each line, it first generates several groups of first four words as in SMCP, then uses a trained language model or Seq2seq model to calculate the probability of every candidate A-B-C-D-C-B-A structured sentence and chooses the candidates with the highest probability. The SMCP copy step is sketched below.
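For clarity, the SMCP copy step amounts to mirroring the generated first half onto the tail, roughly as follows (Latin letters stand in for Chinese characters):

```python
def smcp_complete(first_half: str) -> str:
    """SMCP baseline: copy the first (n-1)/2 characters to their corresponding
    positions, e.g. a generated four-character half 'ABCD' becomes 'ABCDCBA'."""
    return first_half + first_half[-2::-1]

assert smcp_complete("ABCD") == "ABCDCBA"
```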

4.3 Metrics and Results

In other text generation tasks, following [13], researchers usually compare methods on the standard sentence-level BLEU metric [12], which measures the correspondence between the ground truth and the generated sentences. However, there is no standard reference for evaluating our results, so we can only use a subset of sentences randomly sampled from the test set as references to evaluate the fluency of the generated sentences. Besides BLEU scores, we also train two language models on the poems and couplets in our dataset, respectively. We can then use perplexity (PPL) to evaluate the quality of the complete sentences; a sketch of this computation is given below. We use the same models to generate poems and couplets in equal quantity with the three methods. The BLEU scores (the higher the better) and PPL (the lower the better) are shown in Table 1. According to the results, the scores of the proposed model are much better than those of the other two methods. This indicates that our method not only meets the structural requirements of palindromes but also makes the generated sentences more fluent and closer to those written by humans. Since the first four words in the other two methods are decoded in a regular way, their scores are somewhat overrated in the fluency-based evaluation. Nevertheless, in most cases the proposed method performs better than them. Table 2 shows some examples of palindrome poems and palindrome couplets generated by our model.
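A minimal sketch of the perplexity computation, assuming the evaluation language model exposes a per-line log-probability; the character-level normalization is an assumption.

```python
import math
from typing import Callable, Sequence

def perplexity(lines: Sequence[str], line_logprob: Callable[[str], float]) -> float:
    """Character-level corpus perplexity. `line_logprob(line)` is assumed to
    return the total natural-log probability of the line under the model."""
    total_logprob = sum(line_logprob(line) for line in lines)
    total_chars = sum(len(line) for line in lines)
    return math.exp(-total_logprob / total_chars)
```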

Table 1. Automatic and human evaluation results.

4.4 Human Evaluation

While the above analysis is useful as a sanity check, it offers limited intuition about how close the generated samples are to human-made ones. The main problem is that we do not have strongly aligned data pairs to evaluate our results. Accordingly, we conduct a human evaluation to compare the perceived sample quality of the different models. Following [4, 24], we collect generations from each of the three methods, including poetry and couplets, on 50 randomly selected test instances. Then, we launch a crowd-sourced online study, asking 20 evaluators to rate the generations by Fluency, Coherence, Poeticness, and Meaning. In addition, we add a new evaluation metric (Sprachlichkeit) to determine whether the palindrome is consistent with human language. The scores range from 1 to 5, the higher the better. The detailed descriptions are listed below: (a) Sprachlichkeit: Is the palindrome consistent with human language? (b) Fluency: Does the palindrome read smoothly and fluently? (c) Coherence: Is the palindrome coherent across lines? (d) Poeticness: Does the palindrome follow the rhyme and tone requirements? (e) Meaning: Does the palindrome have a certain meaning and artistic conception?

The results are shown in Table 1. Due to the strict requirements of the palindrome, the scores are not very high, but we can see that CPPGM consistently outperforms all baselines in every aspect.

Table 2. Examples of 7-character Chinese palindrome poems and couplets generated by CPPGM.

5 Conclusions

In this paper, we propose a model to generate Chinese palindrome poetry. To the best of our knowledge, our method is the first to generate Chinese palindrome poetry, as well as other palindromes such as palindrome couplets, based on universal models. We compare the method with several baselines using both machine evaluation and human judgment. The experimental results show that the proposed method is an effective and efficient approach to Chinese palindrome generation. Since we only used a basic language model and Seq2seq models in the experiments, the generation quality can be further improved by strengthening these models in future work.