Teaching machines to write like humans using L-attributed grammar

https://doi.org/10.1016/j.engappai.2020.103489

Abstract

Reading and writing are easy for humans. The automatic reading of handwritten characters has been studied for several decades. Machine learning algorithms for reading tasks often require a huge amount of data to perform with accuracy similar to that of humans, yet it is also difficult to obtain sufficient meaningful data. Automatic writing tasks have not been studied as extensively. In this paper, we teach machines to write in the same manner as teaching a child: the machine is told how to write each character using an L-attributed grammar. With the aid of the proposed TMTW (Teaching Machines To Write) interacting system, a human acting as a teacher only needs to provide the writing sequence of parts and control lines. The proposed system automatically perceives the relationships between control lines and parts, and constructs the grammars. Top-down derivation and a stroke generation method are applied to generate varying characters based on the learned grammars. Once a machine can write, its output can be applied to robot control or to training sample generation for automatic reading tasks. The MNIST and CASIA datasets are used to demonstrate the effectiveness of the proposed system on different languages. The machine-written samples are used to train a network, which is evaluated on the MNIST test set. A test error rate of 1.23% is achieved using only approximately 20 grammars on average for each digit. Using the generated and handwritten samples together as a training set reduces the test error rate to 0.61%. Similar experiments are conducted on the CASIA dataset, and the results demonstrate that the proposed method is effective in generating characters with a complex structure. The source code and grammars used in this paper have been made publicly available at https://github.com/step123456789/TMTW.

Introduction

Writing is an easy but important skill for humans. With the rapid development of anthropomorphic robots, automatic machine writing is important and practically useful. Meanwhile, existing machine learning systems typically require a huge amount of handwritten data in order to realize automatic reading tasks. Automatic writing machines can supply sufficient data in several hours; however, automatic writing has not been studied as extensively as automatic recognition (Suen et al., 1980, Plamondon and Srihari, 2000, Liu et al., 2004, Zhang et al., 2017a). The study of generative models is an important research topic in the machine learning field (Lecun et al., 2015). Numerous generative models have been proposed, such as GAN (Goodfellow et al., 2014), DRAW (Gregor et al., 2015), LAPGAN (Denton et al., 2015), DCGAN (Radford et al., 2015), CycleGAN (Zhu et al., 2018), and SAGANs (Zhang et al., 2018a). It was demonstrated by Gene (2015) that realistic-looking characters can be generated using DCGAN. To generate online handwriting trajectories automatically, an RNN with LSTM was shown to be highly effective for English handwriting generation (Graves, 2013). Zhang et al. (2018b) extended the framework of Graves (2013) to learn a conditional generative model for the automatic drawing of handwritten Chinese characters. Their generative model was trained using the CASIA database (Liu et al., 2013), including OLHWDB1.0 and OLHWDB1.1, which together contain more than 2 million training samples. Lake et al. (2015) proposed a Bayesian program learning approach for learning and representing concepts from existing online handwriting data, reusing these concepts to construct new samples.

Although these methods have exhibited superior performance in previous works, they still do not represent the manner in which a child learns to write. As demonstrated by Zhang et al. (2018b), generative RNN models are not capable of capturing small but important details for accurate drawing. For example, Fig. 1 illustrates several confusing Chinese characters. There are more than 10,000 confusing pairs in the Chinese language. Human beings discriminate these confusing pairs by means of rules. Taking the confusing pair (b) in Fig. 1 as an example, we need to determine whether or not the bottom horizontal stroke is longer than the top horizontal stroke: if it is longer, the character means soil or land; otherwise, it means scholar or knight. Rules are used not only to discriminate confusing characters, but also to construct the complex structures of Chinese characters. Taking the first character in confusing pair (a) as an example: this character means live or alive and can be constructed by placing a water radical to the left of a tongue radical. The water radical may be further interpreted as three water droplets that are vertically distributed. In this paper, we teach a machine to write in the same manner as teaching a child, by using rules (grammars) instead of large amounts of handwritten samples. The advantages of the proposed method include:
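As a minimal sketch of the kind of rule described above, the following function discriminates the confusing pair soil/scholar by comparing the lengths of its two horizontal strokes. The stroke representation and function name are illustrative assumptions, not part of the paper's system:

```python
# Hedged sketch: discriminate a confusing pair by a simple rule on stroke
# lengths, as the text describes for pair (b) in Fig. 1. Strokes are assumed
# to be given as (length,) measurements; this is not the paper's actual code.

def classify_soil_scholar(top_len: float, bottom_len: float) -> str:
    """Return 'soil' if the bottom horizontal stroke is longer than the
    top one, otherwise 'scholar' (the rule stated in the text)."""
    return "soil" if bottom_len > top_len else "scholar"
```

Such hand-written rules are exactly the kind of knowledge the proposed grammars encode explicitly, rather than hoping a generative network infers them from data.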

(1) The proposed method simulates the process of teaching children to write. Most Chinese characters are complex and have many strokes. For children, learning to write is the process of writing strokes or parts in order under the guidance of a teacher; children can often learn the relationships between strokes or parts by themselves. In accordance with this process, the teacher provides the writing order of each stroke or part through the interacting system, and the proposed method then learns the relationships between strokes or parts automatically.

(2) The proposed method can generate more accurate details. The experimental results of this paper show that both the CycleGAN-based and the RNN-based methods suffer from inaccurate character image generation, such as missing strokes, extra strokes, and stroke splitting. Such errors would not be tolerated when teaching children to read and write. This advantage is especially important for writing very complex characters.

(3) The proposed method does not need to collect thousands of handwritten samples for each character; it only needs several grammars per character. Most deep learning based generation methods rely heavily on training samples, whereas the proposed method takes a different route. Moreover, the proposed method can be directly applied to generate other symbols. For example, the machine-drawn flowers shown in Fig. 2 were generated by the proposed method using only one simple grammar.

Bézier curves are frequently used to model curves in computer graphics. TrueType fonts and most imaging tools use quadratic or cubic Bézier curves for drawing characters or curved shapes. A Bézier curve is completely contained within the convex hull of its control points. Therefore, control points can be used to manipulate the curve, and affine transformations, such as translation and rotation, can be applied to the curve by applying the respective transform to the curve's control points. Fig. 3 illustrates a cubic Bézier curve with four control points. Complex shapes can be represented by a group of Bézier curves. In this paper, we use Bézier curves to draw characters or shapes. To maintain consistency with character strokes, we use a line segment whose two control points serve as the starting and end points of a character stroke; we refer to this as the control line throughout the paper. The control lines of each curve are generated by an attributed grammar. Each grammar corresponds to a language, and each handwritten character can be viewed as a sentence of that language. The grammars of each character are generated semi-automatically. The writing sequence of parts or control lines is provided very efficiently by the teacher using the proposed TMTW system. The writing order of parts and control lines in a character cannot be interpreted automatically from font files, because characters are not written in a random order but in a conventional order, albeit with varying styles. Providing this sequence is all that the teacher is required to do. The TMTW system perceives the relationships between the parts and control lines, and generates the grammar automatically based on the writing sequence. Thereafter, the machine can write varying samples based on the learned grammar.
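The two properties used above can be sketched concretely: a cubic Bézier curve is evaluated from its four control points via the Bernstein form, and an affine transformation of the curve reduces to transforming its control points. This is a generic illustration of standard Bézier mathematics, not the paper's implementation:

```python
# Cubic Bezier evaluation in Bernstein form:
#   B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3,  t in [0, 1]
# Points are (x, y) tuples. Function names are illustrative.

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate the cubic Bezier curve defined by four control points at t."""
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])

def translate(control_points, dx, dy):
    """Translating the control points translates the entire curve, since
    Bezier curves commute with affine transformations of their controls."""
    return [(x + dx, y + dy) for x, y in control_points]
```

Because the curve interpolates its endpoints (t = 0 gives p0, t = 1 gives p3), a degenerate control line with only start and end points behaves consistently with a full curved stroke.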

The remainder of this paper is organized as follows. Section 2 describes the character or shape representation method based on L-attributed grammars. Section 3 provides the implementation details of the TMTW system. Section 4 reports on the experimental results. Finally, concluding remarks are provided in Section 5.


Constraint using the writing history

When writing, people mainly pay attention to the relationship between the current stroke and the strokes already written, which constrains the shape, position, and size of the current stroke. People may also deliberately reserve some space for future strokes or radicals; however, if we limit the proportions between the strokes or radicals, correct writing can be achieved even without considering future strokes. Because if the space left is too small, then you cannot write the
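The constraint described above can be sketched as follows: the current stroke is rescaled so that its size stays within a fixed proportion of the writing history. The point-list representation, the height-only constraint, and the ratio parameter are all simplifying assumptions for illustration; the paper's grammar attributes encode richer relationships:

```python
# Hedged sketch: constrain the current stroke relative to already-written
# strokes by limiting its height to a proportion of the history's height.

def bbox(points):
    """Axis-aligned bounding box of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def constrain_height(stroke, history, max_ratio=1.0):
    """Scale `stroke` about its top-left corner so that its height does not
    exceed max_ratio times the height of the writing `history`."""
    _, hy0, _, hy1 = bbox(history)
    hist_h = hy1 - hy0
    x0, sy0, _, sy1 = bbox(stroke)
    stroke_h = sy1 - sy0
    if stroke_h == 0 or stroke_h <= max_ratio * hist_h:
        return stroke  # already within proportion; leave unchanged
    scale = (max_ratio * hist_h) / stroke_h
    return [(x0 + (x - x0) * scale, sy0 + (y - sy0) * scale)
            for x, y in stroke]
```

The design choice mirrors the argument in the text: by bounding proportions against the history alone, the writer never needs to look ahead to future strokes.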

Grammar construction

In this study, grammar is constructed semi-automatically. We divide the grammar construction process into two steps, which will be explained in the following paragraphs.

Datasets and parameter setup

The handwritten digit dataset MNIST (Lecun et al., 1998) and the handwritten Chinese character dataset CASIA (Liu et al., 2013) are used to evaluate the proposed method. The MNIST dataset consists of scans of handwritten digits and the associated label for each image, with 60 000 training samples and 10 000 test samples. This classification problem is one of the simplest and most widely used benchmarks in machine learning research. The CASIA dataset consists of approximately 240

Conclusions

In this paper, we have proposed an attributed grammar-based method to teach machines to write like humans. Grammars can easily be constructed with the proposed TMTW system, and varying characters can be randomly generated from these grammars. Automatic reading tasks were used to evaluate the proposed method. A comparable test error rate was achieved on the MNIST and CASIA datasets using only approximately 20 grammars on average per character. The generated samples can

CRediT authorship contribution statement

Yunxue Shao: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Funding acquisition. Cheng-Lin Liu: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Funding acquisition.

Acknowledgment

This study was supported by the National Natural Science Foundation of China (NSFC) under Grant no. 61563039.


No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2020.103489.
