Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks
Introduction
Human beings possess a remarkable cognitive capability for comprehending black-and-white cartoon sketches: our minds can conjure realistic, colorful images from abstract line drawings. However, choosing appropriate colors, with proper changes in light and shade, to create a visually compatible cartoon image requires real artistic talent - it is not an easy job for untrained people. An application that automatically paints sketches into colorful cartoon images could therefore be very useful to artists in the cartoon industry and other digital entertainment industries. In this work, we address this problem by employing deep neural networks to transfer black-and-white line-draft sketches into a specific cartoon style. Practically, the new model can make up for an ordinary person's lack of artistic training and even inspire artists to create cartoons in different styles. Ideally, one has the freedom to generate different styles of cartoons based on his or her own taste in colors.
Cartoon image generation from sketches can be regarded as an image synthesis problem. Previously, many non-parametric models were proposed [1], [2], [3] that match the sketch to a database of existing image fragments. Recently, numerous image synthesis methods based on deep neural networks have emerged [4], [5], [6], [7]. These methods can generate detailed images of faces, bedrooms, chairs and handwritten digits. Because photo-realistic images are full of sharp details, the results may suffer from being blurry [8], noisy [6], or having wobbly objects [9]. Moreover, the outputs of such networks are hard to control, because the generator samples from a low-dimensional random vector and the model has too much flexibility. Several recent approaches have explored controllable image synthesis in different applications, for instance super-resolution [10], [11], [12], [13], semantic object labeling [14], image manipulation [15], image de-raining [16], grayscale image colorization [17] and other image-to-image transformations [18], [19], and achieved convincing results. The sketch-to-image problem [20] is special in that its control signals are relatively sparse: it is more ill-posed than colorization from grayscale, and it requires a model to synthesize image details beyond what is contained in the input. The network should learn low-level texture information as well as high-level image styles. The style of a cartoon is reflected in its artistic color collocations (e.g. one style could be green hair with purple eyes, another black hair with black eyes), which may require more constraints in modeling. We investigate how to use Generative Adversarial Networks (GANs) in a conditional setting for image generation. Constraints including a total variation loss, a pixel loss and a feature loss are used in training the generator in order to generate color collocations of different styles.
Total variation (TV) regularization was first used in [21] to encourage spatial smoothness. We also introduce a color-control mechanism that allows users to paint with their favorite colors. Fig. 1 shows an example of cartoon images generated from a sketch; the results of the auto-painter (with and without color control) are compared to the ground truth in the middle.
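To make the loss terms above concrete, the sketch below gives one plausible NumPy formulation of the total variation term together with a combined generator objective (adversarial, L1 pixel, and TV terms). The function names and weighting coefficients are our own illustrative assumptions, not the paper's actual settings, and the feature loss is omitted for brevity.

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation of an H x W x C image: the sum of
    absolute differences between neighbouring pixels. Penalizing it
    encourages spatial smoothness in the generated painting."""
    dh = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbours
    dw = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbours
    return dh + dw

def generator_loss(fake, real, critic_fake, w_adv=1.0, w_pix=100.0, w_tv=1e-4):
    """Illustrative combined objective: a Wasserstein-style adversarial
    term (the generator wants high critic scores on fakes), an L1 pixel
    loss against the ground-truth painting, and TV regularization.
    The weights here are hypothetical placeholders."""
    adv = -np.mean(critic_fake)
    pix = np.mean(np.abs(fake - real))
    tv = total_variation(fake)
    return w_adv * adv + w_pix * pix + w_tv * tv
```

Note that when the generated image matches the ground truth exactly and the critic score is zero, every term vanishes, so the objective is minimized at a perfect reconstruction under this toy setup.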
The main contributions of this work can be summarized as follows.
- •
We propose a learning model called auto-painter that can automatically generate vivid, high-resolution painted cartoon images from a sketch using conditional Generative Adversarial Networks (cGANs). In the proposed model, we combine traditional losses and an adversarial loss to generate more compatible colors.
- •
A Wasserstein distance loss is used in our cGAN-based model, and empirical results show that Wasserstein GANs [22] can stabilize cGAN training and obtain better results than other models.
- •
Our work was one of the earliest to use GANs for cartoon generation, and the initial results were put online at arXiv. We also designed a demo with a user interface, and all code is publicly available for testing.
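As a rough illustration of the Wasserstein loss mentioned above, the critic in a WGAN [22] is trained to maximize the gap between its mean scores on real and generated samples, with weight clipping serving as the (crude) Lipschitz constraint of the original formulation. The function names below are our own; this is a sketch of the loss, not the paper's implementation.

```python
import numpy as np

def critic_loss(scores_real, scores_fake):
    # The critic maximizes E[f(real)] - E[f(fake)]; written here as a
    # quantity to minimize, hence the sign flip.
    return -(np.mean(scores_real) - np.mean(scores_fake))

def clip_weights(weights, c=0.01):
    # Weight clipping from the original WGAN paper: keep every parameter
    # in [-c, c] so the critic stays (approximately) K-Lipschitz.
    return [np.clip(w, -c, c) for w in weights]
```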
Section snippets
Generative adversarial networks
Generative adversarial networks (GANs) are regarded as a recent breakthrough in machine learning [6], [7]. A GAN consists of two ‘adversarial’ modules: a generative module G that captures the data distribution, and a discriminative module D that estimates the probability that a sample comes from the training data rather than from G. Both G and D can be deep neural networks. In image synthesis with GANs, the generator attempts to produce a realistic image from an input random vector to fool the discriminator.
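The two-player objective described above can be written down directly. Below is a minimal NumPy sketch of the standard (non-Wasserstein) GAN losses, where `d_real` and `d_fake` are the discriminator's probability outputs on real and generated batches; the function names are illustrative.

```python
import numpy as np

def loss_D(d_real, d_fake, eps=1e-8):
    # D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative.
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def loss_G(d_fake, eps=1e-8):
    # Non-saturating generator objective: maximize log D(G(z)).
    return -np.mean(np.log(d_fake + eps))
```

A discriminator that confidently separates real from fake drives `loss_D` toward zero, while the generator lowers `loss_G` by pushing the discriminator's score on generated samples toward one.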
Auto-painter
The auto-painter is a supervised learning model based on conditional GANs: given a black-and-white sketch, it generates a painted colorful image, having been trained on sketch-image pairs. We employ a feed-forward deep neural network as the generator to obtain a quick response at test time. The generator takes the sketch as input and outputs a colorful cartoon image of the same resolution at the pixel level.
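A generator that outputs an image at the input resolution is commonly built as a U-net: each downsampling step in the encoder is paired with an upsampling step in the decoder, and the matching encoder feature map is concatenated in as a skip connection. The sketch below shows only this shape arithmetic, with average pooling and nearest-neighbour upsampling standing in for learned convolution blocks; it is not the paper's actual network.

```python
import numpy as np

def down(x):
    # 2x2 average-pool downsample (stand-in for a strided conv block).
    # Assumes even height and width.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up(x):
    # Nearest-neighbour upsample (stand-in for a transposed conv block).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_forward(x, depth=3):
    skips = []
    for _ in range(depth):            # encoder: shrink, remember features
        skips.append(x)
        x = down(x)
    for skip in reversed(skips):      # decoder: grow, concatenate skips
        x = np.concatenate([up(x), skip], axis=-1)
    return x                          # same spatial size as the input
```

The skip connections are what let the output keep the low-level sketch structure while the bottleneck carries the learned high-level color information.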
Datasets
In order to train the auto-painter model, we collected a large number of cartoon pictures from the Internet with a crawler. Most previous research (e.g. [17], [20]) studies images of low resolution. As most real-world cartoons are of high resolution, our training images are all at 512 × 512. In order not to change the aspect ratio of the cartoons, we first randomly scale each picture so that its shorter side is longer than 512. Then, we crop the picture from bottom, middle and the
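The scaling step described above (resize so the shorter side reaches at least 512 while preserving the aspect ratio, then take 512 × 512 crops) can be sketched as follows. The nearest-neighbour resampling and function names are our own assumptions for illustration; in practice one would use a proper image library.

```python
import numpy as np

def scale_shorter_side(img, target=512):
    # Rescale (nearest-neighbour) so the shorter side is >= target,
    # keeping the aspect ratio unchanged.
    h, w = img.shape[:2]
    s = target / min(h, w)
    if s <= 1.0:
        return img
    new_h, new_w = int(round(h * s)), int(round(w * s))
    ys = np.minimum((np.arange(new_h) / s).astype(int), h - 1)
    xs = np.minimum((np.arange(new_w) / s).astype(int), w - 1)
    return img[ys][:, xs]

def crop_patch(img, top, left, size=512):
    # Extract one size x size training patch at the given offset.
    return img[top:top + size, left:left + size]
```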
Interactive colorization
Given an input image of resolution 512 × 512, the auto-painter can generate a painted colorful output image within 1 s, enabling instant feedback in an interactive image editing tool. We propose two approaches that allow user interaction with the auto-painter.
(1) Sketch Modification: the auto-painter trained on the Minions data provides a tool for users to design virtual images in the ‘minions style’. As shown in the left-hand side of Fig. 9, based on the given initial sketch, one can modify the
Conclusions
In this paper, we proposed the auto-painter model to solve the sketch-to-image problem. Our approach is based on a conditional GAN with the Wasserstein distance. A ‘U-net’ structure is used to allow the output image to retain low-level sketch information as well as learned high-level color information. We add more constraints to the pix2pix model to obtain better painting performance. We also trained the auto-painter to adapt to color control, so that the network can adapt the synthesis
Acknowledgment
We are grateful to our colleagues Naihan Li and Wei Zhao at Samsung for insightful discussions. We also thank Yuzeng Kou from Beihang University for building the online demo. Tao Wan is partly funded by the National Science Foundation of China under award No. 61401012.
Yifan Liu received her B.S. degree from Beihang University, China in 2016. She is currently a postgraduate at School of Automation Science and Electrical Engineering, Beihang University, China. Her research interests include image processing, neural language processing and deep learning.
References (39)
- et al., Context encoders: feature learning by inpainting, Proceedings of the CVPR (2016)
- et al., Scene completion using millions of photographs, Commun. ACM (2007)
- et al., PatchMatch: a randomized correspondence algorithm for structural image editing, ACM Trans. Gr. (2009)
- et al., Example-based super-resolution, IEEE Comput. Gr. Appl. (2002)
- et al., Deep Boltzmann machines, J. Mach. Learn. Res. (2009)
- et al., Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the ICML (2009)
- et al., Generative adversarial nets, Adv. Neural Inf. Process. Syst. (2014)
- et al., Unsupervised representation learning with deep convolutional generative adversarial networks, CoRR (2015)
- et al., Auto-encoding variational Bayes, Statistics (2014)
- et al., Deep generative image models using a Laplacian pyramid of adversarial networks, Proceedings of the NIPS (2015)
- Perceptual losses for real-time style transfer and super-resolution, Proceedings of the ECCV
- Learning a deep convolutional network for image super-resolution, Proceedings of the ECCV
- Photo-realistic single image super-resolution using a generative adversarial network, Proceedings of the CVPR
- Image super-resolution using a dilated convolutional neural network, Neurocomputing
- Unsupervised image-to-image translation with generative adversarial networks, CoRR
- Tag disentangled generative adversarial network for object image re-rendering, Proceedings of the IJCAI
- Image de-raining using a conditional generative adversarial network, CoRR
- Colorful image colorization, Proceedings of the ECCV
- Image-to-image translation with conditional adversarial networks, Proceedings of the CVPR
Zengchang Qin obtained his M.Sc. in Computer Science and Ph.D. in Artificial Intelligence from the University of Bristol, UK, in 2002 and 2005, respectively. He worked as a lecturer at the same university before joining Lotfi Zadeh’s BISC group at the EECS Department of UC Berkeley as a BT postdoctoral fellow in 2006. He has been working at Beihang University as an associate professor in the School of Automation Science and Electrical Engineering since 2009. He was also a visiting scholar at the Robotics Institute, Carnegie Mellon University, from November 2010 to June 2011. His research interests are uncertainty modeling, machine learning, multimedia retrieval and agent-based modeling.
Dr. Tao Wan is an Assistant Professor at Beihang University, Beijing. Dr. Wan was a research associate at Case Western Reserve University and a postdoctoral associate at the Boston University School of Medicine. She received her Master’s degree in Global Computing and Multimedia from the University of Bristol, UK in 2004 and her Ph.D. in Computer Science from the same university in 2009. She spent one year working as a senior researcher at the Samsung Advanced Institute of Technology (SAIT), China before becoming a visiting scholar in the Visualization and Image Analysis Lab at the Robotics Institute, Carnegie Mellon University. Her research interests are statistical models for image segmentation, fusion, and denoising, machine learning, medical image analysis, and computer-aided diagnosis and prognosis systems.
Zhenbo Luo (SRC-Beijing Machine Learning Lab) received his B.Sc. degree in Electronics Engineering from Fudan University in 2003 and his Master’s degree in Electronics Engineering from Tsinghua University in 2006, supervised by Professor Xiaoqing Ding. He is a Principal Engineer and leader of the Machine Learning Lab of Samsung R&D Institute China. His research interests include vision and learning. He leads a team developing GAN, text recognition, AR, and human understanding technologies for Samsung smart phones, visual display and printing businesses.