Neurocomputing

Volume 311, 15 October 2018, Pages 78-87

Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks

https://doi.org/10.1016/j.neucom.2018.05.045

Abstract

Recently, realistic image generation using deep neural networks has become a hot topic in machine learning and computer vision. Such images can be generated at the pixel level by learning from a large collection of images. Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a useful application in digital entertainment. In this paper, we investigate the sketch-to-image synthesis problem using conditional generative adversarial networks (cGANs). We propose a model called auto-painter that can automatically generate compatible colors for a given sketch. The Wasserstein distance is used in training the cGAN to overcome mode collapse and help the model converge better. The new model is not only capable of painting hand-drawn sketches with compatible colors, but also allows users to indicate preferred colors. Experimental results on different sketch datasets show that the auto-painter performs better than other existing image-to-image methods.

Introduction

Human beings possess a great cognitive capability for comprehending black-and-white cartoon sketches. Our minds can create realistic colorful images from abstract black-and-white cartoons. However, one needs considerable artistic talent to choose appropriate colors, with proper changes in light and shade, to create visually compatible cartoon images; this is not an easy job for untrained people. If we could automatically paint sketches into colorful cartoon images, it would be a very useful application helping artists in the cartoon industry and other digital entertainment industries. In this work, we solve this problem by employing deep neural networks to transform black-and-white line-draft sketches into a specific cartoon style. Practically, the new model can compensate for ordinary people's lack of artistic talent and even inspire artists to create cartoons with different styles. Ideally, one has the freedom to generate different styles of cartoons based on his or her own taste in colors.

Cartoon image generation from sketches can be regarded as an image synthesis problem. Previously, many non-parametric models were proposed [1], [2], [3] that match the sketch to a database of existing image fragments. Recently, numerous image synthesis methods based on deep neural networks have emerged [4], [5], [6], [7]. These methods can generate images with details such as faces, bedrooms, chairs and handwritten numbers. As photo-realistic images are full of sharp details, the results may suffer from being blurry [8], noisy [6], or containing wobbly objects [9]. Moreover, the outputs of the network are hard to control because the generator samples from a low-dimensional random vector and the model has too much flexibility. Several recent approaches have explored controllable methods for image synthesis in different applications, for instance super-resolution [10], [11], [12], [13], semantic object labeling [14], image manipulation [15], image de-raining [16], grayscale image colorization [17] and other image-to-image transformations [18], [19], and achieved convincing results. The sketch-to-image problem [20] is particularly challenging: its control signals are relatively sparse, making it more ill-posed than colorization from grayscale images, and it requires a model to synthesize image details beyond what is contained in the input. The network should learn low-level texture information as well as high-level image styles. The style of a cartoon is reflected in its artistic color collocations (e.g., one style could be green hair with purple eyes, another black hair with black eyes), which may require more constraints in modeling. We investigate how to use Generative Adversarial Networks (GANs) in a conditional setting for image generation. Constraints including a total variance (TV) loss, a pixel loss and a feature loss are used in training the generator in order to generate color collocations of different styles. TV regularization was first used by [21] to encourage spatial smoothness. We also introduce color control that allows users to paint with their favorite colors. Fig. 1 shows an example of cartoon images generated from a sketch; the results of auto-painter (with and without color control) are compared to the ground truth in the middle.
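As an illustration of how these constraints can be combined, the following sketch shows one way to form the generator objective from pixel, feature, total variance and adversarial terms. It assumes a PyTorch-style implementation; the loss weights and the feature extractor are illustrative assumptions, not the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    # Total variance (TV) term: penalizes differences between neighboring
    # pixels to encourage spatial smoothness (cf. [21]).
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def generator_loss(fake, real, feat_fake, feat_real, d_score_fake,
                   w_pix=100.0, w_feat=1.0, w_tv=1e-4, w_adv=1.0):
    # Pixel loss: keeps the painted output close to the ground-truth image.
    l_pix = F.l1_loss(fake, real)
    # Feature loss: distance between activations of a fixed feature
    # extractor (e.g. a pretrained network) for generated and real images.
    l_feat = F.l1_loss(feat_fake, feat_real)
    # TV loss on the generated image.
    l_tv = total_variation(fake)
    # Adversarial term (Wasserstein form): the generator tries to raise
    # the critic's score on its output, i.e. minimize -D(x, G(x)).
    l_adv = -d_score_fake.mean()
    return w_pix * l_pix + w_feat * l_feat + w_tv * l_tv + w_adv * l_adv
```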

The main contributions of this work can be summarized as follows.

  • We propose a learning model called auto-painter that can automatically generate vivid, high-resolution painted cartoon images from a sketch by using conditional Generative Adversarial Networks (cGANs). In the proposed model, we combine traditional losses and an adversarial loss to generate more compatible colors.

  • The Wasserstein distance loss is used in our cGAN-based model, and empirical results show that Wasserstein GANs [22] can stabilize cGAN training and obtain better results compared to other models (see the critic-loss sketch after this list).

  • Our work was one of the earliest to use GANs for cartoon generation, and the initial results were put online at arXiv. We also designed a demo with a user interface for testing, and all code is publicly available.
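For reference, a minimal sketch of the Wasserstein critic objective with weight clipping from the original WGAN [22] is given below. The framework and the clip value are assumptions; in the conditional setting the critic scores (sketch, image) pairs rather than images alone.

```python
def critic_loss(d_real, d_fake):
    # The critic approximates the Wasserstein distance between real and
    # generated (sketch, image) pairs by maximizing D(real) - D(fake);
    # we therefore minimize the negation.
    return d_fake.mean() - d_real.mean()

def clip_critic_weights(critic, c=0.01):
    # Weight clipping keeps the critic approximately 1-Lipschitz,
    # as in the original WGAN; the value of c is illustrative.
    for p in critic.parameters():
        p.data.clamp_(-c, c)
```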

Section snippets

Generative adversarial networks

Generative adversarial networks (GANs) have recently been regarded as a breakthrough in machine learning [6], [7]. A GAN consists of two 'adversarial' modules: a generative module G that captures the data distribution, and a discriminative module D that estimates the probability that a sample comes from the training data rather than from G. Both G and D can be deep neural networks. In image synthesis with GANs, the generator attempts to produce a realistic image from an input random vector to fool the
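A minimal sketch of this alternating adversarial game, using the original (non-Wasserstein) GAN objective [6], is shown below. PyTorch is an assumption, and G, D, the noise vector z and the optimizers are placeholders rather than the paper's actual networks.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, z, real, opt_G, opt_D):
    # D outputs the probability that its input is a real sample.
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # Train D to assign high probability to real images and
    # low probability to generated ones.
    d_loss = (F.binary_cross_entropy(D(real), ones) +
              F.binary_cross_entropy(D(G(z).detach()), zeros))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Train G so that D classifies its outputs as real (G 'fools' D).
    g_loss = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```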

Auto-painter

Auto-painter is a supervised learning model based on conditional GANs: given a black-and-white sketch, the model generates a painted colorful image, learning from sketch-image pairs in the training data. We employ a feed-forward deep neural network as the generator to obtain a quick response at test time. The generator takes the sketch as input and outputs a colorful cartoon image of the same resolution at the pixel level.
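A minimal 'U-net'-style generator sketch is given below (PyTorch is an assumption). Skip connections let low-level stroke information from the sketch flow directly to the output while deeper layers learn color; the layer sizes are illustrative and much smaller than a real 512 × 512 model would use.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    def __init__(self, in_ch=1, out_ch=3, base=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1), nn.LeakyReLU(0.2))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.ReLU())
        # The last decoder layer also receives the skip connection from enc1.
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, sketch):
        e1 = self.enc1(sketch)           # downsample, keep low-level strokes
        e2 = self.enc2(e1)               # deeper, more abstract features
        d1 = self.dec1(e2)               # upsample
        d1 = torch.cat([d1, e1], dim=1)  # skip connection ('U-net')
        return self.dec2(d1)             # colorful image, same resolution as input
```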

Datasets

In order to train the auto-painter model, we collected a large number of cartoon pictures from the Internet with a crawler. Most previous research (e.g., [17], [20]) studies images of low resolution. As most real-world cartoons are of high resolution, our training images are all at 512 × 512. In order not to change the aspect ratio of the cartoons, we first randomly scale the image so that its shorter side is longer than 512. Then, we crop the picture from the bottom, middle and the
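A sketch of this preprocessing step is shown below, assuming PIL; the exact scale range and crop positions are assumptions inferred from the description above.

```python
import random
from PIL import Image

def prepare_crops(path, size=512):
    img = Image.open(path).convert('RGB')
    w, h = img.size
    # Randomly rescale so that the shorter side is at least `size` pixels,
    # without changing the aspect ratio of the cartoon.
    scale = random.uniform(1.0, 1.5) * size / min(w, h)
    img = img.resize((int(w * scale), int(h * scale)), Image.BICUBIC)
    w, h = img.size
    # Take size x size crops at the two ends and the middle of the longer
    # side (top/middle/bottom for portrait images, left/middle/right otherwise).
    crops = []
    for off in (0, (max(w, h) - size) // 2, max(w, h) - size):
        box = (0, off, size, off + size) if h >= w else (off, 0, off + size, size)
        crops.append(img.crop(box))
    return crops
```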

Interactive colorization

Given an input image of resolution 512 × 512, the auto-painter can generate a painted colorful output image within one second, which enables the instant feedback needed for an interactive image editing tool. We propose two approaches to allow user interaction with the auto-painter.

(1) Sketch Modification: the auto-painter trained on the Minions data provides a tool for users to design virtual images of 'Minions style'. As shown in the left-hand side of Fig. 9, based on the given initial sketch, one can modify the
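The abstract also notes that users can indicate preferred colors. One common way to implement such color control, shown below purely as a hedged illustration and not necessarily auto-painter's actual mechanism, is to stack a sparse color-hint map with the sketch as extra input channels to the generator.

```python
import torch

def build_input(sketch, hint_points):
    # sketch: tensor of shape (1, 1, 512, 512), grayscale values in [0, 1]
    # hint_points: list of (row, col, (r, g, b)) locations chosen by the user
    hint = torch.zeros(1, 3, sketch.size(2), sketch.size(3))
    for r, c, (red, green, blue) in hint_points:
        hint[0, :, r, c] = torch.tensor([red, green, blue])
    # The generator then takes a 4-channel input: sketch + sparse color hints.
    return torch.cat([sketch, hint], dim=1)
```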

Conclusions

In this paper, we proposed the auto-painter model to solve the sketch-to-image problem. Our approach is based on a conditional GAN with the Wasserstein distance. The 'U-net' structure is used to allow the output image to carry both low-level sketch information and learned high-level color information. We add more constraints on top of the pix2pix model to obtain better painting performance. We also trained the auto-painter to adapt to color control, so that the network can adapt the synthesis

Acknowledgment

We are grateful to our colleagues Naihan Li and Wei Zhao at Samsung for insightful discussions. We also thank Yuzeng Kou from Beihang University for building the online demo. Tao Wan is partly funded by the National Science Foundation of China under award No. 61401012.

References (39)

  • D. Pathak et al.

    Context encoders: feature learning by inpainting

    Proceedings of the CVPR

    (2016)
  • J. Hays et al.

    Scene completion using millions of photographs

    Commun. ACM

    (2007)
  • C. Barnes et al.

    PatchMatch: a randomized correspondence algorithm for structural image editing

    ACM Trans. Gr.

    (2009)
  • W.T. Freeman et al.

    Example-based super-resolution

    IEEE Comput. Gr. Appl.

    (2002)
  • R. Salakhutdinov et al.

    Deep Boltzmann machines

    J. Mach. Learn. Res.

    (2009)
  • H. Lee et al.

    Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

    Proceedings of the ICML

    (2009)
  • I.J. Goodfellow et al.

    Generative adversarial nets

    Adv. Neural Inf. Process. Syst.

    (2014)
  • A. Radford et al.

    Unsupervised representation learning with deep convolutional generative adversarial networks

    CoRR

    (2015)
  • D.P. Kingma et al.

    Auto-encoding variational Bayes

    Statistics

    (2014)
  • E.L. Denton et al.

    Deep generative image models using a Laplacian pyramid of adversarial networks

    Proceedings of the NIPS

    (2015)
  • J. Johnson et al.

    Perceptual losses for real-time style transfer and super-resolution

    Proceedings of the ECCV

    (2016)
  • C. Dong et al.

    Learning a deep convolutional network for image super-resolution

    Proceedings of the ECCV

    (2014)
  • C. Ledig et al.

    Photo-realistic single image super-resolution using a generative adversarial network

    Proceedings of the CVPR

    (2017)
  • G. Lin et al.

    Image super-resolution using a dilated convolutional neural network

    Neurocomputing

    (2017)
  • H. Dong et al.

    Unsupervised image-to-image translation with generative adversarial networks

    CoRR

    (2017)
  • C. Wang et al.

    Tag disentangled generative adversarial network for object image re-rendering

    Proceedings of the IJCAI

    (2017)
  • H. Zhang et al.

    Image de-raining using a conditional generative adversarial network

    CoRR

    (2017)
  • R. Zhang et al.

    Colorful image colorization

    Proceedings of the ECCV

    (2016)
  • P. Isola et al.

    Image-to-image translation with conditional adversarial networks

    Proceedings of the CVPR

    (2017)

    Yifan Liu received her B.S. degree from Beihang University, China in 2016. She is currently a postgraduate at the School of Automation Science and Electrical Engineering, Beihang University, China. Her research interests include image processing, natural language processing and deep learning.

    Zengchang Qin obtained his M.Sc. in Computer Science and Ph.D. in Artificial Intelligence from the University of Bristol, UK, in 2002 and 2005, respectively. He worked as a lecturer at the same university before joining Lotfi Zadeh's BISC group at the EECS Department of UC Berkeley as a BT postdoctoral fellow in 2006. He has been working at Beihang University as an associate professor in the School of Automation Science and Electrical Engineering since 2009. He was also a visiting scholar at the Robotics Institute, Carnegie Mellon University, from November 2010 to June 2011. His research interests are uncertainty modeling, machine learning, multimedia retrieval and agent-based modeling.

    Dr. Tao Wan is an Assistant Professor at Beihang University, Beijing. Dr. Wan was a research associate at Case Western Reserve University and a postdoctoral associate at the Boston University School of Medicine. She received her Master's degree in Global Computing and Multimedia from the University of Bristol, UK, in 2004 and her Ph.D. in Computer Science from the same university in 2009. She spent one year working as a senior researcher at the Samsung Advanced Institute of Technology (SAIT), China, before becoming a visiting scholar in the Visualization and Image Analysis Lab at the Robotics Institute, Carnegie Mellon University. Her research interests are statistical models for image segmentation, fusion, and denoising, machine learning, medical image analysis, and computer-aided diagnosis and prognosis systems.

    Zhenbo Luo (SRC-Beijing Machine Learning Lab) received his B.Sc. degree in Electronics Engineering from Fudan University in 2003 and his Master's degree in Electronics Engineering from Tsinghua University in 2006, supervised by Professor Xiaoqing Ding. He is a Principal Engineer and leader of the Machine Learning Lab of Samsung R&D Institute China. His research interests include vision and learning. He leads the team developing GAN, text recognition, AR, and human understanding technologies for Samsung smartphones, visual display and printing business.
