1 Introduction

We introduce a database of 8,928 annotated cartoon faces designed as an aid in studying the problem of unconstrained cartoon understanding. The database can be downloaded from our project website (footnote 1).

Our database, which we refer to as IIIT cartoon faces in the wild (IIIT-CFW), is designed to study the spectrum of problems associated with cartoon understanding. Some of these problems are listed in Sect. 5. The Oxford dictionary defines a cartoon as follows: a simple drawing showing the features of its subjects in a humorously exaggerated way, especially a satirical one in a newspaper or magazine. However, the modern usage of cartoon has extended to any non-realistic or semi-realistic drawing or painting intended for satire, caricature, or humor, or to the artistic style of such works. Following the modern definition, our database of cartoon faces contains caricatures, paintings, cartoons and sketches of internationally well-known public figures.

Research in face recognition has advanced significantly in the last few years [10, 14, 17, 21, 24, 28], thanks to the multiple databases that have been introduced (we discuss a few of them in Sect. 2). These databases have triggered research in the face recognition domain. On the other hand, there have been only very few attempts to address the problem of recognizing cartoon faces [3, 22], and those report experiments on small test sets only. Cartoon understanding has many application areas. One fundamental application is the following: given an image containing cartoons of people, answer questions such as who is in the image and what are they doing? (See Fig. 1.) Such understanding (i) can help visually impaired people to understand cartoon images or movies, and (ii) can be used to automatically censor communal or politically incorrect cartoons on social media. Other applications of cartoon understanding include generating realistic cartoon faces and generating various realistic facial expressions in cartoons. These applications are of great importance in fine art.

Cartoon face recognition is the first step towards a larger understanding of cartoon images. The problem of cartoon face recognition is closely related to real face recognition; however, it poses many additional challenges, for example: (i) artistic variations, (ii) limited examples, (iii) orders of magnitude less data than for real faces, and (iv) heavy caricaturing. Other challenges such as variations in pose, expression, age, illumination, race (e.g., cartoon faces of Asian characters look very different from those of European characters) and gender remain similar to those in real face databases. We show some examples from our database illustrating the above-mentioned challenges in Fig. 2. Another area closely related to cartoon face recognition is sketch recognition [2, 7]. However, most attempts in sketch recognition were made to recognize hand-drawn geometric objects. Our database of cartoon faces is much broader and more challenging than simple geometric sketches or drawings, and sketch recognition techniques may not be directly applicable to these images. We believe that the IIIT-CFW will prove an important first step towards conducting research in the broad area of cartoon understanding.

Fig. 1. Who are these and what are they doing? There has been a lot of progress in face recognition. However, the problem of recognizing cartoon characters has not been studied rigorously. Our database, the IIIT-CFW, will be useful for comprehensive studies of cartoon image understanding.

Fig. 2. Sample faces of the IIIT-CFW. It has many variations, such as (a) age (young and old), (b) appearance (with beard, with glasses), (c) artistic style (pure cartoon, sketch and caricature), (d) expression (happy, sad, seductive, angry, etc.), (e) gender (male and female), (f) amount of caricaturing, (g) pose (frontal and non-frontal), and (h) race (Asian, American, African, etc.).

Before proceeding with the details, we briefly summarize the IIIT-CFW database here:

  • This database contains cartoon images of 100 international celebrities (politicians, actors, singers, sports persons, etc.), with 8,928 images in all.

  • The images in this database are harvested from the web. These images contain cartoons drawn in a totally unconstrained setting.

  • The IIIT-CFW contains detailed annotations and can be used for a wide spectrum of problems including cartoon face recognition, cartoon face verification, cartoon image retrieval, relative attributes in cartoons, gender or age group estimation from cartoon faces, and cartoon face synthesis.

  • We provide a face bounding box along with some additional attributes such as age group, view, expression, pose, etc.

  • Most of the images in this database are in color, with very few in grayscale.

  • We provide standard train and test splits specific to each problem.

More details are provided in the remainder of this paper, which is organized as follows. In Sect. 2, we discuss some of the related databases. Since cartoon understanding is a novel problem that is nevertheless closely related to face recognition, we provide a brief discussion of popular face recognition databases and compare them with ours. Section 3 describes the image collection and annotation scheme. We give detailed database statistics and analyze the complexity of the database in Sect. 4. In Sect. 5, we discuss the intended usage of this database and the spectrum of problems that can be studied on it. We finally provide concluding remarks in Sect. 6.

2 Related Databases

The problem of cartoon face understanding lacks standard benchmark databases. On the other hand, the real face recognition community has the clear advantage of a plethora of standard benchmarks. Starting from the popular ORL database [16] in 1994 and the Yale database [1] in 1997, numerous standard benchmarks have been released to address the face recognition problem. Each of these databases is more challenging and realistic than its predecessors. In this section we present a brief survey of some of the popular face databases and compare them with our cartoon face database. We summarize this comparison in Table 1.

Table 1. Variations, size and target applications of various face databases vs. our cartoon database. Our database provides suitable annotations and train-test splits to study a wide range of problems (R: race, G: gender, P: pose, E: expression, A: appearance, \(P_1\): face recognition, \(P_2\): face verification, \(P_3\): face detection, \(P_4\): photo2cartoon, \(P_5\): cartoon2photo, \(P_6\): attribute-based search, \(P_7\): relative attributes in faces, *: not publicly available).
  • ORL face database [16]. This database consists of 400 face images of 40 distinct characters. The images for some characters were taken at different times, with varying lighting and facial expressions and details. The images are face-centered and taken against a dark background.

  • Yale face database [1]. This database contains 165 face images of 15 characters. The faces in this database are frontal and have very small variations.

  • Labeled Wikipedia faces (LWF) [8]. The LWF database contains 8.5 K faces of approximately 1.5 K individuals. These images were collected from the Wikipedia living people category. In addition to the images, the database contains some metadata, e.g., the source images, image captions (if available) and person name detection results.

  • FaceScrub [13]. It contains 106,863 images of 530 celebrities, with about 200 faces per person. The images in the database were harvested from the web and are taken in real-world situations (i.e., uncontrolled settings). Images of people other than the queried person were discarded in order to build the database.

  • FEI face database [23]. This is a Brazilian face database that contains a set of 2,800 face images, 14 images for each of 200 individuals. The database has an equal number of male and female subjects. It contains color images taken against a white homogeneous background in an upright frontal position, with profile rotation of up to about 180 degrees.

  • Indian face database [11]. This database consists of eleven different images of each of 40 distinct subjects. The images were acquired in an upright frontal position against a bright homogeneous background. The database also provides detailed annotations for face orientation. Some emotions, such as neutral, smile, laughter, and sad or disgust, are also included in the annotations of this database.

  • Labeled faces in the wild [9]. The LFW database of face images is designed for studying the problem of unconstrained face recognition. The database contains more than 13,000 images of 1680 characters. These images were collected from the web. The faces in the database are the output of the Viola-Jones face detector [25] and hence are roughly frontal.

  • CelebFaces [19]. This database contains 5,436 characters and 87,628 face images in all. It contains faces of celebrities who do not appear in LFW. The images in this database were collected from the web.

  • YouTube faces [27]. The database contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube.

  • PubFig83 [15]. The PubFig83 database contains 8300 cropped facial images of 83 unique public figures. This database aims to provide the solution for the problem of recognizing identities from near-frontal faces.

  • Indian Movie Face Database (IMFDB) [18]. This is a large unconstrained face database consisting of 34,512 images of 100 Indian actors from more than 100 videos. The images were obtained by manually cropping them from video frames, which leads to a high degree of variability in scale, pose, expression, illumination, age, resolution, occlusion, and makeup. The IMFDB provides a detailed annotation of every face in terms of age, pose, gender, expression and type of occlusion. A standard train-val-test division is missing in this database.

  • Industrial benchmarks [17, 21]. These face databases, introduced by Google and Facebook, are the largest in size. However, they are not publicly available. The Google and Facebook databases contain 200 million and 4.4 million faces of 8 million and 4,030 characters, respectively.

  • VGG face database [14]. This database was introduced by the Visual Geometry Group at the University of Oxford. It contains 2.6 million faces of 2363 characters harvested from the web.

Contrary to the real face databases, ours is a first attempt to create a realistic and large database of cartoon faces. In many ways our database is similar to PubFig and CelebFaces, e.g., both these databases and ours contain faces of public figures harvested from the web. However, our database has additional challenges due to the fact that it contains cartoon faces. Moreover, unlike PubFig, which can only be used for face verification, our database can be used to address many different problems (see Sect. 5). In short, the major highlights of our database are as follows: (i) it is a first large database of cartoon faces, and (ii) it contains detailed annotations and hence can be used to study the spectrum of problems associated with cartoon understanding.

3 Data Collection and Annotations

The IIIT-CFW is harvested from Google image search. We used queries like ‘Obama + cartoon’, ‘Emma Watson + cartoon’, and so on, to collect cartoon images of 100 public figures. These public figures are chosen from different categories such as sports, politics, art, science, etc., and from different countries such as USA, UK, Australia, India, etc. Once the images were collected, we filtered them manually to remove irrelevant images. We then gave these images to our annotation team of three people, who were asked to draw face bounding boxes and provide the following attributes for each face:

  • Type of cartoon: cartoon, cartoon sketch, caricature

  • Pose: frontal, non frontal

  • Expression: happy, sad, thoughtful, seductive, sorrow, angry, serious, frightened, crying, shocked

  • Age group: young, old

  • Gender: male, female

  • Beard: yes, no

  • Glass: yes, no

For subjective attributes like age group and expression, we used a voting scheme among the annotators. In addition to the face bounding box and the attributes of each face, some metadata such as the name of the personality and an identity number (id number) are also assigned.
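The voting scheme for subjective attributes can be illustrated with a minimal sketch. The paper does not state how ties were resolved, so the function below (a hypothetical `majority_label`) simply reports a tie as `None`; the labels are taken from the attribute list above, and everything else is an assumption made for illustration.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None when the top labels tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    # A tie between the two most frequent labels means there is no majority;
    # how such cases were resolved in the IIIT-CFW is not stated, so we return None.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Three hypothetical annotators label the expression of one face.
print(majority_label(["happy", "happy", "serious"]))
print(majority_label(["happy", "sad", "serious"]))
```

With three annotators and a binary attribute such as age group, a majority always exists; ties can only arise for attributes with more than two values, such as expression.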

The annotations are stored in XML. The XML contains the unique id number assigned to each personality, the name of the personality, and the above seven attributes. Moreover, the coordinates \((x_1,y_1,width,height)\) of the face bounding box are also included in the XML description. Figure 3 shows an example of the annotations we provide in the IIIT-CFW. For annotation, we used an image markup tool released under the Mozilla Public License version 1.1 (footnote 2).
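To make the annotation format concrete, the following sketch reads one face record with Python's standard `xml.etree.ElementTree`. The tag and attribute names (`face`, `bbox`, `attributes`, and so on) are hypothetical, since the exact schema is not given here; only the content (id, name, seven attributes, and an \((x_1,y_1,width,height)\) box) follows the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the real IIIT-CFW tag and attribute names may differ.
SAMPLE_XML = """
<face id="42" name="Barack Obama">
  <bbox x1="30" y1="18" width="120" height="150"/>
  <attributes type="caricature" pose="frontal" expression="happy"
              age_group="old" gender="male" beard="no" glass="no"/>
</face>
"""


def parse_face(xml_text):
    """Parse one face record into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    box = root.find("bbox")
    x1, y1 = int(box.get("x1")), int(box.get("y1"))
    width, height = int(box.get("width")), int(box.get("height"))
    return {
        "id": int(root.get("id")),
        "name": root.get("name"),
        "bbox_xywh": (x1, y1, width, height),
        # Corner form (x1, y1, x2, y2), convenient for cropping the face.
        "bbox_corners": (x1, y1, x1 + width, y1 + height),
        "attributes": dict(root.find("attributes").attrib),
    }


face = parse_face(SAMPLE_XML)
print(face["name"], face["bbox_corners"], face["attributes"]["expression"])
```

Storing the box as a top-left corner plus width and height matches the description above; converting to corner form is a convenience for cropping and for overlap computations.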

Fig. 3. Example of annotations provided in the IIIT-CFW.

4 Statistical Details and Complexity Analysis of Database

We now provide statistical details of the IIIT-CFW, which is collected in a totally unconstrained manner. The IIIT-CFW has large variations in attributes. These variations make the database realistic and challenging. This database contains 8,928 cartoon images of 100 public figures. The variations in this database across the attributes are shown through pie charts in Fig. 4. In this figure, we illustrate variations in the data across the seven attributes (discussed in Sect. 3) and variations in race. We observe that the introduced database possesses large diversity.

Fig. 4. Wide variations in attributes in our database. We show the distribution of face images across eight attributes and observe that our database has a lot of diversity across various attributes.

Fig. 5. Mean faces of Angelina Jolie (1st and 3rd columns) and Barack Obama (2nd and 4th columns) in PubFig [15] and ours. We observe that the celebrities are clearly recognizable in PubFig [15] but not in ours.

We also analyze the complexity of the database by examining its mean face and eigenfaces and comparing them with those of some popular face databases, namely Yale [1], LFW [9] and PubFig [15]. These comparisons are shown in Figs. 6 and 7, respectively. Here, we show the mean face and eigenfaces of the Yale database [1] and of relatively harder databases such as LFW [9] and PubFig [15]. We observe that, in contrast to ours, the mean face and the top eigenfaces of these databases clearly look like faces. This suggests that these popular databases have smaller variations in the pose and appearance of faces than ours. We also compare celebrity-wise mean faces of our database with those of PubFig [15], which also contains faces of public figures. This comparison is shown in Fig. 5. We choose faces of Angelina Jolie and Barack Obama for this study. As can be seen, one can easily recognize the mean faces of these two celebrities in the case of PubFig [15]. However, the mean faces of these celebrities in our cartoon database are hard to recognize. These observations support our claim that our database contains cartoon images collected in a totally unconstrained setting and that it has large variations.
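The mean face and eigenfaces used in this analysis can be computed as in the sketch below. The paper does not specify the procedure, so this assumes the standard formulation: faces are cropped, converted to grayscale, resized to a common size, and the eigenfaces are the leading principal directions of the mean-centered data, obtained here with NumPy's SVD. The function name `mean_and_eigenfaces` and the random stand-in data are illustrative only.

```python
import numpy as np


def mean_and_eigenfaces(faces, k=5):
    """Compute the mean face and the top-k eigenfaces.

    faces: array of shape (n, h, w) holding n grayscale faces of equal size.
    Returns (mean_face, eigenfaces) with shapes (h, w) and (k, h, w).
    """
    n, h, w = faces.shape
    data = faces.reshape(n, h * w).astype(np.float64)
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centered data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean.reshape(h, w), vt[:k].reshape(-1, h, w)


# Random pixels stand in for real face crops in this illustration.
rng = np.random.default_rng(0)
faces = rng.random((20, 8, 8))
mean_face, eigenfaces = mean_and_eigenfaces(faces, k=5)
print(mean_face.shape, eigenfaces.shape)
```

Restricting `faces` to the images of a single public figure gives a celebrity-wise mean face of the kind compared in Fig. 5.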

Fig. 6. Mean faces of our cartoon database in (d) as compared to (a) Yale [1], (b) LFW [9] and (c) PubFig [15]. We observe that the mean face of the Yale database looks more clearly like a face than those of the relatively harder databases such as LFW and PubFig. In contrast, our database, the IIIT-CFW, contains large variations, and hence its mean face does not clearly look like a face.

Fig. 7. Top-5 eigenfaces of (a) Yale [1], (b) LFW [9], (c) PubFig [15] and (d) ours.

Fig. 8. Spectrum of problems which can be studied on our database: (a) cartoon face recognition, (b) cartoon gender classification, (c) cartoon face verification, (d) photo to cartoon search, (e) cartoon to photo search, (f) cartoon face detection, (g) attribute based search in cartoons, and (h) relative attributes in cartoons. We provide suitable annotations and problem-specific train-val-test splits to conduct studies on various problems associated with cartoon faces.

Table 2. Training, testing and validation splits in our database (the IIIT-CFW) for various problems (CF: cartoon faces, RF: real faces, CFP: cartoon face pairs).

5 Spectrum of Problems

The IIIT-CFW has been introduced to study a spectrum of problems. These problems are summarized in Fig. 8. Although many of these problems are fundamental in the real face domain, existing face databases facilitate research only on selected problems owing to the unavailability of detailed annotations and problem-specific train-test splits. In contrast, we provide not only problem-specific train-val-test splits but also detailed annotations to study a wide range of problems associated with cartoon faces. In this section, we briefly describe the problems which can be studied on our database, and explain the train-val-test split strategy. Table 2 summarizes the training, validation and test sets in our database.

  1. Cartoon face recognition. The problem of cartoon face recognition is to recognize a given cartoon face as one of C classes, where each class is a public figure.

  2. Cartoon face verification. Given two cartoon faces, the problem of face verification is to answer whether or not the pair is of the same person. For this problem we have labeled face pairs as same or different based on whether they are of the same public figure. The train-test division is done such that if a pair of cartoon characters belongs to the train set, then it does not belong to the validation or test set.

  3. Cartoon gender identification. This is a binary classification problem where, given a cartoon face, the system has to answer whether it is male or female. We provide a standard train and test split by making sure that if a cartoon character is present in the train set, examples of this cartoon character are not present in the test set.

  4. Photo2cartoon and cartoon2photo. The problem here is: given a real face of a public figure, can we retrieve all the cartoon faces of that public figure from a large database of cartoons, and vice versa? This database provides support for studying such problems. Along with the cartoon faces we provide 1000 real faces (ten real faces of each character). These real faces are harvested from Google image search and can be used as queries for photo2cartoon retrieval and as the database for cartoon2photo retrieval. Further, for cartoon2photo retrieval all the cartoons in the database should be used as queries.

  5. Face detection, pose estimation and landmark detection. These are well-studied problems for real face images [29]. Cartoon faces, on the other hand, have many artistic variations, and detecting faces and landmark points such as eyes, nose, mouth, etc. can prove challenging for state-of-the-art methods. For example, results of face detection using the well-known Viola-Jones method [25] are shown in Fig. 9. We see that face detection in cartoons is not a trivial task and needs special attention. Our database can also facilitate research on this problem.

  6. Relative attributes in cartoons. Relative attributes in faces is a well-studied problem. Given a pair of cartoon faces (A, B), the problem is to answer questions such as: Is A older than B? Is A happier than B? Are A and B of a similar age group? We also provide annotations for this problem and a standard train-val-test split similar to that of the face verification problem.

  7. Attribute based cartoon search. The IIIT-CFW database also provides annotations for attribute based cartoon search, such as search all cartoon faces with glasses, search all female cartoon faces who are happy, and so on. We provide 10 such string queries along with the associated relevant images.
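The identity-disjoint splitting described for gender identification (item 3) can be sketched as follows: whole public figures, rather than individual faces, are assigned to the train or test set, so no character appears in both. The function name `split_by_identity`, the 80/20 ratio and the fixed seed are assumptions for illustration and are not the splits distributed with the database.

```python
import random


def split_by_identity(face_ids_by_person, test_fraction=0.2, seed=0):
    """Split faces so that no person contributes to both train and test.

    face_ids_by_person: dict mapping a person's name to a list of face ids.
    Returns (train_face_ids, test_face_ids, test_people).
    """
    people = sorted(face_ids_by_person)
    # A seeded shuffle keeps the hypothetical split reproducible.
    random.Random(seed).shuffle(people)
    n_test = max(1, round(len(people) * test_fraction))
    test_people = set(people[:n_test])
    train, test = [], []
    for person, face_ids in face_ids_by_person.items():
        (test if person in test_people else train).extend(face_ids)
    return train, test, test_people


# Toy data: 10 hypothetical identities with 3 faces each.
toy = {f"person_{i}": [3 * i, 3 * i + 1, 3 * i + 2] for i in range(10)}
train, test, test_people = split_by_identity(toy)
print(len(train), len(test), sorted(test_people))
```

Splitting by identity rather than by image prevents a classifier from scoring well merely by memorizing how a particular character is usually drawn.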

Fig. 9. Results of Viola-Jones [25] face detection on our database: the first four images show some successful face detections. The last four images are a few examples where Viola-Jones fails to detect faces. In general, we observe that the seminal Viola-Jones method is not very successful in detecting cartoon faces.
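Detections such as those in Fig. 9 could be scored against the annotated face bounding boxes using intersection over union (IoU). The sketch below assumes boxes in the \((x_1,y_1,width,height)\) form used by the annotations; the paper does not state an evaluation protocol, and the 0.5 acceptance threshold mentioned in the comment is a common convention in detection benchmarks, not something taken from this work.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis; zero when the boxes do not intersect.
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


annotated = (30, 18, 120, 150)   # hypothetical ground-truth box
detected = (40, 30, 110, 140)    # hypothetical detector output
score = iou_xywh(annotated, detected)
# Assumed convention: count the detection as correct when IoU >= 0.5.
print(round(score, 3), score >= 0.5)
```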

Other exciting problems. Generating photo-realistic images of human faces has been a fascinating yet hard problem in computer vision and computer graphics. This problem has gained the attention of researchers in the last decade, and some interesting works have been published in this space, such as face hallucination [12] and facial expression generation [20]. There have also been some works on face sketches, which are a subset of our database, e.g., face sketch synthesis and recognition [26] and sketch inversion [4].

These problems in the real face or sketch face domains can also be studied for cartoons, and our database can be used for conducting such studies. Generating realistic cartoon faces with various facial expressions can be an exciting application from a fine art perspective.

6 Conclusions

We have introduced a large database of cartoon faces and discussed the spectrum of problems which can be studied on this database. We believe the database will trigger research in cartoon face understanding and can prove a turning point in how computers see cartoons.