Neurocomputing, Volume 142, 22 October 2014, Pages 256–266 (Elsevier)

Detection of user-registered dog faces

https://doi.org/10.1016/j.neucom.2014.03.058

Highlights

  • We propose a novel framework to adaptively detect user-registered dog faces.

  • Our detector combines both user-registered samples and off-line trained models.

  • We design a strategy-selection method to adaptively decide when and how to combine.

Abstract

Dog face detection is an important object detection task, widely applied in many fields such as auto-focus and image retrieval. In many applications, users only care about specific target species, which are unknown to a detection system until the users register some relevant information, such as a limited number of target samples. We call this scenario the detection of user-registered dog faces. Due to the great variation between different dog species, no single model can describe all the species well. Meanwhile, given the large number of dog species, it is also impractical to learn individual models for every potential target species that users may care about. Furthermore, the registered samples are usually too few to train a robust detector directly. In this context, we propose a novel user-registered object detection framework. This framework can generate an adaptive detector from only a limited number of user-registered target samples and several off-line trained auxiliary models. In addition, we build an annotated dog face dataset, which contains 10,712 images of 32 species. Experimental results on the dataset demonstrate that the proposed framework achieves detection performance superior to state-of-the-art approaches.

Introduction

Animal detection is a hot topic in the field of object detection. It has been widely applied in auto-focus, image retrieval, multimedia content analysis, etc. [1], [2], [3], [4]. Among animal detection tasks, dog face detection is an interesting and especially challenging one.

One of the greatest challenges in dog face detection is the sheer number of dog species. The appearance of dog faces varies greatly between species: from long-nosed to short-nosed dogs, from shaggy to smooth-haired dogs, and so on. Hence a single model cannot be robust to such variety across breeds.

Meanwhile, in many application scenarios it is impractical to train individual models for every dog species, given the large number of species; doing so would also incur huge costs in sample collection, model training and storage. A practically reasonable approach is therefore to create and model a few superordinate groups, each representing a set of similar species, and then offer the corresponding superordinate models to users.

On the other hand, in many application scenarios a user only wants to detect a specific dog species that they care about. For example, a user comes across some pictures of a particular species and wants to retrieve more images containing that species from a large dataset. Sometimes the user cannot even name the species; all she can provide to the system is a small number of registered samples of the target species. In this case, unlike in traditional object detection tasks, the main issue is that the target species is unknown to the system until the user registers a few target samples.

For these unknown species, the information extracted from the registered samples is the most relevant, but the number of provided samples is usually too small to directly train a robust detector.

Therefore, we propose a new framework for the detection of user-registered dog faces. The framework can generate an adaptive detector from a limited number of user-registered target samples and several off-line trained auxiliary superordinate models. The framework is illustrated in Fig. 1.
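The core idea above, combining a few registered samples with off-line superordinate models, can be sketched as follows. This is an illustrative sketch, not the authors' exact algorithm: every function name, the fusion weight `alpha`, and the use of negative Euclidean distance as a similarity are hypothetical choices made for this example.

```python
# Sketch of the registration step: pick the auxiliary (superordinate)
# model that best matches the user-registered samples, then fuse its
# score with a similarity score against the registered samples.
# Models are represented as callables mapping a feature vector to a score.

def best_auxiliary_model(aux_models, registered_feats):
    """Pick the auxiliary model with the highest mean score
    on the user-registered samples."""
    def mean_score(model):
        return sum(model(f) for f in registered_feats) / len(registered_feats)
    return max(aux_models, key=mean_score)

def adaptive_score(candidate_feat, aux_model, registered_feats, alpha=0.5):
    """Fuse the auxiliary-model score with the similarity to the
    nearest registered sample (negative Euclidean distance)."""
    def similarity(a, b):
        return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = max(similarity(candidate_feat, r) for r in registered_feats)
    return alpha * aux_model(candidate_feat) + (1 - alpha) * nearest
```

With `alpha` near 1 the detector trusts the off-line model; near 0 it behaves like a nearest-neighbor matcher against the registered samples.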

Our paper makes three main contributions. (1) We propose a novel framework to detect user-registered dog faces, which can generate a detector adaptive to users' demands. (2) The framework combines knowledge from both the off-line trained auxiliary models and the registered samples of the target species. (3) We design a strategy-selection algorithm to automatically determine when and how to appropriately utilize the auxiliary models and the registered target samples.

In addition, we built an annotated dataset of near-frontal dog faces, which contains 10,712 images of 32 species. The dog images were collected from a web site where dog owners can upload the species information and images of their dogs. Experimental results on the dataset demonstrate that the proposed framework is superior to the state-of-the-art methods. Although our research focuses on the detection of user-registered dog faces, the proposed framework can be extended readily to other tasks of animal detection or more generic object detection.

Section snippets

Related work

In our proposed framework, we exploit both off-line trained auxiliary models and user-registered samples to generate an adaptive detector of dog faces. Hence, the works most relevant to ours are the off-line training methods and the adaptive model learning methods for object detection.

Most of the recent work on animal detection [1], [2], [3], [4] focuses on off-line training. Kozakaya et al. [1] cascade a coarse model trained with AdaBoost [5] and a fine model trained with a linear SVM …
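The coarse-to-fine cascade mentioned for [1] follows a generic pattern: a cheap coarse classifier rejects most candidate windows, and only the survivors are scored by a more expensive fine classifier. The sketch below illustrates that pattern in general form; it is not Kozakaya et al.'s implementation, and the thresholds and classifier representation are assumptions of this example.

```python
# Generic coarse-to-fine detection cascade: the fine model only runs
# on windows that pass the coarse stage, which saves computation.

def cascade_detect(windows, coarse, fine,
                   coarse_thresh=0.0, fine_thresh=0.0):
    """Return the windows accepted by both stages of the cascade.
    `coarse` and `fine` are callables mapping a window to a score."""
    detections = []
    for w in windows:
        if coarse(w) < coarse_thresh:
            continue  # rejected early; the fine model never runs
        if fine(w) >= fine_thresh:
            detections.append(w)
    return detections
```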

Detection of user-registered dog faces

The flow diagram of the proposed framework is illustrated in Fig. 2. The framework comprises three main modules: training, registration and detection.

First, in the training stage, the pre-annotated images are used to train several detectors for the superordinate species. The learnt detectors make up a pool of auxiliary models. Then, in the registration stage, a new detector will be generated for the user-registered samples, with the help of some auxiliary models. A strategy-selection algorithm …
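A strategy-selection rule of the kind described could look like the following. This is a hedged sketch under assumed inputs (sample count and an auxiliary-model fit score), not the paper's actual algorithm; the strategy names and thresholds are illustrative only.

```python
# Illustrative strategy selection: depending on how many samples the
# user registered and how well the best auxiliary model already fits
# them, fall back to the auxiliary model alone, adapt it with the
# samples, or rely mainly on the registered samples.

def select_strategy(n_registered, aux_fit_score,
                    min_samples_to_adapt=5, good_fit=0.8):
    if n_registered < min_samples_to_adapt:
        # Too few samples to learn from: trust the closest off-line model.
        return "auxiliary_only"
    if aux_fit_score >= good_fit:
        # An auxiliary model already matches well: refine it with samples.
        return "adapt_auxiliary"
    # No auxiliary model fits the target: lean on the registered samples.
    return "samples_only"
```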

Experimental settings

We evaluate the proposed user-registered detection framework on a dataset of dog faces. The dataset contains 10,712 images of near-frontal dog faces from 32 species. Each dog face is annotated with a tight bounding box around the face and three points, two for the eyes and one for the nose. The positive training samples of dog faces are normalized so that the distance between the two eyes is 48 pixels. The negative samples come from the dog-free images in the PASCAL VOC 2007 dataset [15]. For …
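The eye-distance normalization described above amounts to computing a scale factor per sample. A minimal sketch, assuming the eye annotations are pixel coordinates (the actual resizing would be done with any image library):

```python
# Rescale factor that maps the annotated inter-ocular distance to the
# 48-pixel target used for the positive training samples.
import math

TARGET_EYE_DIST = 48.0

def normalization_scale(left_eye, right_eye):
    """Scale factor so that the eye distance becomes 48 px after resizing."""
    d = math.dist(left_eye, right_eye)  # Euclidean distance (Python 3.8+)
    if d == 0:
        raise ValueError("degenerate eye annotation")
    return TARGET_EYE_DIST / d

# e.g. eyes annotated 96 px apart -> the crop is scaled by 0.5
```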

Conclusion

In this paper, we have proposed a user-registered object detection framework. The framework can offer users customizable detectors after the users register a small number of samples of the target object that they are interested in. The framework can effectively leverage the off-line trained auxiliary models and the user-registered samples, through using a strategy-selection algorithm. The experiments on the dataset of dog faces demonstrated that this method can lead to a detector superior to …

Acknowledgments

We are grateful to the reviewers for their constructive comments and suggestions. The work was partially sponsored by the National Natural Science Foundation of China (No. 61271390).

References (17)

  • Z. Qi et al., Online multiple instance boosting for object detection, Neurocomputing (2011)
  • T. Kozakaya, S. Ito, S. Kubota, O. Yamaguchi, Cat face detection with two heterogeneous features, in: ICIP, IEEE, 2009, …
  • O.M. Parkhi, A. Vedaldi, C. Jawahar, A. Zisserman, The truth about cats and dogs, in: ICCV, IEEE, 2011, pp. …
  • W. Zhang et al., From tiger to panda: animal head detection, IEEE Trans. Image Process. (2011)
  • H. Azizpour, I. Laptev, Object detection using strongly-supervised deformable part models, in: ECCV, Springer, …
  • P. Viola et al., Robust real-time face detection, Int. J. Comput. Vis. (2004)
  • Y. Aytar, A. Zisserman, Enhancing exemplar SVMs using part level transfer regularization, in: BMVC, 2012, pp. …
  • P.F. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)

There are more references available in the full text version of this article.


Zhiwei Ruan received the B.S. degree in Information and Electronics Engineering from the Department of Electronic Engineering, Tsinghua University, China, in 2009. He is currently a Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. His research interests are in the area of object detection and tracking and intelligent surveillance.

Guijin Wang received the B.S. and Ph.D. degrees in signal and information processing (with honors) from the Department of Electronics Engineering, Tsinghua University, China, in 1998 and 2003, respectively. From 2003 to 2006, he was with Sony Information Technologies Laboratories as a researcher. Since 2006, he has been with the Department of Electronics Engineering at Tsinghua University, China, as an associate professor. He has published over 50 international journal and conference papers and holds several patents. He was a session chair of IEEE CCNC'06. His research interests are focused on wireless multimedia, image and video processing, depth imaging, pose recognition, intelligent surveillance, industry inspection, object detection and tracking, and online learning.

Jing-Hao Xue received the B.Eng. degree in telecommunication and information systems in 1993 and the Dr.Eng. degree in signal and information processing in 1998, both from Tsinghua University, the M.Sc. degree in medical imaging and the M.Sc. degree in statistics, both from Katholieke Universiteit Leuven, in 2004, and the Ph.D. degree in statistics from the University of Glasgow, in 2008. He has been a Lecturer in the Department of Statistical Science at University College London since 2008. His research interests include statistical and machine-learning techniques for pattern recognition, data mining and image processing, in particular supervised, unsupervised and incompletely supervised learning for complex and high-dimensional data.

Xinggang Lin received his B.S. in electronics engineering from Tsinghua University, China, in 1970, and an M.S. in 1982 and a Ph.D. in 1986, both in information science, from Kyoto University, Japan. He joined the Department of Electronics Engineering at Tsinghua University in 1986, where he has been a full professor since 1990. He received a "Great Contribution Award" from the Ministry of Science and Technology of China, and "Promotion Awards of Science and Technology" from the Beijing Municipality. He was a general co-chair of the second IEEE Pacific-Rim Conference on Multimedia, an associate editor of IEEE Trans. on CSVT, and a technical/organizing committee member of many international conferences. He is a fellow of the China Institute of Communications, and has published over 140 refereed conference and journal papers in diversified research fields.

Yong Jiang received his B.S. in electronics engineering from Zhengzhou University, China, in 2001, and his M.S. and Ph.D. in computer vision and image processing from Nanjing University of Aeronautics and Astronautics, China, in 2004 and 2007. Since 2006, he has been working at Canon Information Technology (Beijing) Co., Ltd. as an intern, researcher, senior researcher and project manager, and has filed more than 10 patents in the US, Japan and China as the first inventor. His research interests are focused on image and video processing, intelligent surveillance, industry inspection, object detection and tracking, and online learning.
