Elsevier

Signal Processing

Volume 93, Issue 8, August 2013, Pages 2205-2211
Energy-saving object detection by efficiently rejecting a set of neighboring sub-images

https://doi.org/10.1016/j.sigpro.2012.08.019

Abstract

Object detection is crucial for multimedia processing tasks such as image understanding and video analysis. Because of the large volume of images and videos and the limited computational resources available, effective and efficient object detection is challenging. Although much effort has been devoted to developing fast object detection algorithms, little work has concentrated on energy-saving algorithms. Even two detection algorithms that detect objects at the same speed may consume different amounts of electrical energy. In this paper, we focus on developing an energy-saving object detection algorithm that can reject a bundle of neighboring sub-images with only one inner-product operation and an acceptable number of additions. The total number of multiplications and additions in our algorithm is almost equal to that of the traditional sliding-window method, but the number of multiplications is much smaller. Because an addition consumes less energy than a multiplication, the proposed method is more energy-saving and efficient. Experimental results on hand detection show that our approach leads to significant improvements in energy consumption and efficiency.

Highlights

► An energy-saving object detection algorithm is proposed. ► A set of neighboring sub-images can be rejected with as few as one inner-product operation. ► Sub-images that are neighbors in the spatial domain are, in many cases, also neighbors in feature space.

Introduction

More and more intelligent devices rely on effective and efficient object detection [1], [2]. Face recognition systems for access control place high demands on face detection. Visual surveillance tasks such as gait recognition need robust human detection [1]. Recent progress in vision-based human–computer interfaces (HCI) also benefits from advances in object detection. Many embedded devices, such as smart mobile phones, increasingly require advanced object detection technology to provide friendly and comfortable HCI [3]. However, the computational resources of a mobile phone are limited, and its battery can supply only a small amount of energy. Object detection that draws less battery energy is therefore important for users and for society. Even for AC-powered devices such as personal computers, it is desirable for object detection algorithms to use less electrical energy. Energy-saving algorithms can contribute to environmental protection and to reducing economic cost. In this paper, we propose an energy-saving object detection algorithm.

In the following, we briefly review several representative object detection algorithms and then discuss several fast algorithms. Finally, we point out that there is room to improve sliding-window based algorithms so that they use less energy. Note that in this paper, object detection is defined as the process of checking whether predefined objects are present in an image and, if they are, outputting their position, scale, orientation, or other relevant information. Moving object detection that relies on techniques such as background modeling and frame differencing is beyond the scope of the paper.

In this paper, we divide object detection algorithms into two categories according to the type of image feature they use: visual-word based methods and sliding-window based methods. Visual-word based object detection algorithms represent an image in a way similar to a textual document. Specifically, an image is represented by a subset of an established visual vocabulary, with each visual word being a quantized image feature (representative features are SIFT [4], SURF [5], ASIFT [6], Fair-SURF [7], BRIEF [8], and ORB [9]). The visual vocabulary can be formed by clustering the features of a large number of training images, with the cluster centers taken as the visual words. Given the visual words, object detection can be performed with three different strategies: (1) the generalized Hough transform (the implicit shape model is a specific form of the generalized Hough transform) [10], [11], [12], (2) generative models, and (3) the branch and bound framework [13], [14], [15], [16]. In the generalized Hough transform strategy, each detected visual word votes in the Hough space, and the bin with the largest voting score (or the several bins with the largest scores) gives the parameters (i.e., position, scale, and/or orientation) of the objects. In generative-model based methods, object detection is formulated as a maximum a posteriori estimation problem. In the branch and bound strategy, a classifier (e.g., a support vector machine) is first trained and then used as a quality function. A bound function is constructed over a set of rectangles, and a split-push-retrieve loop is run until convergence [13], [14], [15], [16].

However, to the best of our knowledge, visual-word image representation is not suitable for applications such as robust face detection, hand detection, and human detection. For example, for a textureless object (such as a fist), it is difficult to extract stable features to form visual words, which makes the visual-word based method ineffective. Therefore, traditional sliding-window based methods still dominate the object detection community. In a sliding-window based method, a window of a specific scale slides over the whole image with a small step. At each location, image features are extracted from the sub-image covered by the window. The image features (usually expressed as a feature vector) are then classified by a trained classifier. Existing methods differ in their image features and classifiers, and no single type of feature or classifier is optimal for all applications. Haar-like rectangle features have achieved great success in face detection with the help of feature selection and combination by the AdaBoost algorithm [17]. Histograms of oriented gradients (HOG) [18] are powerful when the object contour is stable relative to the inner appearance. The most successful application of HOG is human detection [1], [19]. Local binary patterns (LBP) and their variants can effectively describe texture-rich objects and are also widely used in object detection and recognition. The most popular classifier is the support vector machine (SVM) [20], which generalizes well because of its maximum-margin property. Note that template-matching based object detection is also a sliding-window method, although it does not involve classifier learning.
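The sliding-window procedure described above can be sketched in a few lines. This is a minimal illustration rather than any published implementation: the function names, the use of raw pixel values as the feature vector, and the decision rule "score > 0" are all assumptions made for the example. A real detector would use features such as HOG or LBP and a trained weight vector.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product <w, x>: one multiplication per feature dimension."""
    return sum(wi * xi for wi, xi in zip(w, x))


def extract_features(image: Sequence[Sequence[float]],
                     top: int, left: int, size: int) -> List[float]:
    """Hypothetical feature extractor: the window's pixels, flattened row by row."""
    return [image[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def sliding_window_detect(image: Sequence[Sequence[float]],
                          w: Sequence[float], b: float,
                          size: int, step: int = 1) -> List[Tuple[int, int, float]]:
    """Slide a size-by-size window over the image and keep positive windows.

    Every window costs one full inner product with the weight vector w.
    """
    hits = []
    n_rows, n_cols = len(image), len(image[0])
    for top in range(0, n_rows - size + 1, step):
        for left in range(0, n_cols - size + 1, step):
            x = extract_features(image, top, left, size)
            score = dot(w, x) + b  # linear classifier decision value
            if score > 0:
                hits.append((top, left, score))
    return hits


# Toy usage: a 4x4 image with a bright 2x2 block, scanned by a 2x2 window.
toy_image = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
toy_hits = sliding_window_detect(toy_image, w=[1.0, 1.0, 1.0, 1.0], b=-3.5, size=2)
```

In this toy setting nine windows are scanned, so nine inner products are computed even though only one window contains the object.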

The major drawback of sliding-window based methods is that they require an exhaustive scan of the whole image, which makes them computationally intensive and time-consuming. Two kinds of coarse-to-fine algorithms have been developed to reduce the computational cost: simple-to-complex feature (or classifier) based algorithms and multi-resolution based algorithms. In a simple-to-complex algorithm, simple features are extracted efficiently from a sub-image and used to quickly reject many sub-images, with a high detection rate and a possibly high false-positive rate [17]. Only the small fraction of sub-images that are hard to classify correctly with the simple features are passed on to complex features (or classifiers). The overall effect is that object detection runs faster. In a multi-resolution algorithm, lower-resolution features are used to reject the majority of negative sub-images at relatively low cost, leaving a relatively small number of sub-images to be processed by more computationally expensive higher-resolution features [21].
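The simple-to-complex idea can be illustrated with a two-stage sketch. The two scoring functions, the thresholds, and the toy feature vectors below are hypothetical placeholders chosen for the example. They are not the features or classifiers of [17] or [21].

```python
from typing import Callable, List, Sequence, Tuple


def cascade_detect(windows: Sequence[Sequence[float]],
                   cheap_score: Callable[[Sequence[float]], float],
                   costly_score: Callable[[Sequence[float]], float],
                   cheap_threshold: float,
                   costly_threshold: float) -> Tuple[List[int], int]:
    """Two-stage coarse-to-fine classification.

    Returns the indices of accepted windows and the number of times the
    costly classifier had to be evaluated.
    """
    accepted = []
    costly_calls = 0
    for idx, x in enumerate(windows):
        # Stage 1: a cheap test rejects most negatives early.
        if cheap_score(x) < cheap_threshold:
            continue
        # Stage 2: only the survivors pay for the costly classifier.
        costly_calls += 1
        if costly_score(x) >= costly_threshold:
            accepted.append(idx)
    return accepted, costly_calls


# Toy usage: the cheap stage looks at two features, the costly stage at all four.
toy_windows = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]
toy_accepted, toy_costly_calls = cascade_detect(
    toy_windows,
    cheap_score=lambda x: x[0] + x[1],
    costly_score=lambda x: sum(x),
    cheap_threshold=1,
    costly_threshold=4,
)
```

The cheap threshold is deliberately loose: it may let false positives through to the second stage, but it should rarely discard a true object.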

The coarse-to-fine object detection algorithms described above mainly aim to increase detection speed. We call them speed-oriented algorithms. In this paper, we propose an energy-oriented algorithm. In many cases, a high-speed algorithm costs less energy than a low-speed one. Nevertheless, there is considerable room to reduce the energy cost without decreasing the speed. Note that a lower-energy algorithm generally leads to higher computational efficiency, whereas higher computational efficiency does not always result in lower energy consumption. In this paper, we focus on reducing the energy cost of the sliding-window based algorithm with a linear classifier (such as a linear SVM). After image features are extracted from a sub-image, the linear classifier makes its decision based on the value of the inner product between the feature vector and the trained weight vector. The inner product is therefore the main computational operation and accounts for much of the power and energy consumed. Making use of the high correlation between neighboring sub-images, we propose a method that remarkably decreases the number of inner products.

The rest of the paper is organized as follows. In Section 2, we briefly describe the traditional sliding-window based object detection method. The proposed algorithm is presented in Section 3. Experimental results are reported in Section 4. Finally, Section 5 concludes our paper.

Section snippets

Traditional sliding window and inner product

In this section, we describe the process of sliding-window based object detection. The goal is to show that such a process is inner-product intensive and costs much electric energy. This motivates us to develop an energy-saving object detection algorithm by reducing the number of inner-product operations.

Like many other object detection algorithms, the sliding-window based algorithm consists of an offline training stage and an online detection stage. In the training stage, one or a series of

Reject a bundle of sub-images by one inner product

As observed in Section 2, the inner-product operations <w, x_i> (i = 1, 2, ..., N) use almost all the energy once the image feature vectors have been extracted. To reduce the number of inner products and save energy, we propose to reject a set of M neighboring sub-images with as few as one inner product. The assumption behind the proposed method is that neighboring sub-images are highly likely to share the same class label. Our method can use one inner product operation
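Because the description of the method is truncated here, the following sketch shows only one possible way to reject a bundle of neighboring sub-images with a single inner product. It is an assumption made for illustration, not the authors' algorithm. It relies on the standard Hölder bound <w, x> <= <w, r> + max|w_j| * ||x - r||_1, where r is a reference sub-image of the bundle: if the reference score is sufficiently negative, any neighbor whose feature vector lies close to r in L1 distance must also be negative, and that distance needs only subtractions, absolute values, and additions. All function and variable names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product <w, x>: the multiplication-heavy operation to be avoided."""
    return sum(wi * xi for wi, xi in zip(w, x))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance: subtractions, absolute values, and additions only."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def classify_bundle(w: Sequence[float], b: float,
                    bundle: Sequence[Sequence[float]]) -> Tuple[List[bool], int]:
    """Label every feature vector in a bundle of neighboring sub-images.

    Returns (labels, number of inner products actually computed).
    Assumed rejection rule (not taken from the paper): with reference r,
    <w, x> + b <= (<w, r> + b) + max|w_j| * ||x - r||_1.
    """
    w_inf = max(abs(wi) for wi in w)      # could be precomputed offline
    ref = bundle[0]
    ref_score = dot(w, ref) + b           # the single shared inner product
    inner_products = 1
    # Largest L1 distance at which the upper bound is still negative.
    slack = -ref_score / w_inf if (ref_score < 0 and w_inf > 0) else 0.0
    labels = [ref_score > 0]
    for x in bundle[1:]:
        if l1_distance(x, ref) < slack:
            labels.append(False)          # provably negative, no inner product
        else:
            inner_products += 1           # bound is inconclusive, fall back
            labels.append(dot(w, x) + b > 0)
    return labels, inner_products


# Toy usage: three near-identical negatives and one distant positive.
toy_w, toy_b = [1.0, -2.0, 0.5], -1.0
toy_bundle = [[0.0, 1.0, 0.0], [0.1, 1.0, 0.0], [0.0, 1.2, 0.1], [3.0, 0.0, 0.0]]
toy_labels, toy_inner_products = classify_bundle(toy_w, toy_b, toy_bundle)
```

The bound is conservative, so the labels always agree with those obtained by computing every inner product. The saving depends on how often neighboring sub-images really are close in feature space.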

Experimental results

In this section, we evaluate the proposed energy-saving object detection algorithm against the traditional sliding-window based algorithm. The proposed energy-saving algorithm can be extended to a fast version by borrowing ideas from existing fast object detection algorithms such as the multi-resolution algorithm. We therefore did not directly compare the proposed method with the fast sliding-window algorithms.

Conclusion

Traditional sliding-window based object detection with a linear classifier is slow and expensive in electrical energy because it involves many inner products between a high-dimensional feature vector and the weight vector of the learned classifier. To deal with this problem, we have presented a technique that rejects most of the sub-images in a test image using a small number of inner-product operations. The idea behind the proposed method is that neighboring sub-images are also neighboring to

Acknowledgments

This work is supported in part by the National Science Foundation of China (Grant nos. 61271412, 61222109, 61125106, 60975001), Tianjin Research Program of Application Foundation and Advanced Technology (Contact no. 10JCYBJC07700) and the Open Project Program of the State Key Laboratory of Industrial Control Technology (No. ICT1212).

References (27)

  • B. Leibe et al.

    Robust object detection with interleaved categorization and segmentation

    International Journal of Computer Vision

    (2008)
  • A. Lehmann et al.

    Fast prism: branch and bound Hough transform for object class detection

    International Journal of Computer Vision

    (2011)
  • C. Lampert et al.

    Efficient subwindow search: a branch and bound framework for object localization

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)