Canonical subsets of image features

https://doi.org/10.1016/j.cviu.2008.06.002

Abstract

Many object recognition and localization techniques utilize multiple levels of local representations. These local feature representations are common, and one way to improve the efficiency of algorithms that use them is to reduce the size of the local representations. There has been previous work on selecting subsets of image features, but the focus here is on a systematic study of the feature selection problem. We have developed a combinatorial characterization of the feature subset selection problem that leads to a general optimization framework. This framework optimizes multiple objectives and allows the encoding of global constraints. The features selected by this algorithm are able to achieve improved performance on the problem of object localization. We present a dataset of synthetic images, along with ground-truth information, which allows us to precisely measure and compare the performance of feature subset algorithms. Our experiments show that subsets of image features produced by our method, stable bounded canonical sets (SBCS), outperform subsets produced by K-Means clustering, GA, and threshold-based methods for the task of object localization under occlusion.

Introduction

Many computer vision tasks include feature detection as an integral step. The features detected in an image of an object form an abstract representation of the object which can be used by higher level vision processes such as localization and recognition. Recent trends in feature detection have been in the development of methods that can extract large numbers of rich features from a given image. One consequence of these trends has been the increased size of an object’s representation. Nevertheless, there are applications where reducing the size of an object’s representation is desirable, as shown by the work of Heisele et al. [1], Sivic and Zisserman [2], and Sun et al. [3].

Consider the case where hundreds of thousands of object model representations reside in a database. The problem is to determine which of the objects are present in input images and to localize them. Such applications are ripe for parallel processing or even custom hardware solutions. In these cases, the size of an object’s representation will become critical to the overall system performance or even its feasibility.

The size of an object representation is a combination of the number of features and the feature descriptor size. The size of the representation can be made smaller by reducing the size of the feature descriptor, by reducing the number of features, or both. Recent work by Ke and Sukthankar [4] has shown that feature descriptors can be reduced in size. Johnson and Hebert [5] also addressed the problem of descriptor compression and demonstrated that spin images could be compressed with principal component analysis. Our research focus, which is complementary to theirs, is on reducing the number of features.
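As a rough illustration of the size arithmetic above, the sketch below computes the storage for one object model as the number of features times the descriptor length. The feature counts, the descriptor dimensions (128 and 36), and the one-byte-per-dimension storage are illustrative assumptions, not figures reported in this paper.

```python
def representation_bytes(num_features: int, descriptor_dim: int,
                         bytes_per_dim: int = 1) -> int:
    """Storage for one object model: features x descriptor dimensions x bytes."""
    return num_features * descriptor_dim * bytes_per_dim


# Hypothetical numbers: 1000 detected features with 128-dimensional descriptors.
full = representation_bytes(1000, 128)
# Shrinking only the descriptor (for example, to 36 dimensions).
shorter_descriptor = representation_bytes(1000, 36)
# Shrinking only the number of features (keeping a 10% subset).
fewer_features = representation_bytes(100, 128)
# Applying both reductions together.
both = representation_bytes(100, 36)
```

Under these assumptions the two reductions multiply, which is why reducing the number of features complements descriptor compression rather than competing with it.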

We hypothesize that some of the detected features are more representative than others, and a subset of detected features is sufficient for vision tasks like object localization. In theory, for determining 3D positions and orientations of objects in 2D images, three disjoint co-visible features per object are sufficient [6], [7]. The number of necessary features is difficult to determine and depends on the specific problem instance and features. For example, some features, such as those described by Lowe [8], encode orientation and scale, which may allow localization with the matching of a single feature. In practice, however, more features are required since portions of the object in the target scene may be occluded or the data might be noisy. More importantly, more features are required because it is desirable to over-constrain the pose estimation problem so that accurate localization results can be obtained.
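To illustrate why a single feature that carries scale and orientation can, in principle, fix a 2D pose, the following sketch recovers a similarity transform (scale, rotation, translation) from one model-to-scene match. The Keypoint type and the function names are hypothetical, and this is not the localization algorithm used in the paper; it only shows the geometry under noise-free assumptions.

```python
import math
from typing import NamedTuple, Tuple


class Keypoint(NamedTuple):
    """Hypothetical feature record: image position, scale, orientation (radians)."""
    x: float
    y: float
    scale: float
    angle: float


# A 2D similarity transform stored as (a, b, tx, ty), mapping
# (x, y) -> (a*x - b*y + tx, b*x + a*y + ty).
Similarity = Tuple[float, float, float, float]


def similarity_from_match(model: Keypoint, scene: Keypoint) -> Similarity:
    """Estimate the model-to-scene similarity transform from one matched feature."""
    k = scene.scale / model.scale          # relative scale
    dtheta = scene.angle - model.angle     # relative rotation
    a = k * math.cos(dtheta)
    b = k * math.sin(dtheta)
    # Choose the translation so the model keypoint lands on the scene keypoint.
    tx = scene.x - (a * model.x - b * model.y)
    ty = scene.y - (b * model.x + a * model.y)
    return (a, b, tx, ty)


def apply_similarity(t: Similarity, point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a model point into scene coordinates."""
    a, b, tx, ty = t
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


model_kp = Keypoint(10.0, 0.0, 2.0, 0.0)
scene_kp = Keypoint(5.0, 20.0, 4.0, math.pi / 2)
pose = similarity_from_match(model_kp, scene_kp)
origin_in_scene = apply_similarity(pose, (0.0, 0.0))
```

With noisy or partially occluded data a single match of this kind is fragile, which is the motivation given above for keeping enough features to over-constrain the pose.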

Rather than determining how many features are necessary or sufficient, we focus on how to best select a subset of image features given a redundant set. Our goal is to select a subset that is highly representative of the original feature set. In order to be robust in the presence of occlusion, a representative subset should be spatially distributed across all regions of the input image. Features from all regions of an object are desirable since it is not possible to predict which regions will be occluded. At the same time, the subset should consist of features that are similar to those not in the subset, i.e., the subset should represent the original set well. The features within the selected subset should also be as dissimilar from one another as possible, as there is no point in including similar features from a homogeneous region. To be robust in the presence of noise, the subset selection algorithm should be able to take into account the relative stability of a feature; that is, including more stable (repeatable) features in the representative subset will increase the robustness of the subset to noise [9].
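The four informal criteria in this paragraph can be made concrete with a small diagnostic. The sketch below scores a candidate subset on spatial spread, coverage of the left-out features, redundancy within the subset, and mean stability. The Feature record, the similarity function, and the particular averages are assumptions chosen for illustration; the paper's own measures are not given in this excerpt.

```python
import math
from itertools import combinations
from typing import Dict, List, NamedTuple, Sequence


class Feature(NamedTuple):
    """Hypothetical feature: image position, descriptor vector, detector stability."""
    x: float
    y: float
    descriptor: Sequence[float]
    stability: float


def similarity(a: Feature, b: Feature) -> float:
    """Assumed similarity in (0, 1]; equals 1 for identical descriptors."""
    return 1.0 / (1.0 + math.dist(a.descriptor, b.descriptor))


def subset_criteria(features: List[Feature], chosen: List[int]) -> Dict[str, float]:
    """Score the subset given by the index list `chosen` on four criteria."""
    if not chosen:
        raise ValueError("subset must contain at least one feature")
    inside = set(chosen)
    picked = [features[i] for i in chosen]
    rest = [f for i, f in enumerate(features) if i not in inside]
    pairs = list(combinations(picked, 2))
    # Spatial spread: mean pairwise image distance between selected features.
    spread = (sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in pairs) / len(pairs)
              if pairs else 0.0)
    # Coverage: how well each left-out feature is matched by its best representative.
    coverage = (sum(max(similarity(r, p) for p in picked) for r in rest) / len(rest)
                if rest else 1.0)
    # Redundancy: mean similarity among selected features (lower is better).
    redundancy = (sum(similarity(a, b) for a, b in pairs) / len(pairs)
                  if pairs else 0.0)
    stability = sum(p.stability for p in picked) / len(picked)
    return {"spread": spread, "coverage": coverage,
            "redundancy": redundancy, "stability": stability}


features = [
    Feature(0.0, 0.0, (0.0, 0.0), 0.9),
    Feature(1.0, 0.0, (0.0, 0.0), 0.5),   # near-duplicate of the first feature
    Feature(10.0, 0.0, (3.0, 4.0), 0.8),  # distinct feature from another region
]
clustered = subset_criteria(features, [0, 1])
spread_out = subset_criteria(features, [0, 2])
```

In this toy instance the subset that takes one feature from each region scores better on every criterion than the subset made of two near-duplicates.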

One way of selecting a subset of features is to cluster the features and then select those features nearest the cluster centroids as representatives. Our view is that subset selection is a problem that is distinctly different from the clustering problem. Clustering seeks to partition the input set into groups of similar elements. Clustering methods such as K-Means [10] use distance measures to group similar elements. Conflicting goals must be encoded into the distance measure and global constraints are difficult to enforce because of the localized nature of the algorithms. That these concerns are difficult to remedy is not surprising, since the algorithms were not designed for subset selection.
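For reference, the clustering-based selection described above can be sketched as follows: run K-Means (Lloyd's iterations) on the feature vectors, then keep the feature nearest each centroid. This is a minimal sketch of the baseline idea only. The evenly spaced deterministic seeding is an assumption made to keep the example reproducible (practical implementations usually seed randomly), and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_representatives(points: Sequence[Point], k: int,
                           iterations: int = 20) -> List[int]:
    """Return indices of the features nearest to each of k K-Means centroids."""
    # Assumed seeding: evenly spaced input points.
    centroids: List[Point] = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated: List[Point] = []
        for c, cluster in enumerate(clusters):
            if cluster:
                # New centroid is the coordinate-wise mean of the cluster.
                updated.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        centroids = updated
    representatives = set()
    for centroid in centroids:
        best = min(range(len(points)), key=lambda i: math.dist(points[i], centroid))
        representatives.add(best)
    return sorted(representatives)


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
]
chosen = kmeans_representatives(points, k=2)
```

Note that the only lever here is the distance measure: spatial distribution, stability, and a bound on subset size would all have to be folded into that one distance, which is the limitation the paragraph above points out.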

Our approach to the feature subset selection problem is to formulate it within an optimization framework that combines the multiple objectives of spatial distribution, similarity, and stability. We refer to this subset as the stable bounded canonical set (SBCS). The SBCS consists of the most representative (canonical) features with maximum stability. Here, the stability of a feature is a quantitative measure of its response to the feature detector.

We formulate the problem of determining the SBCS as a nonlinear integer programming optimization and present algorithms to approximate it using quadratic and semidefinite programming. To evaluate the quality of the subsets generated by our algorithm, we study subsets of image features in the context of object localization under occlusion. To measure the performance of feature subset algorithms we present a dataset of synthetic images, along with ground-truth information, which enables the precise measurement of localization accuracy under occlusion. Our experiments show that subsets of image features produced by our method, stable bounded canonical sets (SBCS), outperform subsets produced by K-Means clustering, threshold, and GA-based methods for the task of object localization under occlusion.
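To give a feel for what an integer program over feature subsets looks like, the sketch below exhaustively searches all subsets up to a size bound on a tiny instance. The objective is a hypothetical stand-in, not the authors' formulation: it rewards similarity between selected and unselected features, penalizes similarity among selected features, and adds a weighted stability term. Exhaustive search is only feasible for a handful of features, which is why relaxations such as the quadratic and semidefinite programs mentioned above are needed in practice.

```python
from itertools import combinations
from typing import Sequence, Tuple


def toy_objective(chosen: Tuple[int, ...],
                  similarity: Sequence[Sequence[float]],
                  stability: Sequence[float],
                  weight: float = 1.0) -> float:
    """Assumed objective: cut similarity - intra-subset similarity + weighted stability."""
    inside = set(chosen)
    outside = [j for j in range(len(stability)) if j not in inside]
    cut = sum(similarity[i][j] for i in inside for j in outside)
    intra = sum(similarity[i][j] for i, j in combinations(sorted(inside), 2))
    stable = sum(stability[i] for i in inside)
    return cut - intra + weight * stable


def best_bounded_subset(similarity: Sequence[Sequence[float]],
                        stability: Sequence[float],
                        max_size: int) -> Tuple[int, ...]:
    """Brute-force search over all non-empty subsets with at most max_size features."""
    n = len(stability)
    candidates = (subset
                  for size in range(1, max_size + 1)
                  for subset in combinations(range(n), size))
    return max(candidates, key=lambda s: toy_objective(s, similarity, stability))


# Four hypothetical features: 0 and 1 are alike, 2 and 3 are alike.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
stability = [0.5, 0.8, 0.7, 0.2]
best = best_bounded_subset(similarity, stability, max_size=2)
```

On this instance the search picks one feature from each similar pair, preferring the more stable member of each, which matches the intuition behind combining the objectives in a single optimization.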

The rest of the paper is organized as follows: In Section 2, we review related work. In Section 3, we present our algorithm, including problem formulation and quadratic integer formulation along with descriptions of how approximate solutions may be computed. Section 4 describes experiments used to evaluate the utility of our algorithm. Lastly, Section 5 presents our conclusions and future work.

Related work

In our work, we seek to find a representative subset of image features. This problem is closely related to the problem of representing a data set with a smaller, more compact, representation. Rate distortion theory, described by Cover and Thomas [11], represents an input sequence, $X$, as a set of reproduction points, $\hat{X}$, defined by a mapping $f: X \to \hat{X}$. The distortion between $X$ and $\hat{X}$ is the average distance between the sequences using a given measure such as the Hamming distance. The reproduction

Stable bounded canonical sets

In this section, we present our algorithm for the stable bounded canonical set (SBCS) of features. Our algorithm for feature subset selection is based on a formulation of the SBCS problem in terms of a quadratic integer programming optimization. Many problems of this type are known to be intractable [36], but good approximations exist [24]. In what follows, we formulate the quadratic integer program and show how approximate solutions may be obtained using quadratic programming and semidefinite

Experiments

In this section, we present a number of experiments that test the performance of our SDP subset selection technique. The experiments presented in this section are in the context of object localization, the process of determining the position, orientation, and scale of a query object in a target scene. First, in Section 4.1, we describe the feature detectors that are used in the experiments. In Section 4.2, we describe the algorithms that are used for object localization. Next, in Sections 4.3

Conclusions

We have presented a method to reduce a large set of image features to a small subset, the SBCS, while preserving its descriptive power over the original image features. We derived an efficient approximation algorithm using semidefinite programming to compute the SBCS that is at least 0.878 of optimal [37]. In addition, we presented a quadratic programming (QP) approximation for SBCS as well as preliminary results which show that it performs well in practice. To demonstrate the utility of the

Acknowledgments

This work was funded in part by a grant from the Office of Naval Research (ONR-N00014-08-1-0925). The authors thank Tom Plick for his implementation of the GA algorithm of Morita et al. [60] and his helpful comments. The authors also thank Jianbo Shi and Sven Dickinson for their insightful comments.

References (61)

  • R. Kohavi et al., Wrappers for feature subset selection, Artificial Intelligence (1997)
  • Y. Freund et al., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences (1997)
  • B. Heisele et al., Feature reduction and hierarchy of classifiers for fast object detection in video images
  • J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in: Proceedings of the...
  • Z. Sun, G. Bebis, R. Miller, Object detection using feature subset selection, in: Pattern Recognition, vol. 37,...
  • Y. Ke et al., PCA-SIFT: a more distinctive representation for local image descriptors
  • A.E. Johnson et al., Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
  • D. Huttenlocher, S. Ullman, Object recognition using alignment, in: Proceedings of the First International Conference...
  • D.P. Huttenlocher et al., Recognizing solid objects by alignment with an image, International Journal of Computer Vision (1990)
  • D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on...
  • C. Schmid et al., Evaluation of interest point detectors, International Journal of Computer Vision (2000)
  • D.J. MacKay, Information Theory, Inference, and Learning Algorithms (2003)
  • T. Cover et al., Elements of Information Theory: Rate Distortion Theory (1991)
  • R.A. Fisher, The statistical utilization of multiple measurements, Annals of Eugenics (1938)
  • H. Hotelling, Simplified calculation of principal components, Psychometrika (1936)
  • H. Liu et al., Feature transformation and subset selection, IEEE Intelligent Systems (1998)
  • H. Liu, H. Motoda, L. Yu, Feature selection with selective sampling, in: Proceedings of the Nineteenth International...
  • D. Koller, M. Sahami, Toward optimal feature selection, in: International Conference on Machine Learning, 1996, pp....
  • P. Langley, Selection of relevant features in machine learning, in: AAAI Fall Symposium on Relevance, 1994, pp....
  • M.J. Owen, Simple canonical views, in: Proceedings, British Machine Vision Conference (BMVC’05), September,...
  • A. Lu, R. Maciejewski, D.S. Ebert, Volume composition using eye tracking data, in: IEEE-VGTC Symposium on...
  • C.M. Cyr, B. Kimia, 3D object recognition using shape similarity-based aspect graph, in: Proceedings of the 8th...
  • F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM Journal on Optimization (1995)
  • M.X. Goemans, D.P. Williamson, .878-Approximation algorithms for MAX CUT and MAX 2SAT, in: Proceedings of the...
  • M.X. Goemans, Semidefinite programming in combinatorial optimization, Mathematical Programming (1997)
  • S. Mahajan et al., Derandomizing approximation algorithms based on semidefinite programming, SIAM Journal on Computing (1999)
  • C. Schellewald et al., Subgraph matching with semidefinite programming
  • X. Bai, H. Yu, E.R. Hancock, Graph matching using spectral embedding and semidefinite programming, in: Proceedings,...
  • J. Keuchel et al., Hierarchical image segmentation based on semidefinite programming
  • Q. Zhu, J. Shi, Shape from shading: recognizing the mountains through a global view, in: CVPR 2006, vol. 2, June, 2006,...

    1

    Work performed while at Department of Computer Science, Drexel University.
