Appearance-based active object recognition

https://doi.org/10.1016/S0262-8856(99)00075-X

Abstract

We present an efficient method within an active vision framework for recognizing objects which are ambiguous from certain viewpoints. The system is allowed to reposition the camera to capture additional views and, therefore, to improve the classification result obtained from a single view. The approach uses an appearance-based object representation, namely the parametric eigenspace, and augments it by probability distributions. This enables us to cope with possible variations in the input images due to errors in the pre-processing chain or changing imaging conditions. Furthermore, the use of probability distributions gives us a gauge to perform view planning. Multiple observations lead to a significant increase in recognition rate. Action planning is shown to be of great use in reducing the number of images necessary to achieve a certain recognition performance when compared to a random strategy.

Introduction

Most computer vision systems found in the literature perform object recognition on the basis of the information gathered from a single image. Typically, a set of features is extracted and matched against object models stored in a database. Much research in computer vision has gone into finding features that are capable of discriminating objects [1]. However, this approach faces problems once the features available from a single view are simply not sufficient to determine the identity of the observed object. Such a case arises, for example, when the database contains objects that look very similar from certain views or that share a similar internal representation (ambiguous objects or object data), a difficulty that is compounded by large object databases.

A solution to this problem is to utilize the information contained in multiple sensor observations. Active recognition provides the framework for collecting evidence until a sufficient level of confidence in one object hypothesis is obtained. The merits of this framework have already been recognized in various applications, ranging from land-use classification in remote sensing [16] to object recognition [2], [3], [4], [15], [18], [19].

Active recognition accumulates evidence collected from a multitude of sensor observations. The system has to provide tentative object hypotheses for each single view. Combining observations over a sequence of active steps moves the burden of object recognition slightly away from the process used to recognize a single view and towards the processes responsible for integrating the classification results of multiple views and for planning the next action.
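As a minimal illustration of such evidence integration, the following Python sketch fuses per-view object posteriors with a naive-Bayes product rule over conditionally independent views; this rule is an assumption made for illustration, not necessarily the fusion scheme developed later in the article.

    import numpy as np

    def fuse(posterior_so_far, view_posterior):
        """Combine the accumulated object posterior with the tentative
        posterior obtained from a single new view (elementwise product,
        renormalized)."""
        combined = posterior_so_far * view_posterior
        return combined / combined.sum()

    # Two objects that are ambiguous from the first viewpoint; a second,
    # more discriminative view resolves the tie.
    belief = np.array([0.5, 0.5])
    belief = fuse(belief, np.array([0.9, 0.1]))
    print(belief)  # -> [0.9 0.1]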

In active recognition we have a few major modules whose efficiency is decisive for the overall performance (see also Fig. 1):

  • The object recognition system (classifier) itself.

  • The fusion task, combining hypotheses obtained at each active step.

  • The planning and termination procedures.

Each of these modules can be realized in a variety of different ways. This article establishes a specific coherent algorithm for the implementation of each of the necessary steps in active recognition. The system uses a modified version of Murase and Nayar's [12] appearance-based object recognition system to provide object classifications for a single view and augments it with active recognition components. Murase and Nayar's method was chosen because it not only yields object classifications but also gives reasonable pose estimates (a prerequisite for active object recognition). It should be emphasized, however, that the presented work is not limited to the eigenspace recognition approach. The algorithm can also be applied to other view-based object recognition techniques that rely on unary, numerical feature spaces to represent objects.
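The interaction of these modules can be summarized in a short control-loop sketch; all function names below are placeholders for the concrete components developed in the remainder of the article, not actual library calls, and belief is a NumPy array of object probabilities.

    def active_recognition(capture, classify, fuse, plan, move,
                           prior, confidence=0.95, max_views=10):
        belief = prior
        for _ in range(max_views):
            # classifier: tentative hypotheses for the current view
            belief = fuse(belief, classify(capture()))
            if belief.max() >= confidence:   # termination criterion
                break
            move(plan(belief))               # reposition the camera
        return belief.argmax(), belief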


Related research

Previous work in planning sensing strategies may be divided into off-line and on-line approaches [21]. Murase and Nayar, for example, have presented an approach for off-line illumination planning in object recognition that searches for regions in eigenspace where the object manifolds are best separated [11]. A conceptually similar but methodologically more sophisticated strategy for on-line active view planning will be presented below.

The other area of research is concerned with choosing on-line a

Object recognition in parametric eigenspace

Appearance-based approaches to object recognition, and especially the eigenspace method, have experienced a renewed interest in the computer vision community due to their ability to handle combined effects of shape, pose, reflection properties and illumination [12], [13], [22]. Furthermore, appearance-based object representations can be obtained through an automatic learning procedure and do not require the explicit specification of object models. Efficient algorithms are available to extend
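In outline, the eigenspace is obtained by principal component analysis of the normalized training images. The following sketch uses a plain SVD-based PCA, a simplification of the authors' setting rather than their optimized implementation, to construct the basis and to project an image to its eigenspace point g.

    import numpy as np

    def build_eigenspace(X, k):
        """X holds one flattened, brightness-normalized training image per
        row; return the mean image and the k leading eigenvectors."""
        mean = X.mean(axis=0)
        # SVD of the centred data: the rows of Vt are eigenvectors of the
        # image covariance matrix, ordered by decreasing eigenvalue.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:k]

    def project(image, mean, basis):
        """Map a flattened image to its k-dimensional eigenspace point g."""
        return basis @ (image - mean)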

Probability distributions in eigenspace

Before going on to discuss active fusion in the context of eigenspace object recognition, we extend Murase and Nayar's concept of manifolds by introducing probability densities in eigenspace. Let us assume that we have constructed an eigenspace of all considered objects. We denote by p(g|oi,ϕj) the likelihood of ending up at point g in the eigenspace when projecting an image
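One simple way to realize such densities, sketched here purely for illustration, is to fit a Gaussian to the eigenspace projections of several perturbed sample images per object and pose; the Gaussian form is an assumption of this sketch, since the approach only requires some estimated density per (object, pose) pair.

    import numpy as np

    def fit_density(points):
        """points: (n_samples, k) eigenspace projections g obtained from
        perturbed sample images of object i at pose j."""
        mu = points.mean(axis=0)
        # A small ridge keeps the covariance invertible for few samples.
        cov = np.cov(points, rowvar=False) + 1e-6 * np.eye(points.shape[1])
        return mu, cov

    def likelihood(g, mu, cov):
        """Evaluate the fitted Gaussian density p(g|oi,ϕj) at point g."""
        k = len(mu)
        diff = g - mu
        norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
        return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)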

Active object recognition

Active steps in object recognition will lead to striking improvements if the object database contains objects that share similar views. The key process for disambiguating such objects is a planned movement of the camera to a new viewpoint from which the objects appear distinct. We will tackle this problem now within the framework of eigenspace-based object recognition. Nevertheless, the following discussion of active object recognition is largely independent of the employed feature space. In order
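One common way to cast such view planning, sketched here under the assumption that the learned densities let us predict the posterior a camera action would produce, is to choose the action with the lowest expected posterior entropy; predict_posterior is a hypothetical model, not part of the original system's interface.

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def plan_next_view(actions, belief, predict_posterior):
        """Pick the action whose predicted outcome is, averaged over the
        current object belief, least ambiguous. predict_posterior(a, i)
        returns the object posterior expected after action a if the true
        object were i."""
        def expected_H(a):
            return sum(belief[i] * entropy(predict_posterior(a, i))
                       for i in range(len(belief)))
        return min(actions, key=expected_H)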

The complexity of the algorithm

In the following we denote by no the number of objects, by nϕ the number of possible discrete manifold parameters (the total number of viewpoints) and by nf the number of degrees of freedom of the setup. Since nϕ depends exponentially on the number of degrees of freedom, we introduce nv, the mean number of views per degree of freedom, such that nϕ=nv^nf. Finally, let us denote by na the average number of possible actions. If all movements are allowed we will usually have na=nϕ.
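To illustrate with hypothetical numbers: a setup with nf=2 degrees of freedom (say, turntable rotation and camera height) and nv=36 views per degree of freedom yields nϕ=36^2=1296 viewpoints, and if all movements are allowed, na=nϕ=1296 candidate actions have to be evaluated at each active step.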

Before starting the

Continuous values for the pose estimation

The fundamental likelihoods p(g|oi,ϕj) are based upon learned sample images for a set of discrete pose parameters ϕj, with j=1,…,nϕ. It is possible to deal with intermediate poses if images from views ϕj±Δϕj are also used in the construction of p(g|oi,ϕj). However, the system in its current form is limited to final pose estimates P(ϕj|oi,I1,…,In) at the accuracy of the trained subdivisions.
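A minimal sketch of accumulating such a discrete pose posterior for a fixed object hypothesis oi, assuming conditionally independent views, a uniform pose prior, and pose indices already registered across the camera's own movements:

    import numpy as np

    def pose_posterior(likelihoods):
        """likelihoods[v, j] = p(g_v|oi,ϕj) for view v and trained pose ϕj."""
        post = likelihoods.prod(axis=0)   # combine the n observations
        return post / post.sum()          # P(ϕj|oi,I1,…,In)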

For some applications one is not only interested in recognizing the object but also in estimating its

Experiments

An active vision system has been built that allows for a variety of different movements (see Fig. 5). In the experiments described below, the system changes the vertical position of the camera, the camera tilt, and the orientation of the turntable.

Conclusions

We have presented an active object recognition system for single-object scenes. Depending on the uncertainty in the current object classification, the recognition module acquires new sensor measurements in a planned manner until the confidence in a certain hypothesis reaches a pre-defined level or another termination criterion is met. The well-known object recognition approach using eigenspace representations was augmented by probability distributions in order to capture possible variations

References (23)

  • S.A. Hutchinson et al., Multisensor strategies using Dempster–Shafer belief accumulation


We gratefully acknowledge support by the Austrian ‘Fonds zur Förderung der wissenschaftlichen Forschung’ under grant S7003, and the Austrian Ministry of Science (BMWV Gz. 601.574/2-IV/B/9/96).
