3D object recognition from static 2D views using multiple coarse data channels

doi:10.1016/S0262-8856(98)00159-0

Image and Vision Computing

Volume 17, Issue 11, September 1999, Pages 845-858

https://doi.org/10.1016/S0262-8856(98)00159-0

Abstract

A 3D object recognition system is described that employs a novel multiresolution representation and coarse encoding of feature information. Classic feature extraction methods are modified by using wavelet transform maxima to direct the actions of the feature extraction modules. The reasons for using a multi-channel architecture are described, together with the feature extraction and coarse coding modules. Because the targeted field of application is the automatic categorisation of natural objects, the proposed system is designed to run on ordinary hardware platforms and to process an input in a short timeframe. The system has been evaluated on a variety of 2D views of a set of 5 synthetic objects designed to present various degrees of similarity, as rated by a panel of human subjects. Parallels between these ratings and the system’s behaviour are drawn. Additionally, a small set of photomicrographs of fish larvae has been used to assess the system’s performance when presented with very similar, non-rigid shapes. For comparison, the parameters extracted from each image were fed into two categorisers: discriminant analysis and a multilayer feedforward neural network with backpropagation of error. Experimental evidence is presented which demonstrates the efficacy of the methods. The satisfactory categorisation performance of the system is reported, and conclusions are drawn about the system’s behaviour.

Introduction

Computational studies of vision in the past decades have highlighted the complexity of the processes involved in performing visual tasks and the inherent difficulty of building computational systems that perform 3D object recognition and scene comprehension. A series of computational studies and theories reviewed by Hildreth and Ullman [1] describe vision as a chain of processes that, based on the retinal image, yield increasingly complex representations of the visible world. Among the theories on visual information representation in biological systems (the work of Marr [2], Biederman [3], Ullman [4], [5] and Edelman and Weinshall [6]), those discussing viewer-centred representation currently seem to have more experimental evidence in their support (Tarr and Pinker [7], Edelman and Bulthoff [8]). The experimental results reviewed by Edelman [9] also contradict the theories centred around the idea of representation with reconstruction [2]. Hence there was a gradual shift in vision research from theories postulating the necessity of using extremely detailed, often complete representations of the world (the work of Freuder, Tenenbaum and Barrow cited in [2]), towards more relaxed frameworks based on viewpoint-dependent encoding of views and storage of multiple views (as suggested by Ullman and Basri [5]).

Given this fundamental problem of representing the visible world, an important task in the design of a recognition system is the selection of descriptors that constitute the building blocks of the internal representation of the analysed objects. Still, studies in the field of taxonomy have pointed out the virtual impossibility of obtaining a universally valid set of features and categorisation criteria based on them. Sokal [10] highlighted the existence of individual differences in taxonomic judgement, since a group of human classifiers can arrive at correct categorisation of objects based on quite different sets of features considered to be salient.

In analysing images for object recognition purposes, researchers have usually tended to focus on object and image properties that are also salient to human enquiry. Thus texture descriptions [11], edge positions [12] and statistical descriptions of pixel densities [13] have all been used to segment images into their component parts. Categorisation follows, providing object recognition. Most of these methods rely on extracting very precise measurements of, for example, symmetries or shape description (as illustrated by methods proposed by Brady [14], Khotanzad and Liou [15]). None have proved reliable analysis tools for understanding natural images or images with noise and clutter obscuring the objects of interest. As an alternative, a method was developed by Ellis et al. [16] that draws on the concept of Ullman’s multiple visual routines [17]. The principle of operation is the registration at low resolution of multiple parameters that describe the object scene in an image. If many of these ‘coarse channels’ are analysed in concert, a solution to the particular analysis may be found, one which may not be apparent when using high resolution data. This is similar in concept to finding a global minimum in a multidimensional descriptor space, as often described in artificial neural network research (a good example being Rumelhart and McClelland’s work [18]). The coarse channel principle has been applied successfully to the automatic categorisation of 23 species of field-collected marine plankton, in a system developed by Culverhouse et al. [19]. It is also applied here to the task of three-dimensional object recognition.
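
As a toy illustration of the coarse channel principle (not the authors' implementation), the sketch below quantises three measurements into only two bins each. The channel names, bin width and object values are invented for illustration. Each channel on its own confuses at least two of the objects, yet the joint coarse code separates all three.

```python
def coarse_code(value, bin_width):
    """Quantise a measurement into a coarse bin index (a low-resolution channel)."""
    return int(value // bin_width)

# Hypothetical objects, each described by three measurements in [0, 1).
objects = {
    "A": {"elongation": 0.12, "contrast": 0.81, "curvature": 0.33},
    "B": {"elongation": 0.18, "contrast": 0.30, "curvature": 0.38},
    "C": {"elongation": 0.71, "contrast": 0.85, "curvature": 0.36},
}
CHANNELS = ("elongation", "contrast", "curvature")
BIN_WIDTH = 0.5  # only two bins per channel: deliberately coarse

def joint_code(measurements):
    """Combine the coarse codes of all channels into one tuple."""
    return tuple(coarse_code(measurements[c], BIN_WIDTH) for c in CHANNELS)

def channel_is_ambiguous(channel):
    """True if at least two objects share the same coarse code in this channel."""
    codes = [coarse_code(m[channel], BIN_WIDTH) for m in objects.values()]
    return len(set(codes)) < len(codes)

single_channel_ambiguous = {c: channel_is_ambiguous(c) for c in CHANNELS}
joint_codes = {name: joint_code(m) for name, m in objects.items()}
joint_is_unique = len(set(joint_codes.values())) == len(objects)
```

Here `single_channel_ambiguous` is true for every channel while `joint_is_unique` is also true, which is the sense in which channels analysed in concert can resolve what no single low-resolution channel can.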

This approach of non-exact feature description and low-resolution encoding of features also constitutes the central concept of other recently developed systems that do not necessarily employ multiple data channels. Bradsky and Grossberg [20] describe a system that uses in its preprocessing stage an array of Gaussian receptive fields in order to decrease the dimensionality of the data. The system developed by Mel [21] employs a large array of filters placed on the input image, the outputs being coarse coded as histograms for achieving viewpoint-invariance. In a conceptually related way, Schiele and Crowley [22], [23] have used multidimensional receptive field histograms characterising 2D views of objects in classification and in determination of favourable viewpoints for recognition. Edelman’s Chorus scheme [24] utilises a receptive field array that provides low-dimensional description of the input data. The main difference between these approaches and the authors’ system is that the attention of the system is directed by a module employing Mallat’s [25] multiresolution analysis (MRA) towards areas of the image that contain potentially relevant features for categorisation. Therefore it does not analyse the entire surface of the input image (e.g. by placing a large array of receptive fields on the image).
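
To make the idea of low-resolution encoding with receptive fields concrete, here is a minimal sketch that pools a 1D signal through a handful of overlapping Gaussian receptive fields. It is not taken from any of the cited systems; the function name `gaussian_pool`, the number of fields and the value of `sigma` are illustrative assumptions.

```python
import math

def gaussian_pool(signal, n_fields, sigma):
    """Summarise a 1D signal with n_fields overlapping Gaussian receptive fields."""
    n = len(signal)
    # Field centres are spread evenly across the signal.
    centres = [(k + 0.5) * n / n_fields for k in range(n_fields)]
    pooled = []
    for c in centres:
        weights = [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]
        norm = sum(weights)
        # Each field outputs the weighted mean of the samples it covers.
        pooled.append(sum(w * s for w, s in zip(weights, signal)) / norm)
    return pooled

# A 64-sample signal with a step edge at index 40, reduced to 4 numbers.
signal = [0.0] * 40 + [1.0] * 24
coarse = gaussian_pool(signal, n_fields=4, sigma=6.0)

# The same edge displaced by one sample gives nearly the same coarse code.
shifted = [0.0] * 41 + [1.0] * 23
coarse_shifted = gaussian_pool(shifted, n_fields=4, sigma=6.0)
max_change = max(abs(a - b) for a, b in zip(coarse, coarse_shifted))
```

The 64 samples collapse to four values, and a small displacement of the edge changes those values only a little. This tolerance to small shifts is what makes coarse codes attractive when exact feature positions are unreliable.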

The proposed system constitutes an engineering solution, since the processing algorithms were designed to run on widely available hardware platforms and to perform analysis in a timeframe short enough to make it usable in laboratory conditions.

Section snippets

Overview of the system

The recognition system has three components: (i) a multi-resolution feature extractor that uses wavelet filter banks, (ii) a coarse channel feature analyser and (iii) an object categoriser. Features are defined in this context as areas of high contrast or high curvature, the extraction of these being directed by low-resolution information, following work on visual inspection through eye tracking by Niemann et al. [26] and Rao et al. [27]. The spatial organisation of these features is analysed

The preprocessing and coarse coding methods

In this section the mathematics and algorithms behind the feature extraction and coarse coding methods are described, with emphasis on the novel way of representing and encoding the scale-space topology of wavelet transform’s local maxima.
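
The paper's own filters and encoding are not reproduced in this excerpt. As a minimal sketch of the general idea of following wavelet transform local maxima across scales, the code below uses a Haar-like difference of box averages as a stand-in wavelet on a 1D signal. The function names, the scales (1, 2, 4) and the threshold are assumptions made for illustration only.

```python
def detail(signal, scale):
    """Haar-like detail: difference of adjacent box averages of width `scale`."""
    n = len(signal)
    out = [0.0] * n
    for i in range(scale, n - scale + 1):
        right = sum(signal[i:i + scale]) / scale
        left = sum(signal[i - scale:i]) / scale
        out[i] = right - left
    return out

def modulus_maxima(coeffs, threshold):
    """Indices where |coeffs| is a local maximum and exceeds `threshold`."""
    mags = [abs(c) for c in coeffs]
    return [
        i for i in range(1, len(mags) - 1)
        if mags[i] > threshold and mags[i] >= mags[i - 1] and mags[i] > mags[i + 1]
    ]

signal = [0.0] * 40
signal[8] = 0.3            # small isolated blip: fine-scale detail only
for i in range(20, 40):    # step edge: structure present at every scale
    signal[i] = 1.0

maxima_by_scale = {
    s: modulus_maxima(detail(signal, s), threshold=0.2) for s in (1, 2, 4)
}
# Keep only the positions whose maxima persist at all analysed scales.
persistent = set.intersection(*(set(m) for m in maxima_by_scale.values()))
```

In this toy case the blip produces a maximum only at the finest scale, while the step edge produces one at every scale, so `persistent` retains just the edge position. Maxima that survive at coarse scales are the kind of low-resolution cue that could direct finer feature extraction to a region of interest.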

Classification experiments

In order to evaluate the MRA/coarse-coded data channel image analyser, three test data sets were used. The results were fed into two categorisers for training and testing. An 8-object and a 5-object data set comprised computer-generated 2D views of 3D objects. The Aberdeen data set held multiple 2D views of natural images of fish larvae. Images in the Aberdeen set were typically of much poorer image quality than the first two sets of images. The 8-object data set was used to evaluate the θ

Conclusions

A three-dimensional object recogniser was presented that operates on coarse coded data obtained from multi-resolution analysis of 2D views of 3D objects. The system has been tested on a variety of synthetic objects, some of which present self-occlusion during rotation. The implemented feature extraction and coarse coding techniques led to good results in classifying views of similar synthetic 3D objects, in conditions of wide variations of viewpoint. Also, in the case of a difficult set of

Acknowledgements

The authors are grateful to Paul Rankine from the Marine Laboratory, Agriculture and Fisheries Department, The Scottish Office, Aberdeen for providing the set of photomicrographs of fish larvae.

References (47)

S. Ullman
Aligning pictorial descriptions: an approach to object recognition
Cognition
(1989)
M. Tarr et al.
Mental rotation and orientation-dependence in shape recognition
Cognitive Psychology
(1989)
S. Edelman et al.
Orientation dependence in the recognition of familiar and novel views of 3D objects
Vision Research
(1992)
S. Edelman
Representation without reconstruction
CVGIP – Image Understanding
(1994)
M.M. Van Hulle et al.
A modular artificial neural network for texture processing
Neural Networks
(1993)
S. Ullman
Visual Routines
Cognition
(1984)
B. Schiele, J.L. Crowley
Transinformation of object recognition and its application to viewpoint planning
Robotics and Autonomous Systems
(1997)
J.W. Hsieh et al.
Image registration using a new edge-based approach
Computer Vision and Image Understanding
(1997)
I. Cohen et al.
Orthonormal shift-invariant wavelet packet decomposition and representation
Signal Processing
(1997)
Hildreth, E.C., Ullman, S., The computational study of vision. In: Posner M. I., ed., Foundations of Cognitive Science,...
Marr, D., Vision – A computational investigation into the human representation and processing of visual information,...
I. Biederman
Recognition by components: a theory of human image understanding
Psychol. Review
(1987)
S. Ullman et al.
Recognition by linear combinations of models
IEEE Transactions on Pattern Analysis and Machine Intelligence
(1991)
S. Edelman et al.
A Self-organising multiple-view representation of 3D objects
Biological Cybernetics
(1991)
R.R. Sokal
Classification: purposes, principles, progress, prospects
Science
(1974)
J. Canny
A Computational approach to edge detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
(1986)
J.D. Helterbrand et al.
A statistical approach to identifying closed object boundaries in images
Advances in applied probability
(1994)
Brady, M., Representing shape. In: Parallel architectures and computer vision workshop, Somerville college, Oxford,...
A. Khotanzad et al.
Recognition and pose estimation of unoccluded three-dimensional objects from a two-dimensional perspective view by banks of neural networks
IEEE Transactions on Neural Networks
(1996)
Ellis, R., Simpson, R., Culverhouse, P.F., Parisini, T., Williams, R., Reguera, B., Moore, B., Lower, D., Expert visual...
Rumelhart, D.E., McClelland, J.L., Parallel distributed processing: Explorations in the microstructure of cognition...
P.F. Culverhouse et al.
Automatic categorisation of 23 species of Dinoflagellate by artificial neural network
Mar. Ecol. Prog. Ser.
(1996)
G. Bradsky et al.
Fast-learning VIEWNET architectures for recognizing three-dimensional objects from multiple two-dimensional views
Neural Networks
(1995)

Cited by (16)

A diffusion wavelet approach for 3-D model matching
2009, CAD Computer Aided Design
Citation Excerpt :
In this paper, we present methods using 3D shapes based on mesh models which are widely used in computer graphics and CAD applications. There is significant amount of similar works in the area of computer vision (e.g., [1–4]), which generally infer the information about a 3D-shape from one or more frames of 2D-images. This is different from the proposal presented in this paper, as we deal directly with 3D-objects represented as polygonal meshes by which suitable descriptors are extracted.
This paper proposes a new 3D shape retrieval approach based on diffusion wavelets which generalize wavelet analysis and associated signal processing techniques to functions on manifolds and graphs. Unlike current works on 3D matching, which are based either on the topological information of the model or its scatter point distribution information, this approach uses both information for more effective matching. Diffusion wavelets enable both global and local analyses on graphs, and can capture the topology of a surface with the diffusion map of its mesh representation. As a result, both multi-scale properties of the 3D geometric model and the topology among the meshes can be extracted for use in 3D geometric model retrieval. Tests using 3D benchmarks demonstrate that the approach based on diffusion wavelets is effective and performs better than those by spherical wavelet and spherical harmonics in 3D model matching.
Human and machine factors in algae monitoring performance
2007, Ecological Informatics
We all take our visual systems for granted, and often assume we are always ‘near perfect’ observers. This is not the case; expert visual recognition is complex and can be error prone. Starting with examples that define the problem I will explore some of the issues of recognition where expert judgements are required.
In addition to ‘expert’ effects, there are a number of cognitive factors that can severely affect performance, including fatigue, boredom, recency effects, positivity bias and short-term memory effects. Experimental evidence of the impact of these on performance are presented and discussed.
The specimen identifications generated by experts are useful not only to ecology, but to researchers developing systems for automatic labelling of marine plankton. Comparisons of performance are presented, where human experts have been pitted against machines to label plankton. Consensus of opinion is important in reducing errors, yet it is the norm for experts to operate alone. The shortcomings of man and machines engaged in plankton recognition are reviewed and the future of automation is assessed.
Image Analysis and Computer Vision: 1999
2000, Computer Vision and Image Understanding
This paper presents a bibliography of nearly 1700 references related to computer vision and image analysis, arranged by subject matter. The topics covered include computational techniques; feature detection and segmentation; image and scene analysis; two-dimensional shape; pattern; color and texture; matching and stereo; 2½-dimensional recovery and analysis; three-dimensional shape; and motion. A few references are also given on related topics, including geometry and graphics, compression and processing, sensors and optics, visual perception, neural networks, artificial intelligence and pattern recognition, as well as on applications.
Contour based split and merge segmentation and pre-classification of zooplankton in very large images
2014, VISAPP 2014 - Proceedings of the 9th International Conference on Computer Vision Theory and Applications
An empirical assessment of the consistency of taxonomic identifications
2014, Marine Biology Research
Automated image processing in marine biology
2013, Imaging Marine Life: Macrophotography and Microscopy Approaches for Marine Biology
