Training-less color object recognition for autonomous robotics
Introduction
Limitations in the processing power of computers three decades ago restricted experimentation in computer vision to the analysis of static scenes, where vision was commonly regarded as the problem of determining “what is where by looking”. Advances in real-time computer processing over the past couple of decades have increased interest among vision researchers in studying perception from the point of view of an active autonomous observer situated in a dynamic environment.
Approaching the vision problem from this point of view may change our perception of what the vision problem is and how it should be addressed. Computer vision research carried out from this perspective is typically referred to in the literature as active, animate or purposive vision. Early researchers argued that it constitutes a new paradigm for computer vision, and may lead to significant advances in robotics and to a better understanding of the vision problem in general [1], [2]. “This paradigm is as relevant today as it was decades ago and, with the state of modern computational tools, is poised to find new life in the robotic perception systems of the next decade”, as recently stated by Bajcsy et al. [3].
Vision systems that implement the active vision paradigm have mechanisms that can actively control camera parameters such as orientation, focus, zoom, and vergence in response to external stimuli and the requirements of the task at hand. They may also have anthropomorphic features such as spatially variant (foveal) sensors [4]. The real-time constraints of active vision pose challenging goals for enabling a robot to interact successfully with its dynamic world. Two such goals are related to location (try to find a known object) and identification (try to identify an object whose location can be fixated) [2].
For active vision systems to be implemented in physical robots, simplification of the vision problem becomes a necessity, and the means for this is the development of more efficient algorithms. Histograms have been widely used in image processing and computer vision applications: from simple image enhancement using histogram equalization [5], to color object recognition using three-dimensional (3D) color histogram object detection and localization [6], to sophisticated face classification and recognition [7].
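As a concrete illustration of the first of these applications, histogram equalization can be sketched in a few lines of NumPy. This is a minimal illustration only, not code from this paper; the image content, size and bin count are arbitrary:

```python
import numpy as np

def equalize_histogram(img):
    """Classic histogram equalization for an 8-bit grayscale image.

    Remaps intensities through the normalized cumulative histogram so
    that the output occupies the full [0, 255] range more uniformly.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                              # normalize CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)  # intensity lookup table
    return lut[img]

# A low-contrast image: intensities crowded into the narrow band [40, 80]
img = np.random.default_rng(0).integers(40, 81, size=(64, 64)).astype(np.uint8)
out = equalize_histogram(img)
print(img.min(), img.max(), "->", out.min(), out.max())
```

After the remapping, the brightest input intensity maps to 255 and the darkest is pushed toward 0, stretching the contrast of the original band.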
Color-histogram-based algorithms for object recognition are attractive because of their simplicity, efficiency and robustness. Many image retrieval systems use color histograms among other cues (e.g. [8]), motivated by the fact that many images contain characteristic colors. However, one major drawback of the color histogram method is its sensitivity to variations in lighting conditions, such as the color and intensity of the light source. In addition, many object classes are difficult to distinguish on the basis of color alone.
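The lighting sensitivity noted above is easy to reproduce. The sketch below, which is illustrative only and not the paper's implementation, builds a coarse 3D RGB histogram (8 bins per channel) and shows that merely halving the image intensities, as a crude stand-in for dimmer lighting, moves most of the histogram mass into different cells:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Coarse 3D RGB histogram (bins^3 cells), normalized to sum to 1."""
    idx = (img.astype(np.uint16) * bins) // 256        # per-channel bin index
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    h = np.bincount(flat.ravel(), minlength=bins ** 3).astype(np.float64)
    return h / h.sum()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)

h_orig = color_histogram(img)
h_dim = color_histogram(img // 2)   # simulate a dimmer light source

# Large L1 distance even though it is the "same" scene under dimmer light,
# so naive histogram matching degrades badly under illumination changes.
print(np.abs(h_orig - h_dim).sum() / 2)
```

The L1 distance (in [0, 1]) comes out large because halving the intensities shifts every pixel into the lower half of the bin range, exactly the failure mode described above.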
The paradigm proposed in this paper effectively addresses the recognition and localization problem by developing a unified framework that combines a multidimensional feature histogram approach with a multiscale pyramid approach. The high computational cost associated with multidimensional processing is addressed by deriving the Multidimensional Laplacian Feature Histogram Pyramid (MD-LFHP), a novel unified multidimensional-multiscale histogram representation. Furthermore, a Taylor series formulation is employed to combine the multiscale levels of the MD-LFHP into a single efficient multidimensional-multiscale Laplacian-Taylor Feature-Histogram (LT-FHist) representation for rapid training-less color object recognition and localization, with direct application to autonomous robotic agents.
The rest of this paper is organized as follows. Section 2 presents prior work related to object recognition and localization. Section 3 briefly reviews the effects on multidimensional histograms when an image is represented at a multiscale level. The proposed Laplacian–Taylor Feature–Histogram (LT-FHist) approach for object recognition and localization and its underlying mathematical derivation is discussed in significant detail in Sections 4–6. Section 7 presents comparative experimental results and discussions. Finally, concluding remarks appear in Section 8.
Section snippets
Related work
The problem of object detection and recognition has been widely studied in the literature, and various object descriptors have been proposed claiming robustness to scale, rotation, and occlusion [9], [10], [11], [12], [13], [14], [15], [16]. Multi-dimensional and multi-scale image feature descriptors have been reported in the literature and used separately for object recognition, but a unified approach for the fusion of both approaches into a conglomerate feature descriptor has not been formulated as
Effects of multiscale on multidimensional histograms
The histogram of an image at one scale differs significantly from its histograms at other scales, so the set of histograms across a multiscale representation can serve as a characteristic that uniquely identifies the image. This has been clearly demonstrated by Hadjidemetriou et al. in [20] and is evident from Fig. 1.
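This scale dependence can be reproduced with a toy pyramid. The following sketch is illustrative only (not from [20] or this paper) and substitutes simple 2×2 block averaging for proper Gaussian smoothing; it computes a normalized intensity histogram at each pyramid level and measures how far each drifts from the finest-scale histogram:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 block averaging (crude smooth-and-subsample)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def norm_hist(img, bins=32):
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(128, 128)).astype(np.float64)

levels = [img]
for _ in range(3):
    levels.append(downsample(levels[-1]))

hists = [norm_hist(lv) for lv in levels]
# Averaging concentrates intensities around the mean, so the histogram at
# each coarser scale differs measurably from the finest-scale histogram.
for k in range(1, 4):
    print(k, np.abs(hists[0] - hists[k]).sum() / 2)
```

The L1 distances grow with the level, illustrating why the set of histograms across scales carries far more identifying information than any single histogram.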
In [37], Hadjidemetriou et al. define a multiscale histogram as the set of intensity histograms of an image at multiple resolutions. This definition thus corresponds
The proposed methodology
The proposed multidimensional multiscale LT-FHist approach for object recognition and localization is motivated by knowledge about our early visual processing. Gaussian derivatives have been widely used in computer vision and their underlying mathematical derivations are well understood [39]. Their popularity lies in the fact that they can model the response of neural cells in the primate cortex [40]. Additionally, these derivatives can be “steered” to arbitrary orientations to create a local
Theory behind multidimensional histogram matching
Swain and Ballard [6] developed a 3D histogram intersection and backprojection technique called color indexing that can efficiently identify and localize objects in a database in the presence of occlusion and over changes in viewpoint.
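The core of color indexing is the histogram intersection score. A minimal sketch of the intersection measure follows, using toy histograms; this is illustrative only, not the paper's implementation:

```python
import numpy as np

def histogram_intersection(img_hist, model_hist):
    """Swain-Ballard histogram intersection: the fraction of the model
    histogram's mass that is also present in the image histogram."""
    return np.minimum(img_hist, model_hist).sum() / model_hist.sum()

rng = np.random.default_rng(3)
model = rng.random(64)   # toy model histogram (e.g. 4^3 coarse RGB cells)
other = rng.random(64)   # histogram of an unrelated image

# An image that contains the model object plus extra background mass still
# scores 1.0: extra image content never lowers the score, which is what
# makes intersection tolerant of cluttered backgrounds.
print(histogram_intersection(model * 2.0, model))   # -> 1.0
print(histogram_intersection(other, model))         # some value below 1.0
```

Because the measure only asks how much of the model's mass the image accounts for, it degrades gracefully under partial occlusion: occluding part of the object removes some mass from the image histogram but leaves the rest of the match intact.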
The proposed method in this work extends Swain’s (3D) histogram intersection and backprojection technique, to the multidimensional-multiscale case, and demonstrates that augmenting color features with multidimensional local oriented feature distributions can
A unified approach for the LT-FHist paradigm
As expressed in the previous section, the MDMS representation of the proposed paradigm will create a data size that may be prohibitively expensive in terms of computational complexity. Assuming that N-dimensional feature vectors are generated per image pixel with an image resolution of R × C, and L-levels of a multiscale representation, the computational complexity of indexing an N-dimensional-L-scaled histogram generated from this volume of data is O(R × C × N × L). Thus, a more efficient
Performance analysis
To enable an autonomous robot to interact successfully with its dynamic world, the active vision system on board must contend with some challenging goals. Two such goals are related to location (try to find a known object) and identification (try to identify an object whose location can be fixated). This paper proposes an efficient vision paradigm that moves closer to achieving these goals.
The performance analysis in this section will thus be bound by the practicality
Conclusion
This paper has developed a unified multidimensional-multiscale feature histogram framework for effective color object recognition and localization, which is directed towards autonomous mobile robot-vision applications. This framework has addressed the high computational cost associated with multidimensional histogram processing by deriving the multidimensional Laplacian feature histogram pyramid representation together with a Taylor series formulation to combine the multiscale levels of the
Acknowledgment
The author would like to thank the anonymous reviewers for their valuable suggestions that contributed to the overall improvement of the original manuscript. This work was funded by the College of Graduate Studies & Research and by the Research Institute of Sciences & Engineering at the University of Sharjah.
References (50)
- et al., Speeded-up robust features (SURF), Comput. Vision Image Understanding (2008)
- et al., A template matching approach of one-shot-learning gesture recognition, Pattern Recognit. Lett. (2013)
- et al., Robust one-shot facial expression recognition with sunglasses, Int. J. Mach. Learn. Comput. (2016)
- et al., An active vision architecture based on iconic representations, Artif. Intell. (1995)
- Active perception, Proc. IEEE (1988)
- et al., Animat vision: active vision in artificial animals, Videre: J. Comput. Vision Res. (1997)
- et al., Revisiting active perception, Auton. Robots (2017)
- et al., On the advantages of foveal mechanisms for active stereo systems in visual search tasks, Auton. Robots (2017)
- Digital Image Processing (1996)
- et al., Color indexing, Int. J. Comput. Vision (1991)
- Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, Tenth IEEE International Conference on Computer Vision (ICCV 2005)
- Color- and texture-based image segmentation using EM and its application to content-based image retrieval, Sixth International Conference on Computer Vision (1998)
- 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints, Int. J. Comput. Vision
- Selective search for object recognition, Int. J. Comput. Vision
- Local difference binary for ultrafast and distinctive feature description, IEEE Trans. Pattern Anal. Mach. Intell.
- Submodular object recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vision
- Discriminative transform of receptive field patterns for feature representation, Multimedia Tools Appl.
- Color object recognition via cross-domain learning on RGB-D images, 2016 IEEE International Conference on Robotics and Automation (ICRA)
- Object recognition using steerable filters at multiple scales, Proc. IEEE Workshop on Qualitative Vision, Los Alamitos, CA
- Recognition without correspondence using multidimensional receptive field histograms, Int. J. Comput. Vision
- Fast image retrieval via embeddings, 3rd International Workshop on Statistical and Computational Theories of Vision
- Multiresolution histograms and their use for recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision
- Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005)
Cited by (6)
- Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction, Information Sciences (2018)
- Color Constancy Algorithm Based on Pyramid Pooling, ACM International Conference Proceeding Series (2023)
- Progressive Multi-Scale Feature Cascade Fusion Color Constancy Algorithm, Guangxue Xuebao/Acta Optica Sinica (2022)
- Color Constancy with Multi-Channel Confidence-Weighted Method, Guangxue Xuebao/Acta Optica Sinica (2021)
- Adaptive Color Correction and Contrast Enhancement Algorithm for Low Illumination Images, Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics (2019)
- SHORT: Segmented histogram technique for robust real-time object recognition, Multimedia Tools and Applications (2019)