Training-less color object recognition for autonomous robotics
Introduction
Limitations in the processing power of computers three decades ago restricted experimentation in computer vision to the analysis of static scenes, where vision was commonly regarded as the problem of determining “what is where by looking”. Advances in real-time computer processing over the past couple of decades have increased interest among vision researchers in studying perception from the point of view of an active autonomous observer situated in a dynamic environment.
Approaching the vision problem from this point of view may change our perception of what the vision problem is and how it should be addressed. Computer vision research carried out from this perspective is typically referred to in the literature as active, animate or purposive vision. Early researchers argued that it constitutes a new paradigm for computer vision, and may lead to significant advances in robotics and to a better understanding of the vision problem in general [1], [2]. “This paradigm is as relevant today as it was decades ago and, with the state of modern computational tools, is poised to find new life in the robotic perception systems of the next decade”, as recently stated by Bajcsy et al. [3].
Vision systems that implement the active vision paradigm have mechanisms that can actively control camera parameters such as orientation, focus, zoom, and vergence in response to external stimuli and the requirements of the task at hand. They may also have anthropomorphic features such as spatially variant (foveal) sensors [4]. The real-time constraints of active vision pose challenging goals for enabling a robot to interact successfully with its dynamic world. Two such goals are related to location (try to find a known object) and identification (try to identify an object whose location can be fixated) [2].
For active vision systems to be implemented in physical robots, simplification of the vision problem becomes a necessity, and the means for this is the development of more efficient algorithms. Histograms have been widely used in image processing and computer vision applications: from simple image enhancement using histogram equalization [5], to color object recognition using three-dimensional (3D) color histogram object detection and localization [6], to sophisticated face classification and recognition [7].
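As a concrete illustration of the first of these applications, histogram equalization can be sketched in a few lines of NumPy. This is a minimal illustration only, not code from this paper; the image content, size and bin count are arbitrary:

```python
import numpy as np

def equalize_histogram(img):
    """Classic histogram equalization for an 8-bit grayscale image.

    Remaps intensities through the normalized cumulative histogram so
    that the output occupies the full [0, 255] range more uniformly.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                              # normalize CDF to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)  # intensity lookup table
    return lut[img]

# A low-contrast image: intensities crowded into the narrow band [40, 80]
img = np.random.default_rng(0).integers(40, 81, size=(64, 64)).astype(np.uint8)
out = equalize_histogram(img)
print(img.min(), img.max(), "->", out.min(), out.max())
```

After the remapping, the brightest input intensity maps to 255 and the darkest is pushed toward 0, stretching the contrast of the original band.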
Color-histogram-based algorithms for object recognition are attractive because of their simplicity, efficiency and robustness. Many image retrieval systems use color histograms among other cues (e.g. [8]), motivated by the fact that many images contain characteristic colors. However, one major drawback of the color histogram method is its sensitivity to variations in lighting conditions, such as the color and intensity of the light source. In addition, many object classes are difficult to distinguish on the basis of color alone.
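The lighting sensitivity noted above is easy to reproduce. The sketch below, which is illustrative only and not the paper's implementation, builds a coarse 3D RGB histogram (8 bins per channel) and shows that merely halving the image intensities, as a crude stand-in for dimmer lighting, moves most of the histogram mass into different cells:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Coarse 3D RGB histogram (bins^3 cells), normalized to sum to 1."""
    idx = (img.astype(np.uint16) * bins) // 256        # per-channel bin index
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    h = np.bincount(flat.ravel(), minlength=bins ** 3).astype(np.float64)
    return h / h.sum()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)

h_orig = color_histogram(img)
h_dim = color_histogram(img // 2)   # simulate a dimmer light source

# Large L1 distance even though it is the "same" scene under dimmer light,
# so naive histogram matching degrades badly under illumination changes.
print(np.abs(h_orig - h_dim).sum() / 2)
```

The L1 distance (in [0, 1]) comes out large because halving the intensities shifts every pixel into the lower half of the bin range, exactly the failure mode described above.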
The paradigm proposed in this paper effectively addresses the recognition and localization problem by developing a unified framework that combines a multidimensional feature histogram approach with a multiscale pyramid approach. The high computational cost associated with multidimensional processing is addressed by deriving the Multidimensional Laplacian Feature Histogram Pyramid (MD-LFHP), a novel unified multidimensional-multiscale histogram representation. Furthermore, a Taylor series formulation is employed to combine the multiscale levels of the MD-LFHP into a single efficient multidimensional-multiscale Laplacian-Taylor Feature-Histogram (LT-FHist) representation for rapid training-less color object recognition and localization, with direct application to autonomous robotic agents.
The rest of this paper is organized as follows. Section 2 presents prior work related to object recognition and localization. Section 3 briefly reviews the effects on multidimensional histograms when an image is represented at a multiscale level. The proposed Laplacian–Taylor Feature–Histogram (LT-FHist) approach for object recognition and localization and its underlying mathematical derivation is discussed in significant detail in Sections 4–6. Section 7 presents comparative experimental results and discussions. Finally, concluding remarks appear in Section 8.
Section snippets
Related work
The problem of object detection and recognition has been widely studied in the literature, and various object descriptors have been proposed claiming robustness to scale, rotation, and occlusion [9], [10], [11], [12], [13], [14], [15], [16]. Multi-dimensional and multi-scale image feature descriptors have been reported in the literature and used separately for object recognition, but a unified approach for the fusion of both approaches into a conglomerate feature descriptor has not been formulated as
Effects of multiscale on multidimensional histograms
The histogram of an image at one scale differs significantly from its histograms at other scales, so the set of histograms across a multiscale representation can serve as a characteristic that uniquely identifies the image. This has been clearly demonstrated by Hadjidemetriou et al. in [20] and is evident from Fig. 1.
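This scale dependence can be reproduced with a toy pyramid. The following sketch is illustrative only (not from [20] or this paper) and substitutes simple 2×2 block averaging for proper Gaussian smoothing; it computes a normalized intensity histogram at each pyramid level and measures how far each drifts from the finest-scale histogram:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 block averaging (crude smooth-and-subsample)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def norm_hist(img, bins=32):
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(128, 128)).astype(np.float64)

levels = [img]
for _ in range(3):
    levels.append(downsample(levels[-1]))

hists = [norm_hist(lv) for lv in levels]
# Averaging concentrates intensities around the mean, so the histogram at
# each coarser scale differs measurably from the finest-scale histogram.
for k in range(1, 4):
    print(k, np.abs(hists[0] - hists[k]).sum() / 2)
```

The L1 distances grow with the level, illustrating why the set of histograms across scales carries far more identifying information than any single histogram.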
In [37], Hadjidemetriou et al. define a multiscale histogram as the set of intensity histograms of an image at multiple resolutions. This definition thus corresponds
The proposed methodology
The proposed multidimensional multiscale LT-FHist approach for object recognition and localization is motivated by knowledge about our early visual processing. Gaussian derivatives have been widely used in computer vision and their underlying mathematical derivations are well understood [39]. Their popularity lies in the fact that they can model the response of neural cells in the primate cortex [40]. Additionally, these derivatives can be “steered” to arbitrary orientations to create a local
Theory behind multidimensional histogram matching
Swain and Ballard [6] developed a 3D histogram intersection and backprojection technique called color indexing that can efficiently identify and localize objects in a database in the presence of occlusion and over changes in viewpoint.
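The core of color indexing is the histogram intersection score. A minimal sketch of the intersection measure follows, using toy histograms; this is illustrative only, not the paper's implementation:

```python
import numpy as np

def histogram_intersection(img_hist, model_hist):
    """Swain-Ballard histogram intersection: the fraction of the model
    histogram's mass that is also present in the image histogram."""
    return np.minimum(img_hist, model_hist).sum() / model_hist.sum()

rng = np.random.default_rng(3)
model = rng.random(64)   # toy model histogram (e.g. 4^3 coarse RGB cells)
other = rng.random(64)   # histogram of an unrelated image

# An image that contains the model object plus extra background mass still
# scores 1.0: extra image content never lowers the score, which is what
# makes intersection tolerant of cluttered backgrounds.
print(histogram_intersection(model * 2.0, model))   # -> 1.0
print(histogram_intersection(other, model))         # some value below 1.0
```

Because the measure only asks how much of the model's mass the image accounts for, it degrades gracefully under partial occlusion: occluding part of the object removes some mass from the image histogram but leaves the rest of the match intact.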
The proposed method in this work extends Swain’s (3D) histogram intersection and backprojection technique, to the multidimensional-multiscale case, and demonstrates that augmenting color features with multidimensional local oriented feature distributions can
A unified approach for the LT-FHist paradigm
As expressed in the previous section, the MDMS representation of the proposed paradigm will create a data size that may be prohibitively expensive in terms of computational complexity. Assuming that N-dimensional feature vectors are generated per image pixel with an image resolution of R × C, and L-levels of a multiscale representation, the computational complexity of indexing an N-dimensional-L-scaled histogram generated from this volume of data is O(R × C × N × L). Thus, a more efficient
Performance analysis
To enable an autonomous robot to interact successfully with its dynamic world, the active vision system on board must contend with some challenging goals. Two such goals are related to location (try to find a known object) and identification (try to identify an object whose location can be fixated). This paper proposes an efficient vision paradigm that moves closer to achieving these goals.
The performance analysis in this section will thus be bound by the practicality
Conclusion
This paper has developed a unified multidimensional-multiscale feature histogram framework for effective color object recognition and localization, which is directed towards autonomous mobile robot-vision applications. This framework has addressed the high computational cost associated with multidimensional histogram processing by deriving the multidimensional Laplacian feature histogram pyramid representation together with a Taylor series formulation to combine the multiscale levels of the
Acknowledgment
The author would like to thank the anonymous reviewers for their valuable suggestions that contributed to the overall improvement of the original manuscript. This work was funded by the College of Graduate Studies & Research and by the Research Institute of Sciences & Engineering at the University of Sharjah.
References (50)
- et al., Speeded-up robust features (SURF), Comput. Vision Image Understanding (2008)
- et al., A template matching approach of one-shot-learning gesture recognition, Pattern Recognit. Lett. (2013)
- et al., Robust one-shot facial expression recognition with sunglasses, Int. J. Mach. Learn. Comput. (2016)
- et al., An active vision architecture based on iconic representations, Artif. Intell. (1995)
- Active perception, Proc. IEEE (1988)
- et al., Animat vision: active vision in artificial animals, Videre: J. Comput. Vision Res. (1997)
- et al., Revisiting active perception, Auton. Robots (2017)
- et al., On the advantages of foveal mechanisms for active stereo systems in visual search tasks, Auton. Robots (2017)
- Digital Image Processing (1996)
- et al., Color indexing, Int. J. Comput. Vision (1991)
- Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, Tenth IEEE International Conference on Computer Vision (ICCV 2005)
- Color- and texture-based image segmentation using EM and its application to content-based image retrieval, Sixth International Conference on Computer Vision (1998)
- 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints, Int. J. Comput. Vision
- Selective search for object recognition, Int. J. Comput. Vision
- Local difference binary for ultrafast and distinctive feature description, IEEE Trans. Pattern Anal. Mach. Intell.
- Submodular object recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vision
- Discriminative transform of receptive field patterns for feature representation, Multimedia Tools Appl.
- Color object recognition via cross-domain learning on RGB-D images, 2016 IEEE International Conference on Robotics and Automation (ICRA)
- Object recognition using steerable filters at multiple scales, Proc. IEEE Workshop on Qualitative Vision, Los Alamitos, CA
- Recognition without correspondence using multidimensional receptive field histograms, Int. J. Comput. Vision
- Fast image retrieval via embeddings, 3rd International Workshop on Statistical and Computational Theories of Vision
- Multiresolution histograms and their use for recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision
- Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005)
Cited by (6)
- Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction, Information Sciences (2018)
- Color Constancy Algorithm Based on Pyramid Pooling, ACM International Conference Proceeding Series (2023)
- Progressive Multi-Scale Feature Cascade Fusion Color Constancy Algorithm, Guangxue Xuebao/Acta Optica Sinica (2022)
- Color Constancy with Multi-Channel Confidence-Weighted Method, Guangxue Xuebao/Acta Optica Sinica (2021)
- Adaptive Color Correction and Contrast Enhancement Algorithm for Low Illumination Images, Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics (2019)
- SHORT: Segmented histogram technique for robust real-time object recognition, Multimedia Tools and Applications (2019)