Towards a unified visual framework in a binocular active robot vision system

https://doi.org/10.1016/j.robot.2009.08.005

Abstract

This paper presents the results of an investigation and pilot study into an active binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype developed is capable of identifying, targeting, verging on and recognising objects in a cluttered scene without the need for calibration or other knowledge of the camera geometry. This is achieved by implementing all image analysis in a symbolic space without creating explicit pixel-space maps. The system structure is based on the ‘searchlight metaphor’ of biological systems. We present results of an investigation that yield a maximum vergence error of ∼6.5 pixels, while ∼85% of known objects were recognised in five different cluttered scenes. Finally, a ‘stepping-stone’ visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared simultaneously within the field of view resulting from any individual saccade.

Introduction

The recent maturation of digital imaging hardware and the continual advancement of image processing and analysis techniques have vastly improved the potential for the application of computer vision in real-world robotics systems. Furthermore, binocular robotic vision has an advantage over monocular vision in potentially being able to compute range maps (i.e. distance fields to visible surfaces) by decoding the local parallaxes between captured stereo-pairs. Binocular imaging can also be used in object recognition to provide more information and therefore generate stronger object presence/identity hypotheses than would be possible with monocular vision alone. The development of an active vision control mechanism for a binocular camera system featuring object recognition and automated visual field exploration has potential applications in autonomous roving vehicles, automatic surveillance systems and military or clinical telepresence.

In this paper we present a system that integrates visual attention, vergence, gaze control and object recognition based on point matches extracted by means of the Scale Invariant Feature Transform (SIFT) [1]. The system as devised provides an efficient means for controlling an active binocular robot head by integrating low-level and high-level visual components in an uncomplicated and unified framework. Our vision system performs the key robotics task of detecting, classifying and locating known 3D objects, which may be partially occluded, within a cluttered scene comprising both known and unknown objects. The essential structure of the binocular active robot vision system we have developed is sufficiently general to allow it to be readily adapted to different robotics contexts.
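SIFT point matches of the kind the system relies on are conventionally established with Lowe's nearest-neighbour ratio test [1]. The sketch below is illustrative only, not the authors' implementation: the descriptors are random stand-ins for real 128-dimensional SIFT descriptors, and the 0.8 threshold follows Lowe's paper.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose nearest neighbour is significantly closer
    than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances
        j, k = np.argsort(dists)[:2]                # two nearest neighbours
        if dists[j] < ratio * dists[k]:             # keep unambiguous matches
            matches.append((i, j))
    return matches

# Toy data: desc_b row 3 is a near-copy of desc_a row 0, so the pair
# (0, 3) should be recovered while ambiguous random pairs are rejected.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 128)).astype(np.float32)
desc_b = rng.normal(size=(10, 128)).astype(np.float32)
desc_b[3] = desc_a[0] + 0.01                        # planted correspondence
matches = ratio_test_matches(desc_a, desc_b)
```

The ratio test discards a candidate match when a second, almost equally good neighbour exists, which is what makes raw descriptor matching usable in cluttered scenes.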

This paper is organised as follows. In Section 2, we describe related work and the motivation that led us to design this particular system. We then describe the design of the vergence, object recognition and gaze control systems, in Sections 3 Vergence, 4 Object recognition, 5 Gaze control, respectively. Finally, Section 6 contains a summary of the system validation, its results and contributions to the field of active vision research.

Section snippets

Related work and motivation

Several binocular robot heads have been developed in recent decades. For example, the “Richard the First” head [2] and the KTH robot head [3] were capable of mimicking human head motion. More recent robot heads include the LIRA head [4], where acoustic and visual stimuli are exploited to drive the head gaze; the Yorick head [5] or the Medusa head [6] where high-accuracy calibration, gaze control, control of vergence or real-time tracking with log-polar images were successfully demonstrated.

Vergence

The vergence system is required to drive the cameras such that they target the same real-world position. Several modalities of vergence are conceivable, operating in the following contexts: the system does not know the contents of the scene a priori; the system is verging on a specific object or salient item; or the contents of the scene are known a priori and one camera already targets the desired location.
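For the case in which one camera already targets the desired location, a vergence correction can in principle be derived from the horizontal disparity of matched keypoints. The following sketch is a hypothetical illustration, not the paper's method: the focal length in pixels is an assumed value, and the median is used to make the estimate robust to outlier matches.

```python
import numpy as np

def vergence_correction(left_pts, right_pts, focal_px=1000.0):
    """Estimate the vergence angle correction (radians) needed to null the
    median horizontal disparity between matched keypoints in the two views.
    focal_px is the camera focal length in pixels (illustrative value)."""
    disparities = left_pts[:, 0] - right_pts[:, 0]  # horizontal offsets
    d = np.median(disparities)                      # robust to bad matches
    return np.arctan2(d, focal_px)

# Matched keypoints shifted ~12 px horizontally, plus one gross mismatch
# that the median ignores.
left = np.array([[512.0, 300.0], [600.0, 410.0], [450.0, 350.0]])
right = left - np.array([12.0, 0.0])
right[2, 0] -= 80.0                                 # outlier match
theta = vergence_correction(left, right)
```

Because the correction is computed entirely from matched image points, no calibrated camera geometry is needed beyond an approximate focal length, which is consistent with the calibration-free character of the system described above.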

Thus, the behaviour of the system is

Object recognition

The design of the object recognition system is a direct adaptation of the SIFT-based object recognition first described by Lowe [1]. The relevance of the design to this project is found in the means of integrating the object recognition system in the overall framework. For completeness, a brief overview of the design is given below.
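As an illustration of such a database comparison, the sketch below counts ratio-test matches against each stored model and accepts the best-scoring label above a threshold. All names, the descriptor data and the thresholds are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def recognise(scene_desc, model_db, ratio=0.8, min_matches=3):
    """Compare scene descriptors against every stored model and return the
    label with the most ratio-test matches, or None if no model reaches
    min_matches. model_db maps label -> array of model descriptors."""
    def n_matches(a, b):
        n = 0
        for d in a:
            dists = np.linalg.norm(b - d, axis=1)
            j, k = np.argsort(dists)[:2]
            if dists[j] < ratio * dists[k]:   # Lowe's ratio test
                n += 1
        return n
    scores = {label: n_matches(scene_desc, desc)
              for label, desc in model_db.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_matches else None

# Toy database of two "objects"; the scene contains a noisy partial view
# of the mug (4 of its 8 descriptors) plus unrelated clutter.
rng = np.random.default_rng(1)
mug = rng.normal(size=(8, 128))
box = rng.normal(size=(8, 128))
db = {"mug": mug, "box": box}
scene = np.vstack([mug[:4] + 0.01 * rng.normal(size=(4, 128)),
                   rng.normal(size=(6, 128))])    # clutter descriptors
```

Note that the partial view still scores above the threshold, which mirrors the system's ability to recognise partially occluded objects from a subset of their keypoints.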

The basic function of the object recognition procedure is to compare each input image captured by the binocular camera-pair to all pre-stored object examples held in

Gaze control

The design of the behavioural system aims at achieving gaze control driven by the vergence and object recognition functions (described in Sections 3 and 4) in order to undertake scene exploration. We have developed an attention system that operates purely in a symbolic space represented by SIFT keypoints. This allows a single set of image features to be used for the entire heterogeneous set of tasks required.
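One simple way to realise attention in such a symbolic keypoint space is an inhibition-of-return heuristic: fixate the keypoint farthest from all previously attended locations, and stop when no sufficiently novel keypoint remains. The sketch below is a hypothetical illustration of this idea, not the paper's actual gaze-control policy.

```python
import numpy as np

def next_fixation(keypoints, visited, min_sep=50.0):
    """Pick the next gaze target from 2-D keypoint locations: the keypoint
    farthest from every previously fixated position (inhibition of return).
    Returns None when all keypoints lie within min_sep pixels of a past
    fixation, i.e. the scene is exhausted."""
    keypoints = np.asarray(keypoints, dtype=float)
    if not visited:
        return keypoints[0]
    visited = np.asarray(visited, dtype=float)
    # distance from each keypoint to its nearest past fixation
    d = np.min(np.linalg.norm(
        keypoints[:, None, :] - visited[None, :, :], axis=2), axis=1)
    i = int(np.argmax(d))
    return keypoints[i] if d[i] > min_sep else None

# Two keypoints cluster near a past fixation; the third is novel and wins.
pts = np.array([[100.0, 100.0], [110.0, 95.0], [600.0, 400.0]])
target = next_fixation(pts, visited=[(105.0, 98.0)])
```

Because the policy consumes only keypoint coordinates, it composes naturally with the vergence and recognition functions that operate on the same symbolic representation.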

A flow chart of the behaviour of the system can be seen in Fig. 4

Binocular camera robot head configuration

The physical robot head [24] used in this work comprises the following: a colour SONY DFW-X700 digital camera and a black-and-white SONY XCD-700 digital camera (each with a resolution of 1024 × 768 pixels), both fitted with IEEE 1394 FireWire interfaces, together with four high-accuracy stepper motors and motor controllers (Physik Instrumente GmbH & Co.).

The hardware was interfaced to a Pentium 4 computer with a 2 GHz CPU and 2 GB of RAM, running Windows XP and the MATLAB programming environment.

Vergence system validation

As

Conclusions

The objective of the work reported here is to develop a binocular robot vision system capable of autonomous scene exploration, with the specific task of identifying and localising objects of known classes while maintaining binocular vergence. We have presented a system that demonstrates the application of several novel design principles in a functional integrated framework that essentially achieves the objectives defined in Section 2.

Adopting SIFT features as the underlying visual

Acknowledgements

G. Aragon-Camarasa is grateful for the support of this research by the Alβan Programme, the European Union Programme of High Level Scholarships for Latin America (scholarship no. E07D400872MX), and CONACYT-Mexico.


References (29)

  • Y. Yeshurun et al., Cepstral filtering on a columnar image architecture: A fast algorithm for binocular stereo segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (1989)
  • T.A. Boyling et al., A fast foveated stereo matcher
  • T.A. Boyling, Active vision for autonomous 3d scene reconstruction, Ph.D. thesis, University of Glasgow, ...
  • L. Balasuriya et al., An architecture for object-based saccade generation using a biologically inspired self-organised retina


Gerardo Aragon-Camarasa received his B.Sc. in Industrial Robotics Engineering at the National Polytechnic Institute (ESIME-IPN, Mexico City) in 2006. From 2004 to 2007 he was with the Professional Development in Automation Program at the Universidad Autonoma Metropolitana (Mexico), where he was involved in the control of processes, thermodynamics and geometric algebras. He is currently a second-year Ph.D. student in the Department of Computing Science at the University of Glasgow, supervised by Dr. J. Paul Siebert. His current research interests embrace robot vision, object recognition, computational models of human vision and geometric algebras.

    Haitham Fattah is currently a research engineer and doctoral student for Codeplay Software Ltd. He graduated in 2007 from the University of Glasgow with an M.Sc. in computing science. During his undergraduate degree he specialised in computer vision, active vision systems and digital imaging, under the supervision of Dr. J. Paul Siebert. He undertook projects involving the development of a computerised test for colour vision deficiency and a SIFT-based binocular robotic vision control system.

    J. Paul Siebert received his B.Sc. and Ph.D. degrees from the Department of Electronics and Electrical Engineering at the University of Glasgow, in 1979 and 1985, respectively. He is currently a Reader in the Department of Computing Science, University of Glasgow and the Computer Vision & Graphics group leader. From 1991–1997 he was with the Turing Institute, Glasgow, developing photogrammetry-based 3D imaging systems for clinical applications, and he served as Chief Executive from 1994. Prior to this he held the post of Scientist at BBN Laboratories, Edinburgh, from 1988–1991. His research interests include 3D imaging systems and tools for human and animal surface anatomy assessment, and also robot vision systems based on biologically motivated principles. He has co-authored more than 90 international journal and conference papers in these areas.
