Grounding semantic categories in behavioral interactions: Experiments with 100 objects
Highlights
► We performed a large-scale object categorization experiment with 100 objects.
► The robot’s category recognition model learned to identify 20 semantic category labels.
► The robot used a diverse set of exploratory behaviors and sensory modalities to explore the objects.
► Active behavior selection reduced exploration time by half when classifying a novel object.
Introduction
Object categories are all around us—our homes and offices contain a vast multitude of objects that can be organized according to a diverse set of criteria ranging from form to function. A robot operating in human environments would undoubtedly have to assign category labels to novel objects because it is simply infeasible to preprogram it with knowledge about every individual object that it might encounter. For example, to clean a kitchen table, a robot has to recognize semantic object category labels such as silverware, dish, or trash before performing an appropriate action.
The ability to learn and utilize object category memberships is an important aspect of human intelligence and has been extensively studied in psychology [1]. A large number of experimental and observational studies have revealed that object category learning is also linked to our ability to acquire words [2], [3]. Researchers have postulated that, with a few labeled examples, humans at various stages of development are able to identify common features that define category membership as well as distinctive features that differentiate members of a target category from non-members [4], [5]. Other lines of research have highlighted the importance of object exploration [6], [7] for learning object categories, since many object properties cannot be detected by passive observation alone [8], [9].
Recently, several research groups have started to explore how robots can learn object category labels that can be generalized to novel objects [10], [11], [12], [13], [14]. Most studies have examined the problem exclusively in the visual domain or have used a relatively small number of objects and categories. To address these limitations, this paper proposes an approach to object categorization that enables a robot to acquire a large number of category labels from a large set of objects. This is achieved with the use of multiple behavioral interactions and multiple sensory modalities. To test our method, the robot in our experiment (see Fig. 1) explored 100 different objects classified into 20 distinct object categories using 10 different interactions (e.g., grasp, lift, tap, etc.), making this one of the largest object sets that a robot has physically interacted with.
Using features extracted from the visual, auditory, and proprioceptive sensory modalities, coupled with a machine learning classifier, the robot was able to achieve high recognition rates on a variety of household object categories (e.g., balls, cups, pop cans, etc.). The robot’s model was also able to identify which sensory modalities and behaviors are best for recognizing each category label. In addition, the robot was able to actively select the exploratory behavior that it should try next when classifying an object, which resulted in faster convergence of the model’s accuracy rates when compared to random behavior selection. Finally, the model was evaluated on whether it can detect if a novel object does not belong to any of the categories present in the robot’s training set.
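To make the active-selection step concrete, below is a minimal greedy sketch in Python: among the behaviors not yet applied to the object, pick the one whose most reliable sensorimotor context achieved the highest accuracy during training. The function, the accuracy values, and the greedy criterion are illustrative assumptions, not the paper’s exact strategy.

```python
def next_behavior(remaining, acc_by_context):
    """Greedy active selection sketch: among the behaviors not yet
    applied to the object, pick the one whose best sensorimotor
    context had the highest accuracy estimated during training."""
    def best(behavior):
        return max(acc for (b, modality), acc in acc_by_context.items()
                   if b == behavior)
    return max(remaining, key=best)

# Hypothetical accuracy estimates for a few (behavior, modality) contexts.
acc = {("tap", "audio"): 0.61, ("lift", "proprioception"): 0.54,
       ("look", "color"): 0.72, ("grasp", "proprioception"): 0.38}
print(next_behavior({"tap", "lift", "grasp"}, acc))  # -> tap
```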
Section snippets
Related work
Most object categorization methods in robotics fall into one of two broad categories: (1) unsupervised methods, in which objects are categorized using unsupervised machine learning algorithms (e.g., k-Means, Hierarchical Clustering, etc.), and (2) supervised methods, in which a labeled set of objects is used to train a recognition model that can label new data points. Several lines of research have demonstrated methods that enable robots to autonomously form internal object categories based on…
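As a toy illustration of the unsupervised variant, the sketch below clusters object feature vectors with k-Means using scikit-learn; the feature dimensionality, cluster count, and random data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: 100 objects described by 32-D sensorimotor feature vectors.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 32))

# Unsupervised variant: group the objects into k internal categories
# without any human-provided labels.
k = 20
labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
print(np.bincount(labels, minlength=k))  # objects per discovered category
```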
Robot and sensors
The experiments were performed with the upper-torso humanoid robot shown in Fig. 1. The robot is actuated by two 7-DOF Barrett Whole Arm Manipulators (WAMs), each with an attached 3-finger Barrett Hand. Each WAM has built-in sensors that measure joint angles and torques at 500 Hz. An Audio-Technica U853AW cardioid microphone mounted in the robot’s head was used to capture auditory feedback at the standard 16-bit/44.1 kHz resolution and rate over a single channel. The robot’s right eye (a…
Proprioceptive feature extraction
For each of the nine interactive behaviors shown in Fig. 4, proprioceptive features were extracted from the recorded joint torques from all 7 joints of the robot’s left arm. The torques were recorded at 500 Hz. The joint-torque record from each interaction was represented as an n × 7 matrix, where n is the number of temporal samples recorded for each of the 7 joints. Histogram features were extracted from each joint-torque record by discretizing the series of torque values for each joint into 10 bins…
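As a rough illustration of this style of feature extraction, the sketch below computes per-joint torque-value histograms in Python; the equal-width bin placement and the normalization are illustrative assumptions, and the full procedure may additionally bin over time.

```python
import numpy as np

def torque_histogram_features(torques, n_bins=10):
    """Sketch: per-joint histograms of torque values.

    torques: (n, 7) array -- n temporal samples (recorded at 500 Hz)
    for each of the robot's 7 joints.
    Returns a concatenated feature vector of length 7 * n_bins.
    """
    n, n_joints = torques.shape
    features = []
    for j in range(n_joints):
        series = torques[:, j]
        # Discretize this joint's torque series into n_bins equal-width
        # bins spanning its observed range, then normalize the counts.
        counts, _ = np.histogram(series, bins=n_bins)
        features.append(counts / n)
    return np.concatenate(features)

# Example: a 2-second interaction sampled at 500 Hz.
record = np.random.randn(1000, 7)
print(torque_histogram_features(record).shape)  # (70,)
```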
Notation
Let B be the set of exploratory behaviors and let C be the set of sensorimotor contexts, such that each context c ∈ C refers to a combination of a behavior and a sensory modality (e.g., drop-audio, look-color, etc.). In our case, 9 behaviors (all except look) produced 3 types of feedback: auditory, optical flow, and proprioceptive feedback from the robot’s arm. SURF features were extracted during all 10 behaviors. In addition, color features were extracted during the look behavior. Finally, the…
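To make the combinatorics concrete, the sketch below enumerates sensorimotor contexts as behavior-modality pairs under the feedback rules listed above; behavior names other than grasp, lift, tap, and look are assumed for illustration, and the snippet’s final (truncated) feedback source would account for the 39th context evaluated in the next section.

```python
# Sketch: enumerate the sensorimotor contexts as behavior-modality pairs.
interactive = ["grasp", "lift", "hold", "shake", "drop",
               "tap", "poke", "push", "press"]  # 9 interactive behaviors
behaviors = interactive + ["look"]              # 10 behaviors in total

contexts = [(b, m) for b in interactive
            for m in ("audio", "flow", "proprioception")]  # 9 x 3 = 27
contexts += [(b, "surf") for b in behaviors]               # + 10
contexts.append(("look", "color"))                         # + 1

# One more feedback source (cut off in the snippet above) brings the
# total to the 39 contexts evaluated in the experiments.
print(len(contexts))  # 38
```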
Category recognition using a single behavior
The first experiment evaluated the performance of the robot’s recognition models for each of the 39 possible sensorimotor contexts. Tables 1 and 2 show the accuracy rates for every viable combination of behavior and sensory modality.
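For concreteness, here is a minimal sketch of one such per-context evaluation, assuming scikit-learn, a generic SVM classifier, and synthetic stand-in data; the paper’s actual classifier and cross-validation protocol are not specified in this snippet and may differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def context_accuracy(X, y, cv=5):
    """Estimate category-recognition accuracy for one sensorimotor
    context from its trial feature matrix X and category labels y."""
    clf = SVC(kernel="rbf", gamma="scale")
    return cross_val_score(clf, X, y, cv=cv).mean()

# Synthetic stand-in: 10 trials per category for 20 categories, with
# 70-D features (e.g., the proprioceptive histograms sketched earlier).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 70))
y = np.repeat(np.arange(20), 10)
print(f"accuracy: {context_accuracy(X, y):.3f}")  # ~chance on random data
```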
Conclusion and future work
The ability to classify objects into categories is a prerequisite for intelligent manipulation in human environments. To solve a wide variety of household tasks, from sorting objects on a table to cleaning a kitchen to taking out the trash, a robot must be able to recognize the semantic category labels of novel objects in its environment. This paper addressed the problem of object category recognition by presenting an approach that enables a robot to acquire a rich sensorimotor experience with…
References (64)
- et al., Words (but not tones) facilitate object categorization: evidence from 6- and 12-month-olds, Cognition (2007)
- et al., Labels can override perceptual categories in early infancy, Cognition (2008)
- et al., The development of category learning strategies: what makes the difference?, Cognition (2009)
- et al., Differential category learning processes: the neural basis of comparison-based learning and induction, NeuroImage (2010)
- et al., Merging the senses into a robust percept, Trends in Cognitive Sciences (2004)
- Cognitive vision: the case for embodied perception, Image and Vision Computing (2008)
- et al., Speeded-up robust features (SURF), Computer Vision and Image Understanding (2008)
- et al., Optimal combinations of pattern classifiers, Pattern Recognition Letters (1995)
- et al., Learning multi-label scene classification, Pattern Recognition (2004)
- et al., Human category learning, Psychology (2005)
- Exploratory behavior in the development of perceiving, acting, and the acquiring of knowledge, Annual Review of Psychology
- Play and Exploration in Children and Animals
- Modality exclusivity norms for 423 object properties, Behavior Research Methods
- Learning hierarchical representations of object categories for robot vision, Robotics Research
- Learning visual object categories for robot affordance prediction, The International Journal of Robotics Research
- A visual category filter for Google images, Lecture Notes in Computer Science
- Generic object recognition with boosting, IEEE Transactions on Pattern Analysis and Machine Intelligence
Jivko Sinapov received the B.S. degree in computer science from the University of Rochester, NY, in 2005. He is currently working towards a Ph.D. degree in computer science with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests include developmental robotics, robotic perception, manipulation, and machine learning.
Connor Schenck received the Bachelor’s degree in computer science from Iowa State University, Ames, IA, in 2011. He is currently working towards a Master’s degree in computer science with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests include artificial intelligence, machine learning, robotics, and developmental robotics.
Kerrick Staley is currently pursuing an undergraduate degree in computer engineering at Iowa State University. He is interested in machine learning, robotics, and human–computer interaction.
Vladimir Sukhoy received the Bachelor’s degree in applied mathematics from Donetsk National University, Donetsk, Ukraine, in 2004. He is currently working toward the Ph.D. degree in computer engineering with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests are in the areas of developmental robotics, human–computer interaction, computational perception, and machine learning.
Alexander Stoytchev received the M.S. and Ph.D. degrees in computer science from Georgia Institute of Technology, Atlanta, GA in 2001 and 2007, respectively. He is currently an Assistant Professor of Electrical and Computer Engineering and the Director of the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests are in the areas of developmental robotics, autonomous robotics, computational perception, and machine learning.