Grounding semantic categories in behavioral interactions: Experiments with 100 objects
Highlights
► We performed a large-scale object categorization experiment with 100 objects.
► The robot’s category recognition model learned to identify 20 semantic category labels.
► The robot used a diverse set of exploratory behaviors and sensory modalities to explore the objects.
► Active behavior selection reduced exploration time by half when classifying a novel object.
Introduction
Object categories are all around us—our homes and offices contain a vast multitude of objects that can be organized according to a diverse set of criteria ranging from form to function. A robot operating in human environments would undoubtedly have to assign category labels to novel objects because it is simply infeasible to preprogram it with knowledge about every individual object that it might encounter. For example, to clean a kitchen table, a robot has to recognize semantic object category labels such as silverware, dish, or trash before performing an appropriate action.
The ability to learn and utilize object category memberships is an important aspect of human intelligence and has been extensively studied in psychology [1]. A large number of experimental and observational studies have revealed that object category learning is also linked to our ability to acquire words [2], [3]. Researchers have postulated that, with a few labeled examples, humans at various stages of development are able to identify common features that define category membership as well as distinctive features that differentiate members of a target category from non-members [4], [5]. Other lines of research have highlighted the importance of object exploration [6], [7] for learning object categories, since many object properties cannot be detected by passive observation alone [8], [9].
Recently, several research groups have started to explore how robots can learn object category labels that can be generalized to novel objects [10], [11], [12], [13], [14]. Most studies have examined the problem exclusively in the visual domain or have used a relatively small number of objects and categories. To address these limitations, this paper proposes an approach to object categorization that enables a robot to acquire a large number of category labels from a large set of objects. This is achieved with the use of multiple behavioral interactions and multiple sensory modalities. To test our method, the robot in our experiment (see Fig. 1) explored 100 different objects classified into 20 distinct object categories using 10 different interactions (e.g., grasp, lift, tap, etc.), making this one of the largest object sets that a robot has physically interacted with.
Using features extracted from the visual, auditory, and proprioceptive sensory modalities, coupled with a machine learning classifier, the robot was able to achieve high recognition rates on a variety of household object categories (e.g., balls, cups, pop cans, etc.). The robot’s model was also able to identify which sensory modalities and behaviors are best for recognizing each category label. In addition, the robot was able to actively select the exploratory behavior that it should try next when classifying an object, which resulted in faster convergence of the model’s accuracy rates when compared to random behavior selection. Finally, the model was evaluated on whether it can detect if a novel object does not belong to any of the categories present in the robot’s training set.
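To make the active-selection step concrete, below is a minimal greedy sketch in Python: among the behaviors not yet applied to the object, pick the one whose most reliable sensorimotor context achieved the highest accuracy during training. The function, the accuracy values, and the greedy criterion are illustrative assumptions, not the paper’s exact strategy.

```python
def next_behavior(remaining, acc_by_context):
    """Greedy active selection sketch: among the behaviors not yet
    applied to the object, pick the one whose best sensorimotor
    context had the highest accuracy estimated during training."""
    def best(behavior):
        return max(acc for (b, modality), acc in acc_by_context.items()
                   if b == behavior)
    return max(remaining, key=best)

# Hypothetical accuracy estimates for a few (behavior, modality) contexts.
acc = {("tap", "audio"): 0.61, ("lift", "proprioception"): 0.54,
       ("look", "color"): 0.72, ("grasp", "proprioception"): 0.38}
print(next_behavior({"tap", "lift", "grasp"}, acc))  # -> tap
```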
Section snippets
Related work
Most object categorization methods in robotics fall into one of two broad categories: (1) unsupervised methods, in which objects are categorized using unsupervised machine learning algorithms (e.g., k-Means, Hierarchical Clustering, etc.), and (2) supervised methods, in which a labeled set of objects is used to train a recognition model that can label new data points. Several lines of research have demonstrated methods that enable robots to autonomously form internal object categories based on…
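As a toy illustration of the unsupervised variant, the sketch below clusters object feature vectors with k-Means using scikit-learn; the feature dimensionality, cluster count, and random data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: 100 objects described by 32-D sensorimotor feature vectors.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 32))

# Unsupervised variant: group the objects into k internal categories
# without any human-provided labels.
k = 20
labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
print(np.bincount(labels, minlength=k))  # objects per discovered category
```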
Robot and sensors
The experiments were performed with the upper-torso humanoid robot shown in Fig. 1. The robot is actuated by two 7-DOF Barrett Whole Arm Manipulators (WAMs), each with an attached 3-finger Barrett Hand. Each WAM has built-in sensors that measure joint angles and torques at 500 Hz. An Audio-Technica U853AW cardioid microphone mounted in the robot’s head was used to capture auditory feedback at the standard 16-bit/44.1 kHz resolution and rate over a single channel. The robot’s right eye (a…
Proprioceptive feature extraction
For each of the nine interactive behaviors shown in Fig. 4, proprioceptive features were extracted from the recorded joint torques from all 7 joints of the robot’s left arm. The torques were recorded at 500 Hz. The joint-torque record from each interaction was represented as an n × 7 matrix, where n is the number of temporal samples recorded for each of the 7 joints. Histogram features were extracted from each joint-torque record by discretizing the series of torque values for each joint into 10 bins…
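As a rough illustration of this style of feature extraction, the sketch below computes per-joint torque-value histograms in Python; the equal-width bin placement and the normalization are illustrative assumptions, and the full procedure may additionally bin over time.

```python
import numpy as np

def torque_histogram_features(torques, n_bins=10):
    """Sketch: per-joint histograms of torque values.

    torques: (n, 7) array -- n temporal samples (recorded at 500 Hz)
    for each of the robot's 7 joints.
    Returns a concatenated feature vector of length 7 * n_bins.
    """
    n, n_joints = torques.shape
    features = []
    for j in range(n_joints):
        series = torques[:, j]
        # Discretize this joint's torque series into n_bins equal-width
        # bins spanning its observed range, then normalize the counts.
        counts, _ = np.histogram(series, bins=n_bins)
        features.append(counts / n)
    return np.concatenate(features)

# Example: a 2-second interaction sampled at 500 Hz.
record = np.random.randn(1000, 7)
print(torque_histogram_features(record).shape)  # (70,)
```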
Notation
Let B be the set of exploratory behaviors and let C be the set of sensorimotor contexts, such that each context c ∈ C refers to a combination of a behavior and a sensory modality (e.g., drop-audio, look-color, etc.). In our case, 9 behaviors (all except look) produced 3 types of feedback: auditory, optical flow, and proprioceptive feedback from the robot’s arm. SURF features were extracted during all 10 behaviors. In addition, color features were extracted during the look behavior. Finally, the…
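To make the combinatorics concrete, the sketch below enumerates sensorimotor contexts as behavior-modality pairs under the feedback rules listed above; behavior names other than grasp, lift, tap, and look are assumed for illustration, and the snippet’s final (truncated) feedback source would account for the 39th context evaluated in the next section.

```python
# Sketch: enumerate the sensorimotor contexts as behavior-modality pairs.
interactive = ["grasp", "lift", "hold", "shake", "drop",
               "tap", "poke", "push", "press"]  # 9 interactive behaviors
behaviors = interactive + ["look"]              # 10 behaviors in total

contexts = [(b, m) for b in interactive
            for m in ("audio", "flow", "proprioception")]  # 9 x 3 = 27
contexts += [(b, "surf") for b in behaviors]               # + 10
contexts.append(("look", "color"))                         # + 1

# One more feedback source (cut off in the snippet above) brings the
# total to the 39 contexts evaluated in the experiments.
print(len(contexts))  # 38
```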
Category recognition using a single behavior
The first experiment evaluated the performance of the robot’s recognition models for each of the 39 possible sensorimotor contexts. Tables 1 and 2 show the accuracy rates for every viable combination of behavior and sensory modality.
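For concreteness, here is a minimal sketch of one such per-context evaluation, assuming scikit-learn, a generic SVM classifier, and synthetic stand-in data; the paper’s actual classifier and cross-validation protocol are not specified in this snippet and may differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def context_accuracy(X, y, cv=5):
    """Estimate category-recognition accuracy for one sensorimotor
    context from its trial feature matrix X and category labels y."""
    clf = SVC(kernel="rbf", gamma="scale")
    return cross_val_score(clf, X, y, cv=cv).mean()

# Synthetic stand-in: 10 trials per category for 20 categories, with
# 70-D features (e.g., the proprioceptive histograms sketched earlier).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 70))
y = np.repeat(np.arange(20), 10)
print(f"accuracy: {context_accuracy(X, y):.3f}")  # ~chance on random data
```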
Conclusion and future work
The ability to classify objects into categories is a prerequisite for intelligent manipulation in human environments. To solve a wide variety of household tasks, from sorting objects on a table to cleaning a kitchen to taking out the trash, a robot must be able to recognize the semantic category labels of novel objects in its environment. This paper addressed the problem of object category recognition by presenting an approach that enables a robot to acquire a rich sensorimotor experience with…
References (64)
- et al., Words (but not tones) facilitate object categorization: evidence from 6- and 12-month-olds, Cognition (2007)
- et al., Labels can override perceptual categories in early infancy, Cognition (2008)
- et al., The development of category learning strategies: what makes the difference?, Cognition (2009)
- et al., Differential category learning processes: the neural basis of comparison-based learning and induction, NeuroImage (2010)
- et al., Merging the senses into a robust percept, Trends in Cognitive Sciences (2004)
- Cognitive vision: the case for embodied perception, Image and Vision Computing (2008)
- et al., Speeded-up robust features (SURF), Computer Vision and Image Understanding (2008)
- et al., Optimal combinations of pattern classifiers, Pattern Recognition Letters (1995)
- et al., Learning multi-label scene classification, Pattern Recognition (2004)
- et al., Human category learning, Psychology (2005)
- Exploratory behavior in the development of perceiving, acting, and the acquiring of knowledge, Annual Review of Psychology
- Play and Exploration in Children and Animals
- Modality exclusivity norms for 423 object properties, Behavior Research Methods
- Learning hierarchical representations of object categories for robot vision, Robotics Research
- Learning visual object categories for robot affordance prediction, The International Journal of Robotics Research
- A visual category filter for Google images, Lecture Notes in Computer Science
- Generic object recognition with boosting, IEEE Transactions on Pattern Analysis and Machine Intelligence
Jivko Sinapov received the B.S. degree in computer science from the University of Rochester, NY, in 2005. He is currently working towards a Ph.D. degree in computer science with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests include developmental robotics, robotic perception, manipulation, and machine learning.
Connor Schenck received the Bachelor’s degree in computer science from Iowa State University, Ames, IA, in 2011. He is currently working towards a Master’s degree in computer science with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests include artificial intelligence, machine learning, robotics, and developmental robotics.
Kerrick Staley is currently pursuing an undergraduate degree in computer engineering at Iowa State University. He is interested in machine learning, robotics, and human–computer interaction.
Vladimir Sukhoy received the Bachelor’s degree in applied mathematics from Donetsk National University, Donetsk, Ukraine, in 2004. He is currently working toward the Ph.D. degree in computer engineering with the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests are in the areas of developmental robotics, human–computer interaction, computational perception, and machine learning.
Alexander Stoytchev received the M.S. and Ph.D. degrees in computer science from Georgia Institute of Technology, Atlanta, GA in 2001 and 2007, respectively. He is currently an Assistant Professor of Electrical and Computer Engineering and the Director of the Developmental Robotics Laboratory, Iowa State University, Ames, IA. His current research interests are in the areas of developmental robotics, autonomous robotics, computational perception, and machine learning.