Robotic learning of haptic adjectives through physical interaction

https://doi.org/10.1016/j.robot.2014.09.021

Highlights

  • We equipped a PR2 robot with a pair of BioTac tactile sensors.

  • Both the robot and human subjects blindly touched sixty diverse objects.

  • We calculated static and dynamic features from the haptic data felt by the robot.

  • A multi-kernel SVM was used to learn the meaning of twenty-five haptic adjectives.

  • The robot performed as well as the average human subject at labeling objects.

Abstract

To perform useful tasks in everyday human environments, robots must be able to both understand and communicate the sensations they experience during haptic interactions with objects. Toward this goal, we augmented the Willow Garage PR2 robot with a pair of SynTouch BioTac sensors to capture rich tactile signals during the execution of four exploratory procedures on 60 household objects. In a parallel experiment, human subjects blindly touched the same objects and selected binary haptic adjectives from a predetermined set of 25 labels. We developed several machine-learning algorithms to discover the meaning of each adjective from the robot’s sensory data. The most successful algorithms were those that intelligently combine static and dynamic components of the data recorded during all four exploratory procedures. The best of our approaches produced an average adjective classification F1 score of 0.77, a score higher than that of an average human subject.

Introduction

Manipulation of objects in the real world is a task that goes beyond locating and grasping items of interest. Objects have material properties that need to be properly identified before one can reliably handle them. For example, it is well known that a human attempting to lift an object off a table adjusts their grip force and subsequent hand movements based on the coefficient of static friction between their fingertips and the surfaces of the object  [1]. Practically speaking, slippery objects need to be grasped more firmly and must be moved less aggressively than sticky objects. When executing a manipulation plan, humans continually predict the tactile signals they will feel and compare their predictions with the actual sensations that occur to monitor their progress and correct any mistakes  [2]. To achieve the envisioned benefits of robotic manipulation in human environments  [3], robots must develop a similar level of mastery over physical interaction with unknown objects.

Beyond the necessary skill of manipulating everyday objects, robot helpers must also be able to interact smoothly with humans who have little or no technical training. Natural language is likely to be a comfortable communication modality for a wide range of potential users  [4], [5], [6]. Like human children, robots will need to be able to learn new words and concepts through direct observation of and interaction with the world. The task of perceptually grounded language learning requires one to generalize from a small number of examples to deduce the underlying pattern that captures the meaning of the word being learned. Given the opportunity for robots to function as helpers, we are particularly motivated by the task of learning to describe how objects feel to the touch, a challenging undertaking that requires clever physical interaction, rich haptic sensing, and robust machine learning techniques.

One valuable immediate application of a robot that can verbally describe what it feels would be to provide automated and standardized descriptions of physical products such as clothing, stationery, and hand-held electronics. In the age of Internet shopping, online consumers frequently purchase things without knowing what they will feel like to touch. Consumers are less likely to purchase products in this situation, particularly products that have a strong tactile component  [7]. One way to address this issue is to provide consumers with scores from industry experts who have rated these objects using metrics such as KES-FB and FAST  [8], which attempt to quantify the tactile properties that people prefer. These metrics are designed for internal product reviews and are not as useful for the average consumer as a detailed and unbiased verbal description of the product’s feel. A system that provides such descriptions would allow consumers to search directly for the haptic adjectives they would like a product to have, such as soft, smooth, and nice. Ideally, these labels could be learned automatically from direct physical examples and applied to new products in an impartial manner, a task that is perfect for robotic technology. Haptic adjective descriptions have previously been explored in humans  [9], but the use of a robot to perform this task is largely novel.

To investigate the feasibility of robotic learning of haptic adjectives through physical interaction, we created the system pictured in Fig. 1. We sought to enable the depicted robot to autonomously explore objects with its sensorized fingertips and report back an appropriate set of descriptive haptic adjectives. This goal was accomplished by conducting and analyzing two experiments: one in which the robot felt a wide variety of household objects, and another to discover which haptic adjectives humans used to describe these same objects. We developed new methods for processing the heterogeneous and multi-modal time-varying information generated by each interaction, and we tested several different techniques for learning the associations between the robot’s haptic signals and the human-generated haptic adjective labels.

Humans are capable of haptically recognizing familiar three-dimensional objects through direct contact in just one or two seconds with near 100% accuracy  [10]. Individuals accomplish this impressive feat by taking advantage of a rich array of tactile and kinesthetic cues including contact location, pressure, stretch, vibration, temperature, finger position, finger velocity, and muscle force  [2], [10]. We similarly believe that tactile cues are essential to enabling a robot to acquire language and execute manual tasks with high accuracy. Furthermore, the sense of touch is inherently interactive: how you move your hand significantly affects the tactile sensations you feel. Interestingly, humans adopt consistent “exploratory procedures” (EPs) when asked to make judgments about specific object properties  [11]. For example, the EP of lateral fingertip motion reveals surface texture, pressing reveals an object’s hardness, and static contact discloses temperature and thermal conductivity. We believe robots need a corresponding set of exploratory procedures to best interrogate the objects they encounter.

Many robotics researchers have leveraged insights about the human sense of touch to improve robotic manipulation. Early work by Okamura et al.  [12] presented robot fingers that roll and slide over the surface of an object to determine texture, ridges, and grooves. Romano et al.  [13] imitated human tactile sensing channels and reactions to enable a robot with simple tactile sensors (Willow Garage PR2) to pick up, move around, and set down unknown objects without dropping or crushing them. In a more perception-focused effort, Chitta et al.  [14] used the same robot to deduce the identity and contents of a set of beverage containers through touch alone. For the similar task of discriminating containers from non-containers, Griffith et al.  [15] demonstrated that the best classification results are achieved when the robot executes a large number of interactions on the target object while attending to diverse sources of sensory data including sight, sound, and touch. Similarly, Sinapov et al.  [16] had a custom robot equipped with a vibration-sensitive fingernail use various scratching motions to recognize and categorize everyday textures, with the best results coming from the use of several motions. Oddo et al. used another novel tactile sensor to achieve good accuracy in discriminating surface roughness  [17]. More recently, Fishel and Loeb used a novel biomimetic sensor (SynTouch BioTac) to interact with a library of 117 everyday textures, achieving 95.4% accuracy in identification through the use of cleverly selected tactile features and Bayesian techniques for choosing the most useful movements to perform  [18]. Other valuable work on the robotic use of tactile sensors also tends to focus on recognizing particular object instances  [19], [20], a task that is related to but distinct from our goal of learning haptic adjectives.

The specific task of quantifying tactile sensations has been explored in the domain of product evaluation for items such as fabric  [8] and skin cream  [21]. To aid researchers in interpreting the results of opinion studies, there have been a few attempts to develop custom systems to quantify these sensations using machine-learning techniques, e.g.,  [22], [8]. However, these past approaches have typically used single sensor inputs, which cannot match the richness of human tactile sensation. More recently, Shao et al. showed that accurately evaluating the feel of different packaging materials requires many different channels of touch perception  [23]. The need for multi-modal sensing to automatically identify different textures was also noted in the work by Fishel and Loeb  [18].

Since the data from haptic sensors usually streams over time, time-series analysis has often been employed to extract information from signals recorded during physical interactions. For example, haptic signals collected from an artificial skin over a robot’s entire body were successfully clustered and categorized into human–robot interaction modalities  [24]. Similarly, a robot learned to segment tasks by classifying time-series data obtained from an accelerometer and a camera  [25]. Our own analysis of time-series data draws upon the literature in speech recognition, most notably the use of Hidden Markov Models (HMMs)  [26].
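To make this concrete, the sketch below trains one Gaussian HMM on recordings from objects that carry a given adjective and one on recordings from objects that do not, then labels a new recording by comparing log-likelihoods. The hmmlearn library, the five-state models, and the positive/negative pairing are illustrative assumptions rather than the exact HMM formulation used in this work.

```python
# A minimal sketch, assuming the hmmlearn library and a simple
# positive-vs-negative model pairing per adjective; the state count and
# feature layout are illustrative, not the exact models used in this work.
import numpy as np
from hmmlearn import hmm

def train_hmm(sequences, n_states=5):
    """Fit one Gaussian HMM to a list of (T_i x D) haptic feature sequences."""
    X = np.vstack(sequences)                  # stack all time steps
    lengths = [len(s) for s in sequences]     # remember sequence boundaries
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
    model.fit(X, lengths)
    return model

def has_adjective(sequence, pos_model, neg_model):
    """Label a new (T x D) recording by comparing log-likelihoods."""
    return pos_model.score(sequence) > neg_model.score(sequence)
```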

This article builds on preliminary research we published as a short workshop paper  [27] and a conference paper  [28]. The research reported here significantly extends and refines our prior work, particularly by including more objects, running a formal human-subject study to obtain adjective labels, intelligently merging the traditional approach to feature extraction with our HMM-based approach, rigorously and consistently evaluating classifier performance, and thoroughly interpreting all results.

Section snippets

Robotic hardware

To best emulate what humans experience when exploring objects, we mounted state-of-the-art sensors capable of detecting a wide array of tactile signals onto a humanoid robotic platform. Specifically, we developed a method to custom-install two SynTouch BioTacs (biomimetic tactile sensors) in the gripper of a Willow Garage PR2 (Personal Robot 2). The modification was successfully performed on PR2s at both Penn and UC Berkeley. Full details on the integration and the software interface packages

Experimental setup

To understand and generalize the experience of touching objects, we need knowledge of both how an object feels and how to describe those sensations. A touch-sensitive robot can show us precisely through bits and bytes what it has sensed, but it has no means to describe what impression the experience has made. On the other hand, humans can describe their perception of an object in words, but they cannot precisely share what they felt.

We developed two parallel experiments to capture the

Robot

We extracted several relevant haptic signals from the bagfiles recorded during the robot’s interactions with the 60 PHAC-2 objects. Fig. 4 displays the PR2 and BioTac data from a single interaction with the object Blue Sponge. Coming from the robot, the signal Xg shows the gripper aperture (distance between the fingertips), where a reading of 0.1 m means the gripper is completely open. Ztf is the vertical position of the gripper with respect to the robot’s torso coordinate frame. The additional
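As a hedged sketch of how such signals could be pulled out of a recorded bagfile with the standard rosbag Python API, the snippet below collects Xg and Ztf over time; the topic names and message fields are placeholders rather than the exact topics logged in this study.

```python
# A hedged sketch, assuming the standard rosbag Python API; the topic names
# and message fields below are placeholders, not the exact topics recorded
# in this study.
import rosbag

def extract_signals(bag_path):
    """Collect the gripper aperture (Xg) and vertical gripper position (Ztf)."""
    xg, ztf = [], []
    with rosbag.Bag(bag_path) as bag:
        for topic, msg, t in bag.read_messages(
                topics=["/gripper/aperture", "/gripper/pose_in_torso_frame"]):
            if topic == "/gripper/aperture":
                xg.append((t.to_sec(), msg.data))        # 0.1 m = fully open
            else:
                ztf.append((t.to_sec(), msg.pose.position.z))
    return xg, ztf
```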

Features

While the human adjective labels are simple to comprehend, the robot recorded a voluminous quantity of diverse data during each of its 600 trials, as exhibited in Fig. 4. Thus, we had to find good ways to select a small number of numerical values (features) to represent each channel and EP of the interaction for use in our machine learning algorithms. We approached feature selection from the two complementary viewpoints of (1) calculating static values using hand-crafted formulas and (2)
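A minimal sketch of viewpoint (1) appears below: each time-varying channel recorded during an exploratory procedure is collapsed into a few summary statistics, which are then concatenated into a feature vector. The particular statistics and the dictionary-based trial layout are illustrative assumptions, not the exact hand-crafted formulas used here.

```python
# Illustrative static features per channel; the chosen statistics are
# assumptions, not the paper's exact formulas.
import numpy as np

def static_features(channel):
    """Summarize one 1-D haptic signal (e.g., fluid pressure during squeezing)."""
    x = np.asarray(channel, dtype=float)
    return np.array([
        x.mean(),                    # overall level
        x.std(),                     # variability
        x.max() - x.min(),           # range of the response
        np.abs(np.diff(x)).mean(),   # average sample-to-sample change
    ])

def feature_vector(trial):
    """Concatenate the static features of every channel in one EP.

    `trial` is a hypothetical dict mapping channel name -> signal array."""
    return np.concatenate([static_features(sig)
                           for _, sig in sorted(trial.items())])
```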

Training the classifiers

We trained classifiers to capture the static and dynamic nature of the data using the adjective-specific splits described in Section 4.3: for each adjective, 90% of the objects formed the training set, and the remaining 10% were held back for testing.
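As a rough illustration of how static and dynamic information can enter one kernel-based classifier, the sketch below sums an RBF kernel computed over static features with one computed over dynamic features and trains a binary SVM per adjective on the 90% training split. The unweighted kernel sum and the scikit-learn components are illustrative assumptions standing in for the multi-kernel approach, not a reproduction of it.

```python
# A rough sketch: an unweighted sum of two RBF kernels stands in for the
# multi-kernel combination, and scikit-learn's SVC is an assumed substitute
# for the actual implementation.
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def train_adjective_svm(static_tr, dynamic_tr, labels):
    """Train one binary classifier on the 90% training split of one adjective."""
    K = rbf_kernel(static_tr) + rbf_kernel(dynamic_tr)   # combined Gram matrix
    clf = SVC(kernel="precomputed")
    return clf.fit(K, labels)

def predict_adjective(clf, static_tr, dynamic_tr, static_te, dynamic_te):
    """Score the held-out 10% of objects against the training objects."""
    K_test = (rbf_kernel(static_te, static_tr)
              + rbf_kernel(dynamic_te, dynamic_tr))
    return clf.predict(K_test)
```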

Results

All of the results reported below were obtained through testing on the reserved adjective-specific test sets, which were never seen during training, as described in Section 4.3. The training and testing were performed on Linux-based PCs with Intel i7 processors (single-core speed of 3.8 GHz) and 16.0 GB of RAM. On these machines, a single object exploration can be classified with all adjectives in approximately 10 s. The metrics that we have selected are precision, recall,
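For reference, the sketch below computes these per-adjective metrics (precision, recall, and the F1 score reported in the abstract) on the held-out test objects using scikit-learn's standard implementations; it is illustrative rather than project-specific code.

```python
# Purely illustrative: the per-adjective metrics named above, computed with
# scikit-learn's standard implementations on the held-out test objects.
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    return {
        "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
        "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
        "f1": f1_score(y_true, y_pred),                # harmonic mean of P and R
    }
```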

Discussion

The presented results show that adjective classifiers trained on more heterogeneous data outperform classifiers trained on only a subset of the data when attempting to label objects that have never before been touched. Although this result is not new in the machine-learning community, it has profound implications in haptics and robotics in general.

The first source of heterogeneity comes from the different ways in which the robot physically explored the object. While certain EP-specific static

Conclusion and future work

We set out to create a robotic system capable of touching everyday objects and describing them with haptic adjectives. We performed an experiment to learn what words humans choose to describe a large set of selected objects, and we collected haptic data from a robot that touched these same objects ten times each. The richness of the signals collected from the BioTac sensors enabled us to perform a multi-modal analysis of object properties. Based on prior robotics research and knowledge of human

Acknowledgments

The authors thank J. Nappo for helping find many of the objects in Fig. 2. We thank N. Fitter for guiding us in the use of Correspondence Analysis. This work was supported by the Defense Advanced Research Projects Agency (DARPA) in the United States as part of Activity E within the Broad Operational Language Translation (BOLT) program, as well as by the University of Pennsylvania and the University of California, Berkeley.

References (37)

  • J.F. Gorostiza et al., End-user programming of a social robot by dialog, Robot. Auton. Syst. (2011)

  • A.V. Citrin et al., Consumer need for tactile input: an Internet retailing challenge, J. Business Res. (2003)

  • S.J. Lederman et al., Extracting object properties through haptic exploration, Acta Psychol. (1993)

  • D. Picard et al., Perceptual dimensions of tactile textures, Acta Psychol. (2003)

  • G. Cadoret et al., Friction, not texture, dictates grip forces used during object manipulation, J. Neurophysiol. (1996)

  • R.S. Johansson et al., Coding and use of tactile signals from the fingertips in object manipulation tasks, Nature Rev. Neurosci. (2009)

  • C.C. Kemp et al., Challenges for robot manipulation in human environments: developing robots that perform useful work in everyday settings, IEEE Robot. Autom. Mag. (2007)

  • D.J. Brooks et al., Make It So: Continuous, Flexible Natural Language Interaction with an Autonomous Robot, AAAI Technical Report WS-12-07 (2012)

  • X. Chen, J. Ji, J. Jiang, G. Jin, F. Wang, J. Xie, Developing high-level cognitive functions for service robots, in: ...

  • N. Pan, Quantification and evaluation of human tactile sense towards fabrics, Int. J. Design Nature (2006)

  • M. Hollins et al., Individual differences in perceptual space for tactile textures: evidence from multidimensional scaling, Attention, Percept. Psychophys. (2000)

  • R.L. Klatzky et al., Identifying objects by touch: an “expert system”, Percept. Psychophys. (1985)

  • A.M. Okamura, M.L. Turner, M.R. Cutkosky, Haptic exploration of objects with rolling and sliding, in: Proc. 1997 IEEE ...

  • J.M. Romano et al., Human-inspired robotic grasp control with tactile sensing, IEEE Trans. Robot. (2011)

  • S. Chitta et al., Tactile sensing for mobile manipulation, IEEE Trans. Robot. (2011)

  • S. Griffith et al., A behavior-grounded approach to forming object categories: separating containers from noncontainers, IEEE Trans. Auton. Mental Develop. (2012)

  • J. Sinapov et al., Vibrotactile recognition and categorization of surfaces by a humanoid robot, IEEE Trans. Robot. (2011)

  • C. Oddo et al., Roughness encoding for discrimination of surfaces in artificial active-touch, IEEE Trans. Robot. (2011)

    Vivian Chu received the B.S. degree in Electrical Engineering and Computer Science from the University of California, Berkeley, in 2009 and the M.S.E. degree in Robotics from the University of Pennsylvania in 2013. She is currently a Ph.D. student at the Georgia Institute of Technology in Andrea L. Thomaz’s Socially Intelligent Machines Lab. From 2009 to 2011, Vivian worked on natural language processing (NLP) and intelligent information integration at IBM Research, Almaden. At Penn, she worked in Katherine J. Kuchenbecker’s Haptics Research Group in the GRASP Lab. Her research interests include multi-modal sensor integration, NLP, and applying machine learning techniques for robotic learning in unstructured environments.

    Ian McMahon received the B.S. degree in Computer Engineering from the Pennsylvania State University in 2009 and the M.S.E. degree in Robotics from the University of Pennsylvania in 2012. During his master’s degree, he worked in Katherine J. Kuchenbecker’s Haptics Research Group in the GRASP Lab. His research interests include haptic sensing and computer vision for use in mobile manipulation, as well as human–robot interaction. Ian is currently a Senior Developer Relations Engineer at Rethink Robotics.

    Lorenzo Riano received the first-class honors master’s degree in Computer Science from the University of Napoli “Federico II”, Italy, and the Ph.D. degree in Robotics from the University of Palermo, Italy. He is a Research Engineer at Bosch Research and Technology Center. Previously he was a Research Scientist at the University of California, Berkeley, and before that he was a Research Associate at the Intelligent Systems Research Centre, University of Ulster, UK. His main research interests are in robotics, machine learning, and computer vision.

    Craig G. McDonald received the B.S.E. degree in Mechanical Engineering and Applied Mechanics from the University of Pennsylvania in 2012. He is currently a Ph.D. student in Mechanical Engineering at Rice University. At Penn, he worked in Katherine J. Kuchenbecker’s Haptics Research Group in the GRASP Lab. At Rice, he is working in Marcia O’Malley’s MAHI Lab, as well as NASA Johnson Space Center’s Dexterous Robotics Lab, focusing on the design and control of robotic exoskeletons. His research interests also include haptic interfaces, robotic manipulation, and machine learning.

    Qin He received the B.S. degree in Mechanical Engineering from Shanghai Jiao Tong University in 2012, and the M.S.E. degree in Robotics at the University of Pennsylvania in 2013. During her master’s degree, she worked in Katherine J. Kuchenbecker’s Haptics Research Group in the GRASP Lab. Currently she is a research assistant in Daniel Lee’s group in the GRASP Lab, working on the DARPA Robotics Challenge. Her research interests include haptic sensing and learning algorithms in robotics.

    Jorge Martinez Perez-Tejada is a B.S.E. candidate with majors in Electrical Engineering and Cognitive Science at the University of Pennsylvania. Jorge volunteered in the GRASP Lab starting in 2011, and he joined Katherine J. Kuchenbecker’s Haptics Research Group in the summer of 2012. His research interests include haptic perception and computational memory.

    Michael Arrigo received the M.A. degree in Linguistics at the University of Pennsylvania in 2013, and his area of focus is computational linguistics. He received the B.A. degree in Cognitive Science and Linguistics from the University of Pennsylvania in 2013. In 2012, he joined Katherine J. Kuchenbecker’s Haptics Research Group in the GRASP Lab, where he gathered data on human haptic adjectives as a model for the PR2 robot behavior. His research interests include natural language processing and applying linguistic theory to language and speech technology.

    Trevor Darrell is head of the Computer Vision Group at the International Computer Science Institute and is on the faculty of the CS Division at UC Berkeley. His group develops algorithms to enable multimodal conversation with robots and mobile devices, and methods for object and activity recognition on such platforms. His interests include computer vision, machine learning, computer graphics, and perception-based human computer interfaces. Prof. Darrell was on the faculty of the MIT EECS department from 1999 to 2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996 to 1999, and he received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy’s GRASP lab.

    Katherine J. Kuchenbecker earned the B.S., M.S., and Ph.D. degrees in Mechanical Engineering from Stanford University in 2000, 2002, and 2006, respectively. She worked as a postdoctoral research associate at the Johns Hopkins University from 2006 to 2007. At present, she is an Associate Professor in Mechanical Engineering and Applied Mechanics at the University of Pennsylvania. Her research centers on the design and control of haptic interfaces, and she directs the Penn Haptics Group, which is part of the GRASP Laboratory. Prof. Kuchenbecker serves on the program committee for the IEEE Haptics Symposium and other conferences in the fields of haptics and robotics, and she has won several awards for her research, including an NSF CAREER award in 2009, inclusion in the Popular Science Brilliant 10 in 2010, and the IEEE Robotics and Automation Society Academic Early Career Award in 2012.

    1 These authors contributed equally to this work.
