Abstract
Objects in human environments support various functionalities that govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications, such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also allows for uncertainty and variation in the grounded affordances. We apply our approach to RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and then show that learning such affordances improves performance on the labeling tasks.
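To make the notion of grounding concrete, the short Python sketch below (with hypothetical helper names; it is not the generative model described in the paper) represents the spatial grounding of an affordance as a Gaussian-distributed interaction point relative to an object and the temporal grounding as a Bézier curve through the 3D environment, with the random sampling standing in for the uncertainty and variation that the probabilistic model captures.

# Illustrative sketch only (hypothetical names throughout): grounding an
# affordance as (i) a sampled interaction point on/near an object and
# (ii) a parametric 3D trajectory toward that point.
import numpy as np

def sample_interaction_point(object_points, mean_offset, cov, rng):
    """Spatial grounding: sample a point near the object.

    object_points: (N, 3) array of 3D points on the object.
    mean_offset, cov: affordance-specific Gaussian over offsets from the
    object centroid (e.g., a 'graspable' affordance may prefer one side).
    """
    centroid = object_points.mean(axis=0)
    offset = rng.multivariate_normal(mean_offset, cov)
    return centroid + offset

def bezier_trajectory(start, control, end, n_steps=20):
    """Temporal grounding: a quadratic Bezier curve from start to end in 3D."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1 - t) ** 2 * start + 2 * (1 - t) * t * control + t ** 2 * end

rng = np.random.default_rng(0)
cup = rng.normal([0.5, 0.2, 0.8], 0.03, size=(200, 3))   # toy object point cloud
hand = np.array([0.0, 0.0, 1.0])                          # current hand position
grasp_point = sample_interaction_point(
    cup, mean_offset=[0.05, 0.0, 0.0], cov=0.001 * np.eye(3), rng=rng)
control = (hand + grasp_point) / 2 + np.array([0.0, 0.0, 0.1])  # arc upward mid-way
trajectory = bezier_trajectory(hand, control, grasp_point)
print(trajectory.shape)  # (20, 3): one sampled reaching trajectory toward the cup

Repeatedly sampling interaction points and trajectories in this way yields a distribution over grounded affordances rather than a single fixed answer, which is the kind of variability the abstract refers to.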
About this paper
Cite this paper
Koppula, H.S., Saxena, A. (2014). Physically Grounded Spatio-temporal Object Affordances. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_54
DOI: https://doi.org/10.1007/978-3-319-10578-9_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9