Abstract
Objects in human environments support various functionalities that govern how people interact with their environments in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications, such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also allows for uncertainty and variation in the grounded affordances. We apply our approach to RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and then show that learning such affordances improves performance on the labeling tasks.
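To make the notion of grounding concrete, the short Python sketch below (with hypothetical helper names; it is not the generative model described in the paper) represents the spatial grounding of an affordance as a Gaussian-distributed interaction point relative to an object and the temporal grounding as a Bézier curve through the 3D environment, with the random sampling standing in for the uncertainty and variation that the probabilistic model captures.

# Illustrative sketch only (hypothetical names throughout): grounding an
# affordance as (i) a sampled interaction point on/near an object and
# (ii) a parametric 3D trajectory toward that point.
import numpy as np

def sample_interaction_point(object_points, mean_offset, cov, rng):
    """Spatial grounding: sample a point near the object.

    object_points: (N, 3) array of 3D points on the object.
    mean_offset, cov: affordance-specific Gaussian over offsets from the
    object centroid (e.g., a 'graspable' affordance may prefer one side).
    """
    centroid = object_points.mean(axis=0)
    offset = rng.multivariate_normal(mean_offset, cov)
    return centroid + offset

def bezier_trajectory(start, control, end, n_steps=20):
    """Temporal grounding: a quadratic Bezier curve from start to end in 3D."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1 - t) ** 2 * start + 2 * (1 - t) * t * control + t ** 2 * end

rng = np.random.default_rng(0)
cup = rng.normal([0.5, 0.2, 0.8], 0.03, size=(200, 3))   # toy object point cloud
hand = np.array([0.0, 0.0, 1.0])                          # current hand position
grasp_point = sample_interaction_point(
    cup, mean_offset=[0.05, 0.0, 0.0], cov=0.001 * np.eye(3), rng=rng)
control = (hand + grasp_point) / 2 + np.array([0.0, 0.0, 0.1])  # arc upward mid-way
trajectory = bezier_trajectory(hand, control, grasp_point)
print(trajectory.shape)  # (20, 3): one sampled reaching trajectory toward the cup

Repeatedly sampling interaction points and trajectories in this way yields a distribution over grounded affordances rather than a single fixed answer, which is the kind of variability the abstract refers to.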
About this paper
Cite this paper
Koppula, H.S., Saxena, A. (2014). Physically Grounded Spatio-temporal Object Affordances. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_54
DOI: https://doi.org/10.1007/978-3-319-10578-9_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9