Abstract
Object affordance detection, which aims to understand the functional attributes of objects, is of great significance for an autonomous robot to achieve human-like object manipulation. In this paper, we propose a novel relationship-aware convolutional neural network, which takes into consideration the symbiotic relationship between multiple affordances and the combinational relationship between affordance and objectness, to predict the most probable affordance label for each pixel of an object. Unlike existing CNN-based methods that rely on a separate, intermediate object detection step, our proposed network directly produces pixel-wise affordance maps from an input image in an end-to-end manner. Specifically, our network comprises three key components: a Coord-ASPP module that introduces CoordConv into atrous spatial pyramid pooling (ASPP) to refine the feature maps, a relationship-aware module that links affordances to their corresponding objects to explore these relationships, and an online sequential extreme learning machine auxiliary attention module that further focuses on individual affordances to assist the relationship-aware module. Experimental results on two public datasets demonstrate the merits of each module and the superiority of our relationship-aware network over the state of the art.
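The paper's Coord-ASPP module builds on CoordConv, which augments a feature map with explicit coordinate channels before convolution so that spatial position is available to subsequent filters. The article itself provides no code; the following is a minimal NumPy sketch of that coordinate-augmentation step only (the channel count, layout, and [-1, 1] normalization follow the original CoordConv formulation, not details stated in this article):

```python
import numpy as np

def add_coord_channels(feature_map):
    """Append normalized coordinate channels as in CoordConv (Liu et al. 2018).

    feature_map: array of shape (C, H, W).
    Returns an array of shape (C + 2, H, W) whose last two channels hold
    the row and column coordinates, each scaled to [-1, 1].
    """
    c, h, w = feature_map.shape
    # Row coordinates: one value per row, broadcast across columns.
    ys = np.linspace(-1.0, 1.0, h).reshape(h, 1).repeat(w, axis=1)
    # Column coordinates: one value per column, broadcast across rows.
    xs = np.linspace(-1.0, 1.0, w).reshape(1, w).repeat(h, axis=0)
    # Concatenate the two coordinate planes after the original channels.
    return np.concatenate([feature_map, ys[None], xs[None]], axis=0)

# Example: a 256-channel, 32x32 feature map gains two coordinate channels.
fm = np.zeros((256, 32, 32), dtype=np.float32)
out = add_coord_channels(fm)
print(out.shape)  # (258, 32, 32)
```

In a Coord-ASPP-style module, the augmented tensor would then be fed to the parallel atrous convolution branches of ASPP; that wiring is specific to the paper and is not reproduced here.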
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2017YFB130092, the National Natural Science Foundation of China (NSFC) under Grants 61872327 and 61472380, and the Fundamental Research Funds for the Central Universities under Grant WK2380000001.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Cite this article
Zhao, X., Cao, Y. & Kang, Y. Object affordance detection with relationship-aware network. Neural Comput & Applic 32, 14321–14333 (2020). https://doi.org/10.1007/s00521-019-04336-0