Abstract
The performance of visual perception algorithms for object category detection has largely been restricted by the limited generalizability and scalability of state-of-the-art hand-crafted feature detectors and descriptors across object instances with different shapes, textures, and appearances. Recently introduced deep learning algorithms have attempted to overcome this limitation through automatic learning of feature kernels. Nevertheless, conventional deep learning architectures are uni-modal, essentially feedforward testing pipelines operating on image space with little regard for context and semantics. In this paper, we address this issue by presenting a new framework for object categorization based on deep learning, called Parallel Deep Learning with Suggestive Activation (PDLSA), that incorporates several brain operating principles drawn from neuroscience and psychophysical studies. In particular, we focus on Suggestive Activation, a schema that introduces feedback loops into the recognition process. Partial detection results are used to generate hypotheses from long-term memory (or a knowledge base); the image space is then searched for features corresponding to these hypotheses, enabling activation of the response for the correct object category through multi-modal integration. Results against a traditional SIFT-based category classifier on the University of Washington benchmark RGB-D dataset demonstrate the validity of the approach.
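The Suggestive Activation loop described above (partial detections → hypotheses from long-term memory → targeted search in image space → updated activations) can be illustrated with a minimal sketch. Everything here is hypothetical: the toy knowledge base, the feature names, and the scoring are placeholders standing in for the paper's learned features and multi-modal integration, not the authors' implementation.

```python
# Hypothetical sketch of a Suggestive Activation feedback loop.
# Long-term memory: each category suggests the features associated with it.
KNOWLEDGE_BASE = {
    "mug":    {"handle", "rim", "cylinder"},
    "bowl":   {"rim", "concave_surface"},
    "bottle": {"cap", "cylinder", "neck"},
}

def search_image_for(feature, image_features):
    """Stand-in for a targeted search in image space for one feature."""
    return feature in image_features

def suggestive_activation(partial_detections, image_features, max_rounds=3):
    """Iteratively refine category activations via hypothesis-driven search."""
    found = set(partial_detections)
    activation = {cat: 0.0 for cat in KNOWLEDGE_BASE}
    for _ in range(max_rounds):
        # Feedforward pass: score each category by the features found so far.
        for cat, feats in KNOWLEDGE_BASE.items():
            activation[cat] = len(found & feats) / len(feats)
        # Feedback pass: the best current hypothesis suggests what to look for.
        best = max(activation, key=activation.get)
        missing = KNOWLEDGE_BASE[best] - found
        newly_found = {f for f in missing if search_image_for(f, image_features)}
        if not newly_found:
            break  # no new evidence found; activations have converged
        found |= newly_found
    return max(activation, key=activation.get), activation

# A lone handle detection triggers a "mug" hypothesis, and the feedback
# search then confirms the rim and cylinder features in the image.
category, scores = suggestive_activation(
    partial_detections={"handle"},
    image_features={"rim", "handle", "cylinder"},
)
print(category)  # the feedback loop settles on "mug"
```

The key design point mirrored here is that recognition is not a single feedforward pass: a partial result drives a top-down query of memory, which in turn directs where the system looks next.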
References
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
Kim, S., Yoon, K.-J., Kweon, I.S.: Object recognition using a generalized robust invariant feature and Gestalt’s law of proximity and similarity. Pattern Recognition 41(2), 726–741 (2008)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. International Journal of Computer Vision 60(1), 63–86 (2004)
Forssén, P.-E., Lowe, D.G.: Shape descriptors for maximally stable extremal regions. In: IEEE 11th International Conference on Computer Vision, ICCV 2007. IEEE (2007)
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2. IEEE (2005)
Jarrett, K., et al.: What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th International Conference on Computer Vision. IEEE (2009)
Bengio, Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009)
Cover, T.M., Thomas, J.A.: Elements of information theory. Wiley-Interscience (2006)
Field, D.J.: Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4(12), 2379–2394 (1987)
Field, D.J.: What is the goal of sensory coding? Neural Computation 6(4) (1994)
Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
Biederman, I.: Recognition-by-components: a theory of human image understanding. Psychological Review 94(2), 115 (1987)
Arbib, M.A. (ed.): The handbook of brain theory and neural networks. MIT Press
Rogers, T., et al.: Object recognition under semantic impairment: The effects of conceptual regularities on perceptual decisions. Language and Cognitive Processes 18(5-6), 625–662 (2003)
Varadarajan, K.M., Vincze, M.: AfNet: The affordance network. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 512–523. Springer, Heidelberg (2013)
Gibson, J.J.: The concept of affordances. Perceiving, Acting, and Knowing, 67–82 (1977)
Prinz, W.: Modes of linkage between perception and action. Cognition and motor processes, pp. 185–193. Springer, Heidelberg (1984)
Kohler, E., et al.: Hearing sounds, understanding actions: action representation in mirror neurons. Science 297(5582), 846–848 (2002)
Varadarajan, K.M.: k-TR: Karmic Tabula Rasa – A Theory of Visual Perception. In: Conference of the International Society of Psychophysics - ISP (2011)
Varadarajan, K.M., Vincze, M.: Knowledge representation and inference for grasp affordances. In: Crowley, J.L., Draper, B.A., Thonnat, M. (eds.) ICVS 2011. LNCS, vol. 6962, pp. 173–182. Springer, Heidelberg (2011)
Varadarajan, K.M., Vincze, M.: AfRob: The affordance network ontology for robots. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE (2012)
AfNet: The Affordance Network (2013), http://www.theaffordances.net
© 2013 Springer-Verlag Berlin Heidelberg
Varadarajan, K.M., Vincze, M. (2013). Parallel Deep Learning with Suggestive Activation for Object Category Recognition. In: Chen, M., Leibe, B., Neumann, B. (eds) Computer Vision Systems. ICVS 2013. Lecture Notes in Computer Science, vol 7963. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39402-7_36
DOI: https://doi.org/10.1007/978-3-642-39402-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39401-0
Online ISBN: 978-3-642-39402-7