VFM: Visual Feedback Model for Robust Object Recognition

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Object recognition, which consists of classification and detection, has two attributes that are important for robustness: 1) closeness: detection windows should be as close to object locations as possible, and 2) adaptiveness: object matching should adapt to object variations within an object class. It is difficult to satisfy both attributes with traditional methods, which treat classification and detection separately; recent studies therefore propose to combine them through confidence contextualization and foreground modeling. However, these combinations neglect feature saliency and object structure, although biological evidence suggests that both can be important in guiding recognition from low level to high level. In fact, object recognition originates in the mechanism of the “what” and “where” pathways in the human visual system. More importantly, these pathways feed back to each other and exchange useful information, which may improve closeness and adaptiveness. Inspired by this visual feedback, we propose a robust object recognition framework by designing a computational visual feedback model (VFM) between classification and detection. In the “what” feedback, the feature saliency from classification is exploited to rectify detection windows for better closeness, while in the “where” feedback, object parts from detection are used to match object structure for better adaptiveness. Experimental results show that the “what” and “where” feedback effectively improves closeness and adaptiveness for object recognition, and encouraging improvements are obtained on the challenging PASCAL VOC 2007 dataset.

Author information

Corresponding author

Correspondence to Kai-Qi Huang.

Additional information

Special Section on Object Recognition

This work was supported by the National Basic Research 973 Program of China under Grant No. 2012CB316302, the National Natural Science Foundation of China under Grant Nos. 61322209 and 61175007, and the National Key Technology Research and Development Program of China under Grant No. 2012BAH07B01.

About this article

Cite this article

Wang, C., Huang, KQ. VFM: Visual Feedback Model for Robust Object Recognition. J. Comput. Sci. Technol. 30, 325–339 (2015). https://doi.org/10.1007/s11390-015-1526-1

  • DOI: https://doi.org/10.1007/s11390-015-1526-1
