End-to-End Trained Sparse Coding Network with Spatial Pyramid Pooling for Image Classification

Neural Processing Letters

Abstract

Spatial pyramid matching using sparse coding (ScSPM) has become an efficient method and a benchmark in image classification. However, because its dictionary is trained without supervision, the dictionary may be suboptimal for classification. To further improve classification accuracy, we propose a sparse coding network with spatial pyramid pooling that is trained end to end. In the proposed system, the minimization problem in sparse coding is modeled as a feed-forward neural network, and image features are extracted by a deep convolutional network. Minimizing the final classifier loss end to end trains the sparse coding network in a supervised way. The proposed model is tested on three image databases, where it significantly outperforms ScSPM in classification accuracy. It also achieves a noticeable improvement over other deep-learning-based image classification approaches.
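The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture, which is not described in this preview. It assumes that the feed-forward form of sparse coding is obtained by unrolling a fixed number of iterative shrinkage-thresholding (ISTA) steps, and that pooling takes the maximum over each pyramid cell. The function names, the pyramid levels (1, 2, 4), and the parameters `lam` and `n_layers` are hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(v, theta):
    """Element-wise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def unrolled_ista(x, D, lam=0.1, n_layers=3):
    """Approximate argmin_z 0.5*||x - D z||^2 + lam*||z||_1 with a fixed
    number of ISTA steps, written as feed-forward layers.

    x: (n_features,) input descriptor.
    D: (n_features, n_atoms) dictionary.
    In an end-to-end trained network, W, S and theta below would become
    learnable parameters; here they are derived from D (an assumption).
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    W = D.T / L                            # input-to-code weights
    S = np.eye(D.shape[1]) - D.T @ D / L   # code-to-code (recurrent) weights
    theta = lam / L                        # shrinkage threshold
    b = W @ x
    z = soft_threshold(b, theta)           # first layer (start from z = 0)
    for _ in range(n_layers - 1):
        z = soft_threshold(b + S @ z, theta)
    return z


def spatial_pyramid_max_pool(codes, levels=(1, 2, 4)):
    """Max-pool a grid of codes over a spatial pyramid.

    codes: (H, W, K) array with one K-dimensional code per spatial location.
    Returns a vector of length K * sum(l * l for l in levels).
    Requires H and W to be at least max(levels) so that no cell is empty.
    """
    H, W, K = codes.shape
    pooled = []
    for l in levels:
        rows = np.linspace(0, H, l + 1).astype(int)
        cols = np.linspace(0, W, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                cell = codes[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                pooled.append(cell.reshape(-1, K).max(axis=0))
    return np.concatenate(pooled)
```

In this sketch the pooled vector has a fixed length regardless of the spatial size of the code grid, which is what allows a classifier to be attached on top and its loss to be propagated back through the unrolled layers.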

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61302055, 61401160, 61327005), the Science and Technology Planning Project of Guangdong Province (No. 2017A020214011), the Funds for the Central Universities (No. 2017MS039), the Guangdong Provincial Key Laboratory of Short-Range Wireless Detection and Communication (Nos. 2014B030301010, 2017B030314003), the Science and Technology Program of Guangzhou (No. 201804020079), and the Project sponsored by SRF for ROCS, SEM.

Author information

Corresponding author

Correspondence to Boheng Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, B., Wang, Y., Wei, G. et al. End-to-End Trained Sparse Coding Network with Spatial Pyramid Pooling for Image Classification. Neural Process Lett 50, 2021–2036 (2019). https://doi.org/10.1007/s11063-018-9967-5

  • DOI: https://doi.org/10.1007/s11063-018-9967-5
