Abstract
Spatial pyramid matching using sparse coding (ScSPM) has become an efficient method and a benchmark in image classification. However, because its dictionary is learned without supervision, the trained dictionary may be suboptimal for the classification task. To further improve classification accuracy, in this paper we propose a sparse coding network with spatial pyramid pooling that is trained end to end with deep learning. In the proposed system, the minimization problem in sparse coding is modeled as a feed-forward neural network, and image features are extracted by a deep convolutional network. By minimizing the final classifier loss end to end, the sparse coding network is trained in a supervised way. The proposed model is tested on three image databases, and in terms of classification accuracy it significantly outperforms ScSPM. Compared with other image classification approaches based on deep learning, it also achieves a noticeable improvement.
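The idea of modeling the sparse coding minimization as a feed-forward network can be illustrated by unrolling iterative soft-thresholding into a fixed number of layers. The following is a minimal NumPy sketch under stated assumptions, not the network proposed in this paper: the function names, the random dictionary D, and the values of lam and n_layers are all hypothetical, and the layer weights are derived from D and kept fixed, whereas an end-to-end system would treat them as trainable parameters updated through the classifier loss.

```python
import numpy as np


def soft_threshold(v, theta):
    """Element-wise shrinkage: the nonlinearity applied in every layer."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def objective(x, D, z, lam):
    """Sparse coding objective 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))


def unrolled_ista(x, D, lam=0.1, n_layers=50):
    """Run n_layers soft-thresholding steps written as feed-forward layers.

    Each layer computes z <- soft_threshold(W x + S z, theta). Here W, S and
    theta come from the dictionary D (assumption: fixed, not learned).
    """
    L = np.linalg.norm(D, 2) ** 2               # squared spectral norm of D
    W = D.T / L                                 # input-to-code weights
    S = np.eye(D.shape[1]) - (D.T @ D) / L      # code-to-code weights
    theta = lam / L                             # shrinkage threshold
    b = W @ x
    z = soft_threshold(b, theta)                # first layer, starting from z = 0
    for _ in range(n_layers - 1):
        z = soft_threshold(b + S @ z, theta)
    return z


# Hypothetical toy data: 16 unit-norm atoms in 8 dimensions.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(8)
z = unrolled_ista(x, D, lam=0.1, n_layers=50)
print("nonzero coefficients:", int(np.count_nonzero(z)))
print("objective:", objective(x, D, z, 0.1))
```

Because every layer is a matrix multiplication followed by an element-wise shrinkage, the whole stack is differentiable almost everywhere, which is what allows a classifier loss to be backpropagated into the coding stage.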







Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61302055, 61401160, 61327005), the Science and Technology Planning Project of Guangdong Province (No. 2017A020214011), the Funds for the Central Universities (No. 2017MS039), the Guangdong Provincial Key Laboratory of Short-Range Wireless Detection and Communication (Nos. 2014B030301010, 2017B030314003), the Science and Technology Program of Guangzhou (No. 201804020079), and the Project sponsored by SRF for ROCS, SEM.
Cite this article
Chen, B., Wang, Y., Wei, G. et al. End-to-End Trained Sparse Coding Network with Spatial Pyramid Pooling for Image Classification. Neural Process Lett 50, 2021–2036 (2019). https://doi.org/10.1007/s11063-018-9967-5
DOI: https://doi.org/10.1007/s11063-018-9967-5