On fusing the latent deep CNN feature for image classification

Liu, Xueliang; Zhang, Rongjie; Meng, Zhijun; Hong, Richang; Liu, Guangcan

doi:10.1007/s11280-018-0600-3

Published: 15 June 2018

World Wide Web, Volume 22, pages 423–436 (2019)

Xueliang Liu¹, Rongjie Zhang¹, Zhijun Meng² (ORCID: orcid.org/0000-0003-3163-5888), Richang Hong¹ & Guangcan Liu³

1665 Accesses
29 Citations

Abstract

Image classification, which aims to assign a semantic category to an image, has been studied extensively over the past few years. More recently, convolutional neural networks (CNNs) have emerged and achieved very promising results. Compared with traditional feature extraction techniques (e.g., SIFT, HOG, GIST), a CNN extracts features from images automatically and does not need hand-designed features. However, how to further improve classification performance remains a challenging research problem. Recent research on CNNs shows that the features extracted from middle layers are representative, which suggests a possible way to improve classification accuracy. Based on this observation, we propose a method that fuses the latent features extracted from the middle layers of a CNN to train a more robust classifier. First, we use pretrained CNN models to extract visual features from the middle layers. Then, we use a supervised learning method to train a separate classifier for each feature. Finally, we use a late fusion strategy to combine the predictions of these classifiers. We evaluate the proposed method with different classification methods on several image benchmarks, and the results demonstrate that it improves performance effectively.
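
The three steps in the abstract (per-layer features, one classifier per feature, late fusion of the predictions) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the classifiers or the fusion rule, so the sketch assumes a toy nearest-centroid classifier and a weighted average of class probabilities. The feature vectors are hand-written stand-ins for activations that would come from two middle layers of a pretrained CNN, and the names `NearestCentroid`, `late_fusion`, and `fused_predict` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class NearestCentroid:
    """Toy per-layer classifier (an assumption; stands in for any supervised model)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        # Negative squared distance to each class centroid, turned into probabilities.
        scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids_]
        return softmax(scores)

def late_fusion(prob_vectors, weights=None):
    # Weighted average of per-classifier class probabilities (assumed fusion rule).
    n = len(prob_vectors)
    weights = weights or [1.0 / n] * n
    n_classes = len(prob_vectors[0])
    return [sum(w * p[k] for w, p in zip(weights, prob_vectors)) for k in range(n_classes)]

def fused_predict(models, per_layer_features, weights=None):
    # One classifier per layer feature; combine their predictions at the end.
    probs = [m.predict_proba(f) for m, f in zip(models, per_layer_features)]
    fused = late_fusion(probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return models[0].classes_[best], fused

# Made-up features for four training images from two hypothetical middle layers.
labels = ["cat", "cat", "dog", "dog"]
layer_a_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
layer_b_train = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.1], [1.0, 1.0, 0.9], [0.8, 1.0, 1.1]]

models = [
    NearestCentroid().fit(layer_a_train, labels),
    NearestCentroid().fit(layer_b_train, labels),
]

# Features of one test image, one vector per layer.
label, fused = fused_predict(models, [[0.95, 1.0], [0.9, 0.9, 1.0]])
print(label, fused)
```

Because fusion happens on the classifier outputs rather than on the features, the per-layer feature vectors may have different dimensions (2 and 3 here), and each classifier can be trained independently. Passing `weights` lets a more reliable layer count for more in the fused prediction.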

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61632007 and 61502139.

Author information

Authors and Affiliations

Hefei University of Technology, Hefei, 230009, China
Xueliang Liu, Rongjie Zhang & Richang Hong
Beihang University, Beijing, 100191, China
Zhijun Meng
Nanjing University of Information Science and Technology, Nanjing, 210044, China
Guangcan Liu

Corresponding author

Correspondence to Zhijun Meng.

Additional information

The original version of this article was revised: The original version of this article unfortunately contained a mistake. The spelling of Guangcan Liu’s name was incorrect.

This article belongs to the Topical Collection: Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications

Guest Editors: Jingkuan Song, Shuqiang Jiang, Elisa Ricci, and Zi Huang

About this article

Cite this article

Liu, X., Zhang, R., Meng, Z. et al. On fusing the latent deep CNN feature for image classification. World Wide Web 22, 423–436 (2019). https://doi.org/10.1007/s11280-018-0600-3

Received: 08 September 2017
Revised: 25 January 2018
Accepted: 24 May 2018
Published: 15 June 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s11280-018-0600-3
