On fusing the latent deep CNN feature for image classification

World Wide Web

Abstract

Image classification, which aims to assign a semantic category to an image, has been studied extensively in recent years. More recently, convolutional neural networks (CNNs) have emerged and achieved very promising results. Compared with traditional feature extraction techniques (e.g., SIFT, HOG, GIST), a CNN extracts features from images automatically and does not need hand-designed features. However, further improving classification accuracy remains challenging. Recent research on CNNs shows that the features extracted from the middle layers are representative, which suggests a possible way to improve classification accuracy. Based on this observation, we propose a method that fuses the latent features extracted from the middle layers of a CNN to train a more robust classifier. First, we use pretrained CNN models to extract visual features from the middle layers. Then, we use a supervised learning method to train a separate classifier for each feature. Finally, we use a late fusion strategy to combine the predictions of these classifiers. We evaluate the proposed method with different classification methods on several image benchmarks, and the results demonstrate that it improves performance effectively.
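The final step of the pipeline, late fusion, can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so the weighted averaging of class probabilities below is an assumption, not the authors' method. The names `late_fuse` and `predict`, the layer labels `conv4`, `conv5`, and `fc7`, and all probability values are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Optional, Sequence


def late_fuse(
    per_layer_probs: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of class-probability vectors, one per layer classifier.

    Assumption: averaging is only one possible late fusion rule; the
    abstract does not say which rule the paper uses.
    """
    if not per_layer_probs:
        raise ValueError("need at least one classifier output")
    names = list(per_layer_probs)
    if weights is None:
        # Assumption: every layer classifier is trusted equally.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(per_layer_probs[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        probs = per_layer_probs[name]
        if len(probs) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for k, p in enumerate(probs):
            fused[k] += weights[name] * p / total
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Hypothetical outputs of three classifiers, each trained on the features
# of one layer of a pretrained CNN (layer names are illustrative only).
example = {
    "conv4": [0.5, 0.25, 0.25],
    "conv5": [0.25, 0.5, 0.25],
    "fc7": [0.25, 0.5, 0.25],
}
fused_example = late_fuse(example)
label_example = predict(fused_example)
```

With equal weights, the two classifiers that favor the second class outvote the one that favors the first. Passing a `weights` dictionary lets a more reliable layer count for more, for example when validation accuracy differs across layers.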

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61632007 and 61502139.

Author information

Corresponding author

Correspondence to Zhijun Meng.

Additional information

The original version of this article was revised: The original version of this article unfortunately contained a mistake. The spelling of Guangcan Liu’s name was incorrect.

This article belongs to the Topical Collection: Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications

Guest Editors: Jingkuan Song, Shuqiang Jiang, Elisa Ricci, and Zi Huang

Cite this article

Liu, X., Zhang, R., Meng, Z. et al. On fusing the latent deep CNN feature for image classification. World Wide Web 22, 423–436 (2019). https://doi.org/10.1007/s11280-018-0600-3

  • DOI: https://doi.org/10.1007/s11280-018-0600-3
