Abstract
Image classification, which aims to assign a semantic category to an image, has been studied extensively in recent years. More recently, convolutional neural networks (CNNs) have emerged and achieved very promising results. Compared with traditional feature extraction techniques (e.g., SIFT, HOG, GIST), a CNN extracts features from images automatically and does not need hand-designed features. However, further improving classification accuracy remains a challenge. Recent research on CNNs shows that the features extracted from the middle layers are representative, which suggests a possible way to improve classification accuracy. Based on this observation, we propose a method that fuses the latent features extracted from the middle layers of a CNN to train a more robust classifier. First, we use pretrained CNN models to extract visual features from the middle layers. Then, we use a supervised learning method to train a separate classifier for each feature. Finally, we use a late fusion strategy to combine the predictions of these classifiers. We evaluate the proposed method with different classification methods on several image benchmarks, and the results demonstrate that it improves classification performance effectively.
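The final step of the pipeline, late fusion, can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the weighted averaging of per-class scores below is an assumption, as are the function name `late_fusion`, the layer names `conv5`, `fc6`, and `fc7`, and the score values, which are made up purely for illustration. The sketch assumes that one classifier has already been trained per middle-layer feature and that each classifier outputs a score per class.

```python
from typing import Dict, List, Optional, Sequence


def late_fusion(
    scores_per_layer: Dict[str, Sequence[Sequence[float]]],
    weights: Optional[Dict[str, float]] = None,
) -> List[int]:
    """Fuse per-layer class scores by weighted averaging and return labels.

    scores_per_layer maps a (hypothetical) layer name to a list of rows,
    one row of class scores per image. Averaging is an assumed fusion rule.
    """
    layers = list(scores_per_layer)
    if not layers:
        raise ValueError("need scores from at least one classifier")
    if weights is None:
        # Default: every per-layer classifier contributes equally.
        weights = {name: 1.0 for name in layers}
    total = sum(weights[name] for name in layers)

    n_samples = len(scores_per_layer[layers[0]])
    predictions = []
    for i in range(n_samples):
        n_classes = len(scores_per_layer[layers[0]][i])
        fused = [0.0] * n_classes
        for name in layers:
            row = scores_per_layer[name][i]
            for c in range(n_classes):
                fused[c] += weights[name] * row[c] / total
        # Predicted label is the class with the highest fused score.
        predictions.append(max(range(n_classes), key=fused.__getitem__))
    return predictions


# Illustrative scores for two images and three classes (not from the paper).
scores = {
    "conv5": [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    "fc6": [[0.3, 0.5, 0.2], [0.1, 0.3, 0.6]],
    "fc7": [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
}
fused_labels = late_fusion(scores)
```

Because fusion happens on the classifier outputs rather than on the features, each per-layer classifier can be trained independently, and the optional `weights` argument lets a more reliable layer count for more in the final decision.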



Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61632007 and 61502139.
The original version of this article was revised: it contained a mistake in the spelling of Guangcan Liu's name.
This article belongs to the Topical Collection: Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications
Guest Editors: Jingkuan Song, Shuqiang Jiang, Elisa Ricci, and Zi Huang
Cite this article
Liu, X., Zhang, R., Meng, Z. et al. On fusing the latent deep CNN feature for image classification. World Wide Web 22, 423–436 (2019). https://doi.org/10.1007/s11280-018-0600-3
DOI: https://doi.org/10.1007/s11280-018-0600-3