Deep sparse representation-based mid-level visual elements discovery in fine-grained classification

Lv, Le; Zhao, Dongbin; Shao, Kun

doi:10.1007/s00500-018-3468-3

Methodologies and Application
Published: 22 August 2018

Volume 23, pages 8711–8722 (2019)
Soft Computing

Abstract

In this paper, we propose a new method for discovering mid-level visual elements and apply it to fine-grained classification. We present the duality between image patches and the features extracted by a convolutional winner-take-all autoencoder (CONV-WTA-AE). The sparsity constraints used by CONV-WTA-AE make a group of objects share the same feature components. Hence, image patches can be clustered by the feature components they share, and feature components can be clustered by their co-occurrence in image patches. We formulate the mining of mid-level visual elements as a bipartite graph partitioning problem and employ a spectral partitioning algorithm to co-cluster image patches and feature components. Because CONV-WTA-AE is an unsupervised feature learning method, it avoids the use of expensive annotations. Our experiments demonstrate that the spectral partitioning method is very efficient, but only the confident instances in a cluster are well discriminated. The similarity metric used by this algorithm is not accurate enough. Hence, we propose training a group of linear support vector machines (SVMs) to refine the clustering results. These SVMs are trained on the initial confident instances and provide a more discriminative similarity measure, which is then used to re-assign instances to clusters. To avoid overfitting, this process is iterated over many data subsets. We conduct a series of experiments on the MNIST dataset to verify our algorithm. The experimental results show that our method can discover meaningful image patch clusters. In the fine-grained classification task, the visual elements are fed into an ensemble of convolutional neural networks. Experiments on the CompCars dataset show that our method achieves state-of-the-art performance.
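
The co-clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the standard two-way bipartite spectral formulation, in which a non-negative patch-by-feature activation matrix is degree-normalised and the second singular vector pair separates both patches and feature components. The function name `bipartite_spectral_bipartition`, the toy matrix `A`, and the sign-based split (used here in place of a k-means step) are assumptions made for illustration.

```python
import math


def bipartite_spectral_bipartition(A, iters=200):
    """Split rows (patches) and columns (feature components) of a
    non-negative matrix A into two co-clusters.

    Assumes every row and every column has a positive sum and that A
    is not rank one after normalisation.
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # patch degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # feature degrees
    total = sum(d1)

    # Degree-normalised matrix An = D1^{-1/2} A D2^{-1/2}.
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)]
          for i in range(m)]

    # The leading singular pair of An is known in closed form (sigma = 1),
    # so subtract it and use power iteration to find the second pair.
    u1 = [math.sqrt(d / total) for d in d1]
    v1 = [math.sqrt(d / total) for d in d2]
    B = [[An[i][j] - u1[i] * v1[j] for j in range(n)] for i in range(m)]

    v = [float(j + 1) for j in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        u = [sum(B[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(B[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(B[i][j] * v[j] for j in range(n)) for i in range(m)]

    # Rescale by D^{-1/2} and split on the sign (a simplification of
    # clustering the one-dimensional embedding).
    patch_labels = [int(u[i] / math.sqrt(d1[i]) > 0) for i in range(m)]
    feature_labels = [int(v[j] / math.sqrt(d2[j]) > 0) for j in range(n)]
    return patch_labels, feature_labels


# Toy activation matrix: rows are image patches, columns are feature
# components. Patches 0-2 mostly activate features 0-1, patches 3-4
# mostly activate features 2-3, with one weak cross link.
A = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 1, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
]
patches, features = bipartite_spectral_bipartition(A)
print(patches, features)
```

Patches and feature components that receive the same label form one co-cluster. Which of the two groups is labelled 0 or 1 depends on the sign of the singular vector and carries no meaning. For more than two clusters, the usual approach is to keep several singular vectors and cluster the stacked embedding, which this sketch does not cover.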

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61273136, No. 61573353 and No. 61533017.

Author information

Authors and Affiliations

State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Le Lv, Dongbin Zhao & Kun Shao

Corresponding author

Correspondence to Dongbin Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lv, L., Zhao, D. & Shao, K. Deep sparse representation-based mid-level visual elements discovery in fine-grained classification. Soft Comput 23, 8711–8722 (2019). https://doi.org/10.1007/s00500-018-3468-3

Published: 22 August 2018
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00500-018-3468-3
