Bag-of-Visual-Words codebook generation using deep features for effective classification of imbalanced multi-class image datasets

Saini, Manisha; Susan, Seba

doi:10.1007/s11042-021-10612-w

Published: 10 March 2021

Volume 80, pages 20821–20847 (2021)

Multimedia Tools and Applications

Abstract

Classification of imbalanced multi-class image datasets is a challenging problem in computer vision. Most real-world datasets are imbalanced because samples are unevenly distributed across classes. The problem with an imbalanced dataset is that the minority class, which has fewer samples, tends to go undetected. Most traditional machine learning algorithms detect the majority class efficiently but lag in detecting the minority class, which ultimately degrades the overall performance of the classification model. In this paper, we propose a novel combination of visual codebook generation using deep features with the non-linear Chi² SVM classifier to tackle the imbalance problem that arises in multi-class image datasets. The low-level deep features are first extracted by transfer learning using the pre-trained ResNet-50 network and then clustered using k-means. The center of each cluster is a visual word in the codebook. Each image is then translated into a set of features called the Bag-of-Visual-Words (BOVW), derived from the histogram of visual words in the vocabulary. The non-linear Chi² SVM classifier is found to be the best choice for classifying the resulting features, as shown by a detailed empirical analysis. Hence, with the right combination of learning tools, we are able to classify multi-class imbalanced image datasets effectively. This is demonstrated by higher accuracy, F1-score and AUC in our experiments on two challenging multi-class datasets, Graz-02 and TF-Flowers, compared to state-of-the-art methods.
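The codebook and histogram steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random 2-D points stand in for the ResNet-50 deep features, the function names (`kmeans`, `bovw_histogram`, `chi2_kernel`) and the value `gamma=1.0` are hypothetical, and the exponential chi-squared form used here is one common definition of the kernel that the abstract does not spell out. SVM training is omitted.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(descriptors, codebook):
    """L1-normalised count of how often each visual word is the nearest one."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel: exp(-gamma * sum((x - y)^2 / (x + y)))."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * dist)


# Toy stand-in for deep features (assumption): each "image" is a list of
# 2-D local descriptors; images 0 and 1 are similar, image 2 is different.
rng = random.Random(42)
images = [[[rng.gauss(m, 0.1), rng.gauss(m, 0.1)] for _ in range(30)]
          for m in (0.0, 0.0, 5.0)]

# Cluster all descriptors into a 4-word codebook, then encode each image.
codebook = kmeans([d for img in images for d in img], k=4)
histograms = [bovw_histogram(img, codebook) for img in images]

# Pairwise kernel values between the image histograms.
gram = [[chi2_kernel(h1, h2) for h2 in histograms] for h1 in histograms]
```

In this toy setup the two similar images should receive a higher kernel value with each other than with the third. A matrix like `gram` is the kind of input an SVM that accepts precomputed kernels would consume.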

Author information

Authors and Affiliations

Delhi Technological University, New Delhi, India
Manisha Saini & Seba Susan

Corresponding author

Correspondence to Manisha Saini.

About this article

Cite this article

Saini, M., Susan, S. Bag-of-Visual-Words codebook generation using deep features for effective classification of imbalanced multi-class image datasets. Multimed Tools Appl 80, 20821–20847 (2021). https://doi.org/10.1007/s11042-021-10612-w

Received: 20 May 2020
Revised: 14 November 2020
Accepted: 25 January 2021
Published: 10 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-021-10612-w
